Question 1

How much does an AI voice agent cost in 2026?

Accepted Answer

A production AI voice agent in 2026 typically costs $0.08 to $0.30 per minute all-in once telephony, speech-to-text, the LLM, and text-to-speech are stacked. Build costs for a custom voice agent range from $8,000 for a single-intent agent on a managed platform to $80,000+ for a multi-skill agent with custom backend integrations. Most startups land in the $15,000 to $40,000 range for a first production agent.

Question 2

What drives the per-minute cost of a voice agent?

Accepted Answer

Four costs stack up. Telephony carries the call (Twilio is roughly $0.013 per minute inbound and $0.022 per minute outbound in the US). Speech-to-text transcribes the caller (Deepgram, AssemblyAI, or OpenAI Whisper run $0.004 to $0.008 per minute). The LLM generates responses (token costs vary widely: Claude Haiku and GPT-4o mini cost pennies per call, while Claude Opus or GPT-5 can reach $0.10+ per minute on heavy reasoning). Text-to-speech reads the response (ElevenLabs Flash is around $0.06 per 1,000 characters, and OpenAI TTS is similar). For a typical 3-minute support call, all-in cost is usually $0.40 to $1.20.
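The stacking is plain addition once every component is expressed per minute. A minimal sketch follows. The function and parameter names are hypothetical, and two inputs are illustrative assumptions rather than figures from the answer: an LLM cost of $0.05 per minute and 600 characters of agent speech per call minute. It adds up the raw components only, before any platform markup.

```python
def per_minute_cost(telephony, stt, llm, tts_per_1k_chars, tts_chars_per_minute):
    """Return the summed component cost in USD for one minute of call time."""
    # TTS is priced per 1,000 characters, so convert it to a per-minute figure.
    tts = tts_per_1k_chars * tts_chars_per_minute / 1000
    return telephony + stt + llm + tts


# Outbound US call using the telephony, STT, and TTS rates quoted above.
cost = per_minute_cost(
    telephony=0.022,           # $/min, outbound
    stt=0.008,                 # $/min, top of the quoted STT range
    llm=0.05,                  # $/min, ASSUMPTION: mid-sized model
    tts_per_1k_chars=0.06,     # $/1,000 characters
    tts_chars_per_minute=600,  # ASSUMPTION: agent speech per call minute
)
print(f"per minute: ${cost:.3f}, 3-minute call: ${3 * cost:.2f}")
```

Swapping in your own LLM cost and speech rate is the quickest way to see which of the four components dominates for your call pattern.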

Question 3

What is the difference between Vapi, Bland, Retell, and rolling your own?

Accepted Answer

Vapi, Bland, and Retell are voice-agent orchestrators that handle the real-time pipeline of telephony plus STT plus LLM plus TTS for you. They charge a platform markup (typically $0.05 to $0.10 per minute on top of underlying costs) in exchange for handling latency, interruption detection, and barge-in. Rolling your own with LiveKit, Twilio Media Streams, and a custom orchestrator costs less per minute at scale but requires real engineering investment in the voice pipeline. At under 50,000 minutes per month, platforms usually win on total cost when you factor in engineering time. Above that volume, custom builds start to pay back.
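The break-even between a platform and a custom build can be sketched as the point where the monthly platform markup equals the monthly cost of maintaining your own pipeline. The function names are hypothetical, and the $4,000 monthly overhead for a custom build is an assumed figure for illustration only. Substitute your own engineering cost.

```python
def platform_markup_per_month(minutes, markup_per_minute):
    """Monthly USD paid to the platform on top of underlying usage costs."""
    return minutes * markup_per_minute


def breakeven_minutes(custom_monthly_overhead, markup_per_minute):
    """Monthly minutes at which the markup equals the custom-build overhead."""
    if markup_per_minute <= 0:
        raise ValueError("markup_per_minute must be positive")
    return custom_monthly_overhead / markup_per_minute


# Markup paid at 50,000 minutes per month, at both ends of the quoted range.
for markup in (0.05, 0.10):
    paid = platform_markup_per_month(50_000, markup)
    print(f"markup ${markup:.2f}/min -> ${paid:,.0f} per month")

# ASSUMPTION: $4,000/month of amortized engineering and maintenance.
print(f"break-even: {breakeven_minutes(4_000, 0.08):,.0f} minutes per month")
```

Below the break-even volume the markup is cheaper than the engineering time, and above it the custom pipeline starts to win.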

Question 4

What does it cost to build a custom AI voice agent?

Accepted Answer

A single-intent agent (one job, like appointment booking or order status lookup) on a managed platform like Vapi or Retell costs $8,000 to $15,000 in build time. A multi-intent agent with 3 to 5 skills, CRM integration, and call routing costs $20,000 to $40,000. A complex agent with custom tool use, knowledge base retrieval, multi-language support, and an analytics dashboard runs $50,000 to $100,000+. Most of the build cost is not the AI. It is the integrations, edge case handling, and prompt iteration.

Question 5

When does an AI voice agent pay for itself?

Accepted Answer

The simple math: if your current cost per call is $4 (typical for outsourced support) and your AI agent handles a call for $0.80 all-in, you save $3.20 per call. A $30,000 build pays back after 9,375 calls ($30,000 / $3.20). At 1,000 calls per month, that is 9 to 10 months. At 5,000 calls per month, payback is under 2 months. Agents make sense when call volume is high, call patterns are repetitive, and you have a measurable per-call labor cost to displace.
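The payback arithmetic above can be sketched as a small function so you can plug in your own numbers. The function name and signature are hypothetical. The inputs in the example are the ones used in the answer.

```python
def payback(build_cost, current_cost_per_call, agent_cost_per_call, calls_per_month):
    """Return (break-even calls, months to payback) for a voice agent build."""
    saving_per_call = current_cost_per_call - agent_cost_per_call
    if saving_per_call <= 0:
        raise ValueError("the agent must be cheaper per call than the current cost")
    breakeven_calls = build_cost / saving_per_call
    months = breakeven_calls / calls_per_month
    return breakeven_calls, months


# $30,000 build, $4.00 per call today, $0.80 per call with the agent.
for volume in (1_000, 5_000):
    calls, months = payback(30_000, 4.00, 0.80, volume)
    print(f"{volume:,} calls/month: break-even at {calls:,.0f} calls, {months:.1f} months")
```

The sketch ignores ongoing maintenance and any calls the agent escalates to a human, both of which lengthen the real payback period.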

Question 6

What is the typical latency for a production voice agent?

Accepted Answer

In 2026, well-tuned voice agents hit 600 to 1,200 milliseconds end-to-end (from the caller finishing speaking to the agent starting its response). Below 600ms feels instant. Between 1,200 and 2,000ms feels noticeably slow but tolerable. Above 2,000ms, callers start interrupting. Latency comes from telephony (50 to 150ms), STT (200 to 400ms with streaming), LLM time-to-first-token (200 to 600ms), and TTS first chunk (100 to 300ms), which sums to roughly 550 to 1,450ms if the stages run back to back. Smaller models, streaming everywhere, and aggressive prefetching are the main levers.
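A latency budget makes the per-stage figures easier to reason about. The sketch below sums the quoted ranges and maps a total onto the perception bands from the answer. It assumes the stages run strictly one after another with no overlap, which is a simplification. The names and the exact handling of band boundaries are hypothetical.

```python
# (low, high) latency in milliseconds for each stage, as quoted above.
STAGES_MS = {
    "telephony": (50, 150),
    "stt_streaming": (200, 400),
    "llm_first_token": (200, 600),
    "tts_first_chunk": (100, 300),
}


def budget_range(stages):
    """Sum the best-case and worst-case latency across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def caller_perception(total_ms):
    """Map an end-to-end latency onto the bands described in the answer."""
    if total_ms < 600:
        return "feels instant"
    if total_ms < 1200:
        return "well-tuned range"
    if total_ms <= 2000:
        return "noticeably slow but tolerable"
    return "callers start interrupting"


low, high = budget_range(STAGES_MS)
print(f"best case {low}ms: {caller_perception(low)}")
print(f"worst case {high}ms: {caller_perception(high)}")
```

Replacing one stage's range with a measured value shows how much headroom the other stages have before the total crosses into the next band.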

Question 7

Can I use OpenAI Realtime API or Gemini Live instead of stitching components together?

Accepted Answer

Yes, and in 2026 the realtime speech-to-speech APIs from OpenAI and Google are production-viable for many use cases. They collapse STT, LLM, and TTS into a single streaming connection, which simplifies the pipeline and cuts latency to under 500ms. The trade-off is cost (roughly $0.06 per minute of audio input and $0.24 per minute of audio output for OpenAI Realtime as of 2026) and less control over individual steps. For simple agents under 100,000 minutes per month, the realtime APIs are often the right call. For high-volume or cost-sensitive deployments, the stitched pipeline still wins.
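Because input and output audio are priced differently, the blended per-minute cost depends on how much of the call the agent spends talking. A rough sketch, assuming each call minute is billed at the input rate while the caller speaks and at the output rate while the agent speaks. Actual billing is token-based and may differ. The function name and the 50/50 talk split are hypothetical.

```python
def realtime_cost_per_minute(agent_talk_share, input_rate=0.06, output_rate=0.24):
    """Blend the quoted input/output audio rates by the agent's share of talk time."""
    if not 0.0 <= agent_talk_share <= 1.0:
        raise ValueError("agent_talk_share must be between 0 and 1")
    caller_share = 1.0 - agent_talk_share
    return caller_share * input_rate + agent_talk_share * output_rate


# ASSUMPTION: the agent and the caller each speak for half of the call.
blended = realtime_cost_per_minute(0.5)
print(f"blended: ${blended:.2f} per minute")
print(f"at 100,000 minutes per month: ${blended * 100_000:,.0f}")
```

An agent that mostly listens lands near the input rate, while one that reads long responses aloud lands near the output rate.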

Question 8

Do I need a custom voice or can I use stock TTS voices?

Accepted Answer

Most production agents use stock voices from ElevenLabs, OpenAI, or Cartesia. Stock voices are excellent in 2026, and callers usually cannot tell. A custom cloned voice costs $99 to $1,200 per month depending on the provider plan, and requires a few hours of source audio. Brands with a strong voice identity (telehealth, premium consumer apps) sometimes invest in one. Most B2B and operational agents do not need it.

