Week One Labs
5/26/2026

How Much Does an AI Voice Agent Cost in 2026? Real Per-Minute Numbers and Build Budgets

A grounded breakdown of what an AI voice agent actually costs in 2026: telephony, speech-to-text, LLM, and TTS per-minute pricing, build budgets by complexity, and the payback math against human-handled call volume.

Voice agents went from novelty to operational tooling in 2025. By mid-2026, the question founders ask is no longer "can we build a voice agent?" It is "how much will it cost to run one, and when does it pay for itself?"

The honest answer requires four numbers: telephony cost per minute, speech-to-text cost per minute, LLM cost per minute, and text-to-speech cost per minute. Stacked together they give you a per-minute pipeline cost. Multiply by call volume, add a build budget, subtract what you currently pay humans to do the same work, and you have your payback period. I built a free AI voice agent cost calculator that runs this math for you in one screen.

The four costs that stack into every minute

Every voice agent in production today is a real-time pipeline with four billable components.

The first is telephony. The agent needs a phone number that callers can dial or that can dial out. Twilio is still the default in 2026: roughly $0.013 per minute for inbound US calls and $0.022 per minute for outbound. Telnyx and Vonage are close. Carrier costs are stable and predictable, and at the volume most startups operate they land between one and two cents per minute depending on direction.

The second is speech-to-text. The agent has to transcribe what the caller says, ideally with streaming partial transcripts so the LLM can start thinking before the caller finishes speaking. Deepgram Nova-3 runs about $0.0040 per minute on streaming, AssemblyAI is in the same range, and OpenAI Whisper hosted is slightly higher. Streaming STT is non-negotiable for natural-feeling conversation, which rules out batch transcription services no matter how cheap they look.

The third is the LLM. This is the most variable cost. A simple intent classifier on Claude Haiku or GPT-4o mini runs pennies per call. A reasoning-heavy agent on Claude Opus 4.6 or GPT-5 with extended thinking can run $0.10 to $0.20 per minute on a chatty call. The right number depends on whether the agent is following a tight script (cheap) or reasoning about edge cases in real time (expensive). For most production support agents, a balanced tier like Claude Sonnet 4.6 lands around $0.04 to $0.06 per minute.
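To see why the LLM line moves so much, it helps to estimate it from tokens. The sketch below is a rough per-minute model under stated assumptions: the turn rate, token counts, and the $3 / $15 per million token prices are hypothetical placeholders, not quotes for any specific model, so substitute your provider's current price sheet. Prompt caching and volume discounts are ignored.

```python
def llm_cost_per_minute(
    turns_per_minute: float,
    input_tokens_per_turn: int,
    output_tokens_per_turn: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Rough LLM spend per call minute (hypothetical model, not a provider formula).

    Each agent turn re-sends the system prompt plus conversation history as
    input tokens and generates a short spoken reply as output tokens.
    """
    cost_per_turn = (
        input_tokens_per_turn * input_price_per_million
        + output_tokens_per_turn * output_price_per_million
    ) / 1_000_000
    return turns_per_minute * cost_per_turn


# Placeholder inputs (all hypothetical): 4 agent turns per minute,
# ~3,000 input tokens per turn, ~60 output tokens per reply,
# $3 / $15 per million input / output tokens.
per_minute = llm_cost_per_minute(4, 3_000, 60, 3.0, 15.0)
print(f"${per_minute:.4f} per minute")
```

Because every turn re-sends the prompt and history, input tokens tend to dominate. Reasoning models bill their thinking tokens as output on top of the spoken reply, which is how the same call drifts toward the higher end of the ranges above.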

The fourth is text-to-speech. ElevenLabs Flash, OpenAI TTS, and Cartesia Sonic are all credible options in 2026. The math is roughly $0.06 per 1,000 characters spoken. At a typical speaking pace of about 150 words per minute, that is roughly $0.05 per minute of agent speech, and because the agent only talks for part of each call, it blends to about $0.02 per minute of call time. Cloned voices cost more but are usually unnecessary for B2B and operational agents.
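Converting per-character TTS pricing into a per-minute number depends on two things: how fast the agent speaks and how much of the call it is the one talking. A minimal sketch, assuming 150 words per minute, about 6 characters per word including the trailing space, and an agent that talks 40 percent of the time. All three defaults are illustrative assumptions, not measured values.

```python
def tts_cost_per_call_minute(
    price_per_1k_chars: float,
    words_per_minute: float = 150.0,   # assumed speaking pace
    chars_per_word: float = 6.0,       # assumed, includes the trailing space
    agent_talk_share: float = 0.4,     # assumed share of the call the agent speaks
) -> float:
    """Convert per-character TTS pricing into cost per minute of call time."""
    chars_per_speech_minute = words_per_minute * chars_per_word
    cost_per_speech_minute = chars_per_speech_minute / 1_000 * price_per_1k_chars
    return cost_per_speech_minute * agent_talk_share


speech_only = tts_cost_per_call_minute(0.06, agent_talk_share=1.0)  # ≈ $0.054 per minute of speech
blended = tts_cost_per_call_minute(0.06)                            # ≈ $0.022 per call minute
print(f"${speech_only:.4f} per speech minute, ${blended:.4f} per call minute")
```

The talk-share parameter is the one worth measuring on real calls: a chatty agent that does most of the talking can cost two to three times the blended figure.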

Stack the four together and a typical balanced voice agent costs $0.08 to $0.12 per minute all-in on a managed platform like Vapi, $0.10 to $0.15 per minute on Retell, and $0.04 to $0.08 per minute if you roll your own pipeline on LiveKit and absorb the engineering investment. For a three-minute support call on a managed platform, that is $0.24 to $0.45 all-in.
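The stacking itself is just addition, but writing it down makes it easy to swap one component and see the effect. A minimal sketch using illustrative inputs picked from the ranges above (inbound telephony, streaming STT, the low end of a balanced LLM tier, blended TTS); the `PipelineCost` class is hypothetical, and `platform_fee` stands in for the orchestration markup discussed in the next section.

```python
from dataclasses import dataclass


@dataclass
class PipelineCost:
    """Per-minute cost of each billable component, in USD (hypothetical helper)."""
    telephony: float
    stt: float
    llm: float
    tts: float
    platform_fee: float = 0.0  # orchestration markup; 0 for a self-hosted pipeline

    def per_minute(self) -> float:
        return self.telephony + self.stt + self.llm + self.tts + self.platform_fee

    def per_call(self, minutes: float) -> float:
        return self.per_minute() * minutes


# Illustrative inputs only, drawn from the ranges quoted above.
self_hosted = PipelineCost(telephony=0.013, stt=0.004, llm=0.04, tts=0.02)
print(f"${self_hosted.per_minute():.3f}/min, "
      f"${self_hosted.per_call(3):.2f} per 3-minute call")
```

The LLM field is usually the one to vary first; moving it from a balanced tier to a reasoning tier changes the total far more than any other component.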

What the build actually costs

Per-minute cost is the operating bill. Build cost is the one-time engineering investment to get the agent into production. In 2026 it falls into three rough tiers.

A single-intent agent that does one job (book an appointment, look up an order, qualify a lead) on a managed platform costs $8,000 to $15,000 to build. Most of that is not the prompt or the voice config. It is the integration to your scheduling system or CRM, the edge cases (callers who give partial information, who go off-script, who interrupt), and the iteration loop to tune voice persona and barge-in behavior. Even on a platform like Vapi or Retell that handles the real-time plumbing, a polished single-intent agent takes two to four weeks of focused engineering.

A multi-intent agent with three to five skills plus call routing plus CRM integration runs $20,000 to $40,000. The cost growth is not the skills themselves. It is the routing logic between them, the conversation state management across handoffs, and the analytics layer to see what is working. Most production B2B voice agents in 2026 land in this range.

A complex agent with deep skill trees, custom tool use, retrieval against a knowledge base, multi-language support, and a real analytics dashboard runs $50,000 to $100,000+. At this level you are no longer building one agent. You are building voice infrastructure for your company, and the budget reflects that.

Vapi, Retell, Bland, and the realtime APIs

The voice agent platform market consolidated around three orchestrators by 2026: Vapi, Retell, and Bland. All three handle the real-time pipeline (telephony plus STT plus LLM plus TTS) so you do not have to. They charge a per-minute platform markup, typically $0.05 to $0.10 on top of the underlying component costs.

The question is whether the platform markup is worth it. For most teams under 50,000 minutes per month, yes. The engineering investment to build a custom pipeline that handles latency, interruption detection, and barge-in is roughly $40,000 to $80,000 of upfront work plus ongoing maintenance. Platform fees of $0.05 to $0.10 per minute at 50,000 minutes come to $2,500 to $5,000 per month, well below the carrying cost of an engineer maintaining a custom pipeline.

Above 50,000 minutes per month, the calculus flips. At 500,000 minutes per month, platform fees become $25,000 to $50,000 per month, which buys real engineering capacity. Most teams at that scale eventually move to LiveKit and a custom orchestration layer.
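One way to locate the crossover for your own team is to ask at what monthly volume platform fees equal the monthly carrying cost of a custom pipeline. A minimal sketch, assuming a $60,000 build (inside the $40,000 to $80,000 range above) amortized over 24 months plus a hypothetical $10,000 per month of maintenance engineering; both the amortization period and the maintenance figure are made-up inputs.

```python
def breakeven_minutes(
    platform_fee_per_minute: float,
    custom_build_cost: float,
    amortization_months: int,
    monthly_maintenance: float,
) -> float:
    """Monthly minutes at which platform fees equal the carrying cost of a custom pipeline."""
    monthly_custom_cost = custom_build_cost / amortization_months + monthly_maintenance
    return monthly_custom_cost / platform_fee_per_minute


# Hypothetical inputs: $60,000 build over 24 months, $10,000/month maintenance.
for fee in (0.05, 0.10):
    minutes = breakeven_minutes(fee, 60_000, 24, 10_000)
    print(f"${fee:.2f}/min fee -> breakeven at {minutes:,.0f} min/month")
```

Under these assumed numbers the crossover sits between 125,000 and 250,000 minutes per month. The model ignores that a self-built pipeline may also pay less per component, which would pull the crossover lower.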

The wildcard in 2026 is the realtime speech-to-speech APIs from OpenAI and Google. OpenAI Realtime and Gemini Live collapse STT plus LLM plus TTS into a single streaming connection. End-to-end latency drops under 500ms, which is meaningfully better than the stitched pipeline. The trade-off is cost (roughly $0.18 per minute averaged across audio in and out for OpenAI Realtime) and less granular control over individual pipeline steps. For simple agents and prototypes, realtime APIs are often the fastest path to production. For high-volume or cost-sensitive deployments, the stitched pipeline still wins on unit economics.
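The unit-economics gap grows linearly with volume, which is why realtime APIs suit prototypes better than high-volume deployments. A minimal sketch comparing the roughly $0.18 per minute realtime figure above against an illustrative $0.10 per minute stitched pipeline (a midpoint picked from the managed-platform ranges, not a quoted price).

```python
def monthly_premium(realtime_per_min: float, stitched_per_min: float, minutes: int) -> float:
    """Extra monthly spend for a realtime speech-to-speech API over a stitched pipeline."""
    return (realtime_per_min - stitched_per_min) * minutes


# $0.10/min for the stitched pipeline is an illustrative assumption.
for minutes in (5_000, 50_000, 500_000):
    extra = monthly_premium(0.18, 0.10, minutes)
    print(f"{minutes:>7,} min/month: +${extra:,.0f}")
```

At prototype volume the premium is small enough to buy the latency win; at hundreds of thousands of minutes it becomes a line item worth engineering around.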

The payback math

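Voice agents pay for themselves when they replace expensive human-handled call volume. The simple model: if your current cost per call is $4 and your AI agent handles a call for a conservative $0.80 all-in, you save $3.20 per call. The $4 figure is typical for outsourced support at $20 per hour once you count more than talk time: after wrap-up and idle time between calls, an agent paid $20 per hour completes roughly five three-minute calls per hour, not twenty.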

A $30,000 build pays back at roughly 9,375 calls. At 1,000 calls per month, payback is 9 to 10 months. At 5,000 calls per month, payback is under 2 months. The threshold most teams find: voice agents are clear wins above about 2,000 monthly calls of repetitive volume. Below that, you are usually better off keeping humans on the calls and applying the agent budget to upstream demand generation instead.
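The payback arithmetic above fits in a few lines. A minimal sketch, assuming $20 per hour and about five completed calls per hour (a hypothetical throughput figure that accounts for wrap-up and idle time and yields the $4 per call), the $0.80 agent cost, and a $30,000 build. The function names are illustrative.

```python
import math


def human_cost_per_call(hourly_rate: float, calls_per_hour: float) -> float:
    """Loaded human cost per call, counting wrap-up and idle time via calls_per_hour."""
    return hourly_rate / calls_per_hour


def payback(build_cost: float, human_cost: float, agent_cost: float, calls_per_month: int):
    """Return (calls to break even, months to break even)."""
    savings_per_call = human_cost - agent_cost
    if savings_per_call <= 0:
        return math.inf, math.inf  # the agent costs more per call; it never pays back
    calls_needed = build_cost / savings_per_call
    return calls_needed, calls_needed / calls_per_month


human = human_cost_per_call(20.0, 5)  # assumes ~5 completed calls per hour
for volume in (200, 1_000, 5_000):
    calls, months = payback(30_000, human, 0.80, volume)
    print(f"{volume:>5,} calls/month: {calls:,.0f} calls, {months:.1f} months")
```

The 200-call row previews the first failure mode below: the same build takes close to four years to recover.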

Three failure modes show up in the payback math. The first is volume that is too low. A founder builds an agent for 200 calls per month, and the build cost simply never recovers. The second is call patterns that are not repetitive enough. Variance kills agent ROI because the engineering cost to handle the long tail is much higher than the build cost to handle the head. The third is over-engineering the agent: spending $80,000 to build a complex multi-skill agent when a $15,000 single-intent agent would have handled 80 percent of the calls.
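The over-engineering case is easiest to see as marginal payback: how long the extra build spend takes to recover from only the extra calls it handles. A minimal sketch under loudly hypothetical assumptions: 2,000 calls per month, a $15,000 single-intent agent covering 80 percent of them, an $80,000 agent optimistically covering the remaining 20 percent, and the $3.20 per-call saving from the payback example.

```python
def marginal_payback_months(
    extra_build_cost: float,
    calls_per_month: int,
    extra_share_handled: float,
    savings_per_call: float,
) -> float:
    """Months for additional build spend to pay back from the extra calls it handles."""
    extra_monthly_savings = calls_per_month * extra_share_handled * savings_per_call
    return extra_build_cost / extra_monthly_savings


# Hypothetical: $65,000 more ($80k vs $15k) to pick up the last 20% of 2,000 calls/month.
months = marginal_payback_months(65_000, 2_000, 0.20, 3.20)
print(f"{months:.0f} months")
```

Under these assumptions the extra $65,000 takes about 51 months to recover, even though the full agent handles every call.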

What I recommend for a first voice agent

Start with a single-intent agent on a managed platform (Vapi or Retell). Pick the call type with the most volume and the lowest variance. Budget $15,000 for the build and 4 to 6 weeks of iteration. Use a balanced LLM tier (Claude Sonnet 4.6 or GPT-5 standard) so you do not blow your per-minute cost on reasoning you do not need.

Measure two things ruthlessly: containment rate (percentage of calls handled end-to-end without human handoff) and per-minute all-in cost. If containment lands above 60 percent and per-minute cost lands under $0.15, you have a working agent. Iterate from there.
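Both metrics fall out of ordinary call logs. A minimal sketch, assuming each call record carries its duration, its all-in cost, and whether a human took over; the `CallRecord` shape and the sample data are hypothetical, and the thresholds are the 60 percent and $0.15 targets above.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    minutes: float
    cost_usd: float   # all-in pipeline + platform cost for this call
    handed_off: bool  # True if a human had to take over


def scorecard(calls: list[CallRecord]) -> dict:
    """Containment rate, all-in cost per minute, and whether both targets are met."""
    if not calls:
        raise ValueError("need at least one call record")
    total_minutes = sum(c.minutes for c in calls)
    contained = sum(1 for c in calls if not c.handed_off)
    containment = contained / len(calls)
    cost_per_minute = sum(c.cost_usd for c in calls) / total_minutes
    return {
        "containment_rate": containment,
        "cost_per_minute": cost_per_minute,
        "working": containment > 0.60 and cost_per_minute < 0.15,
    }


# Hypothetical sample records for illustration only.
sample = [
    CallRecord(3.0, 0.30, False),
    CallRecord(2.0, 0.20, False),
    CallRecord(4.5, 0.45, True),
    CallRecord(1.5, 0.15, False),
]
print(scorecard(sample))
```

Cost per minute is computed over total minutes rather than averaged per call, so long calls weigh in proportion to what they actually cost.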

If you want to model your specific numbers before committing to a build, the AI voice agent cost calculator walks through platform choice, LLM tier, call volume, and payback in one screen. Pair it with the AI agent cost calculator for non-voice agents and the chatbot ROI calculator for text-channel automation.

The numbers in this post will drift as the underlying APIs reprice (they always do). The framework will not: four stacked costs, a build budget by complexity, and a payback comparison against the human cost you displace. Run the math before you sign the SOW.
