Cost optimization
Voice is billed per minute of audio (OpenAI Realtime is roughly 6 cents per minute of input + 24 cents per minute of output, at retail). On a high-volume site (a few thousand visitors per day, ~20% engagement) you’ll spend serious money if you don’t tune.
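To put a rough number on that, here is a back-of-the-envelope estimate in Python. The two rates are the retail figures quoted above; the visitor count, call length, and the share of the call the agent spends talking are illustrative assumptions, not Spelo defaults.

```python
INPUT_RATE = 0.06   # USD per minute of audio input (retail figure quoted above)
OUTPUT_RATE = 0.24  # USD per minute of audio output (retail figure quoted above)


def monthly_voice_cost(visitors_per_day, engagement, avg_call_minutes,
                       agent_talk_share, days=30):
    """Estimate monthly audio spend in USD.

    Input audio is billed for the whole call; output audio is billed only
    while the agent is speaking (agent_talk_share is an assumed fraction).
    """
    calls = visitors_per_day * engagement * days
    input_minutes = calls * avg_call_minutes
    output_minutes = input_minutes * agent_talk_share
    return input_minutes * INPUT_RATE + output_minutes * OUTPUT_RATE


# Hypothetical site: 3,000 visitors/day, 20% engagement, 2-minute calls,
# agent speaking 40% of the time.
estimate = monthly_voice_cost(3000, 0.20, 2.0, 0.40)
print(round(estimate, 2))
```

With these assumed inputs the estimate comes to about $5,600 per month, most of it output audio. Plug in your own call length and talk share before drawing conclusions.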
Here’s how to cut cost without hurting UX.
1. Client-side VAD — silence gating
The single biggest lever. About 30% of typical call time is silence — visitor thinking, looking around, reading. OpenAI’s Realtime API bills audio-input per second regardless of whether you’re saying anything.
Spelo ships an optional Silero VAD (Voice Activity Detection) layer in the browser. When enabled, the widget mutes the mic track during silence so OpenAI doesn’t bill those seconds. Sub-200 ms latency, transparent to the visitor.
To enable: Dashboard → your site → Settings → Voice activity → Client-side VAD: ON.
Cost impact: typically a 20-30% reduction in input audio minutes. If input audio accounts for $1,000/mo of your bill, that’s $200-300 saved; output audio is billed separately and is not affected.
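The gating idea can be sketched independently of Silero. The Python below is a minimal illustration, not Spelo's implementation: it takes per-frame speech probabilities (the kind of score a VAD model would supply), keeps the mic open for a short hangover after speech so word endings are not clipped, and reports what fraction of frames would still be sent. The threshold and hangover length are assumptions.

```python
def gate_frames(speech_probs, threshold=0.5, hangover_frames=6):
    """Return a list of booleans: True where the mic track stays unmuted.

    speech_probs    -- per-frame speech probability from a VAD model
    threshold       -- probability at or above which a frame counts as speech
    hangover_frames -- frames to keep open after the last speech frame
    """
    open_flags = []
    remaining = 0  # hangover frames still available
    for prob in speech_probs:
        if prob >= threshold:
            remaining = hangover_frames
            open_flags.append(True)
        elif remaining > 0:
            remaining -= 1
            open_flags.append(True)
        else:
            open_flags.append(False)
    return open_flags


def billed_fraction(open_flags):
    """Fraction of frames that would still be sent (and therefore billed)."""
    return sum(open_flags) / len(open_flags) if open_flags else 0.0


# Synthetic input: 10 speech frames, 20 silent frames, 10 speech frames.
probs = [0.9] * 10 + [0.05] * 20 + [0.9] * 10
flags = gate_frames(probs, hangover_frames=6)
print(billed_fraction(flags))
```

A longer hangover is safer for speech quality but gates less silence, so the saving depends on how the visitor's pauses are distributed, not only on the total amount of silence.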
2. BYOK — bring your own OpenAI key
If you have an OpenAI account with negotiated rates or volume discounts, plug your own key into Spelo. You pay OpenAI directly (at their rate) and Spelo bills you only for platform usage (a flat per-site fee).
To enable: Dashboard → your site → Settings → Bring your own key → paste your OpenAI API key.
Cost impact: depends on your OpenAI rate. Standard accounts save nothing. Enterprise OpenAI customers with negotiated rates can save 20-40%.
See BYOK for the full architecture (how the key is stored, scope-restriction, revocation).
3. Restricted topics — block off-topic conversations
If your AI keeps getting dragged into long discussions about unrelated topics (visitors asking the dental clinic’s voice agent about football scores), those minutes still bill you.
Configure: Dashboard → your site → Settings → Restricted topics.
- Sports scores and statistics
- Political discussion
- Recipes and cooking
- Movie / TV recommendations
- Personal advice

The AI gets these in the system prompt as hard-blocks: “If the visitor asks about [topic], say ‘I’m here to help with [your business] — for that you’ll want to ask elsewhere’ and steer back.” See Restricted topics for full details.
Cost impact: depends on your visitor mix. Sites in industries that attract chatty visitors (lonely-hearts, hobby forums) see a 10-20% reduction.
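To see what the hard-block instructions amount to, here is a small Python sketch that renders one rule per topic. The function name and the exact wording are illustrative (a paraphrase of the template quoted above); Spelo generates the real prompt text for you.

```python
def restricted_topic_rules(business, topics):
    """Render one hard-block instruction per restricted topic.

    The wording paraphrases the template shown in the docs; it is not
    the literal text Spelo injects.
    """
    template = (
        "If the visitor asks about {topic}, say \"I'm here to help with "
        "{business}, for that you'll want to ask elsewhere\" and steer back."
    )
    return [template.format(topic=topic, business=business) for topic in topics]


rules = restricted_topic_rules(
    "Acme Dental",  # hypothetical business name
    ["sports scores and statistics", "recipes and cooking"],
)
for rule in rules:
    print(rule)
```

Each entry adds a sentence to the system prompt, so keep the list to topics that actually show up in your transcripts.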
4. Conversation length caps
The longer a conversation runs, the larger the context window grows, and the more expensive each round-trip becomes, because the model bills for every input token on every turn, cached or uncached.
Set a soft cap in your personality prompt:
    After ~10 turns, naturally wrap up the conversation with a summary and call to action (book a call, fill the form, schedule a visit).
    Don't keep the call open indefinitely.

The AI is good at honoring soft caps. For hard caps:
- Idle timeout — Dashboard → Settings → Idle handling → auto-disconnect after N seconds of silence
- Max session length — set in Settings → Session settings → Max minutes per session
Cost impact: typically 5-15%, depending on how chatty your visitors are.
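Why turn count matters can be shown with a little arithmetic. The sketch below assumes the model re-reads the system prompt plus the whole history on every turn; the token counts are made-up round numbers, and the result counts tokens processed, not dollars (cached tokens are cheaper than uncached ones).

```python
def cumulative_input_tokens(turns, system_tokens=1500, tokens_per_turn=150):
    """Total input tokens processed over a whole conversation.

    On turn t the model reads the system prompt plus t turns of dialogue,
    so per-turn input grows linearly and the total grows roughly
    quadratically with the number of turns.
    """
    total = 0
    for turn in range(1, turns + 1):
        total += system_tokens + turn * tokens_per_turn
    return total


ten = cumulative_input_tokens(10)
twenty = cumulative_input_tokens(20)
print(ten, twenty, round(twenty / ten, 2))
```

Under these assumptions a 20-turn conversation processes about 2.6 times the input tokens of a 10-turn one, not 2 times, which is why a wrap-up nudge after ~10 turns pays off.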
5. Disabled pages — don’t load on irrelevant routes
If your site has high-traffic pages where voice doesn’t add value (about page, blog, terms, privacy), disable the orb on those routes. The widget never initializes there → zero cost.
Configure: Dashboard → your site → Settings → Enabled / disabled pages.
    disabled_pages:
      - /blog/*
      - /press/*
      - /careers/*
      - /terms
      - /privacy

See Enabled/disabled pages for pattern syntax.
Cost impact: indirect. Fewer pages loading the widget means fewer accidental orb clicks. On content-heavy sites this can save 30-50% of total minutes by routing voice to high-intent pages only (pricing, contact, product detail).
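If you want to sanity-check a pattern list before saving it, a quick local test helps. The Python sketch below uses shell-style globbing from the standard library as a stand-in; Spelo's actual pattern syntax is documented on the Enabled/disabled pages reference and may differ (for example, in whether `/blog/*` also covers `/blog` itself).

```python
from fnmatch import fnmatchcase

# The patterns from the example config above.
DISABLED_PAGES = ["/blog/*", "/press/*", "/careers/*", "/terms", "/privacy"]


def widget_enabled(path, disabled_patterns=DISABLED_PAGES):
    """Return False when the path matches any disabled pattern.

    Uses fnmatch-style globbing as an approximation of the real matcher.
    """
    return not any(fnmatchcase(path, pattern) for pattern in disabled_patterns)


for path in ["/pricing", "/blog/voice-ai-trends", "/terms", "/blog"]:
    print(path, widget_enabled(path))
```

Running your top-traffic routes through a check like this shows at a glance which pages would still load the orb.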
6. Tighten the personality prompt
Long system prompts get cached after the first request (OpenAI caches prompts of 1,024 tokens or more, in 128-token increments), but every uncached chunk is re-billed at the full rate on every model call.
Symptoms of an over-long prompt:
- Spelo dashboard → analytics → System prompt tokens > 3000
- Visible latency on first response (~1s+ before the agent starts speaking)
- High input-token bill on the OpenAI dashboard
Fixes:
- Trim verbose personality descriptions
- Move FAQ-style content out of the prompt and into your knowledge base (crawl it; the AI calls search_knowledge_base on demand instead of carrying it in context)
- Move structured data out of the prompt and into a database adapter (callable via search_database on DFY tier or via REST/query)
- Use the pronunciation dictionary only for genuinely mispronounced terms, since every entry is in the prompt
Aim for a system prompt under 1500 tokens. The cached portion is billed at a discounted rate, so the unique-per-call portion is what matters most.
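To estimate how much of a prompt can be served from cache, the sketch below applies the rule described in OpenAI's prompt-caching documentation: prompts need at least 1,024 tokens to be cached, and the cached prefix grows in 128-token steps. Treat it as an approximation; whether the Realtime path behaves identically is an assumption, and the function names are illustrative.

```python
CACHE_MIN_TOKENS = 1024  # shortest prompt eligible for caching
CACHE_INCREMENT = 128    # cached prefix grows in steps of this size


def cached_prefix_tokens(shared_prefix_tokens):
    """Largest cacheable prefix when the first N tokens repeat across calls."""
    if shared_prefix_tokens < CACHE_MIN_TOKENS:
        return 0
    return (shared_prefix_tokens // CACHE_INCREMENT) * CACHE_INCREMENT


def uncached_tokens(total_prompt_tokens, shared_prefix_tokens):
    """Tokens billed at the full rate on a cache hit."""
    return total_prompt_tokens - cached_prefix_tokens(shared_prefix_tokens)


# Hypothetical prompt: 3,000 tokens total, the first 2,900 identical per call.
print(cached_prefix_tokens(2900), uncached_tokens(3000, 2900))
# A 900-token prompt is below the minimum and is never cached.
print(cached_prefix_tokens(900))
```

Because caching works on the prefix, put stable content (personality, rules) first and anything that varies per call last.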
7. Voice choice
On OpenAI Realtime, every voice currently costs the same per minute, so the choice has no cost impact there. Gemini voices on Spelo’s voice-relay transport, however, can be priced differently per voice family.
If you’re on the voice-relay / Gemini path, check the voices reference for current per-voice pricing. Picking a cheaper voice family can save 10-20% with no UX impact for most use cases.
8. End the call cleanly
The end_call tool exists so the agent can terminate the WebRTC connection the instant the visitor says “bye.” Without it, the connection runs until the silence timeout fires (typically 30s) — that’s 30 wasted seconds per call.
Verify it’s wired: Dashboard → analytics → Calls ended by. If “tool-triggered end_call” is < 60% of your call closures, the system prompt or transport isn’t routing the tool correctly. See Knowledge & lifecycle tools — end_call for the architecture.
Cost impact: ~0.5-1.5% — small but free.
What to do first
| Site profile | Start with |
|---|---|
| Any site, any volume | Client-side VAD (#1): typically 20-30% fewer billed input audio minutes |
| High off-topic chatter | Restricted topics (#3) + tighter personality prompt (#6) |
| Long average conversations | Conversation length caps (#4) |
| Lots of low-intent page traffic | Disabled pages (#5) |
| You have OpenAI volume pricing | BYOK (#2) |
| Voice is on every page including blog/about | Disabled pages (#5) is the biggest win |
Tracking results
Use the analytics endpoint to compare cost-per-conversation before and after each change. Group the results by week of started_at to see the trend:
    curl "https://api.spelo.ai/v1/analytics?from=2026-03-01&to=2026-05-01&group_by=week" \
      -H "Authorization: Bearer vk_live_..."

Returned fields include total_minutes, total_calls, avg_minutes_per_call, and (if BYOK is configured) estimated_openai_cost.
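Once you have the weekly numbers, a few lines of Python make the trend easy to read. This sketch assumes the response can be flattened into one dict per week; the `period` key and the list-of-rows shape are hypothetical, and only `total_minutes` and `total_calls` come from the field list above. Minutes per call is used here as a proxy for cost per conversation.

```python
def minutes_per_call_trend(rows):
    """Return (period, avg_minutes_per_call, pct_change_vs_previous) tuples.

    rows -- one dict per week with total_minutes and total_calls;
            the "period" key is an assumed label for the week.
    """
    trend = []
    previous = None
    for row in rows:
        calls = row["total_calls"]
        avg = row["total_minutes"] / calls if calls else 0.0
        change = None
        if previous:  # skips the first week and avoids dividing by zero
            change = round((avg - previous) / previous * 100, 1)
        trend.append((row["period"], round(avg, 2), change))
        previous = avg
    return trend


# Made-up sample data: the week before and the week after a change.
sample = [
    {"period": "2026-03-02", "total_minutes": 1200, "total_calls": 400},
    {"period": "2026-03-09", "total_minutes": 1000, "total_calls": 400},
]
trend = minutes_per_call_trend(sample)
for entry in trend:
    print(entry)
```

Change one setting at a time and leave at least a full week between changes, otherwise the week-over-week delta cannot be attributed to a single lever.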
See also
- Plans and limits — current pricing and per-tier features
- BYOK — bring your own OpenAI key
- Usage metering — how minutes are counted
- Restricted topics — hard-block off-topic chatter
- Enabled / disabled pages — exclude low-intent routes
- Idle handling — auto-disconnect knobs