
Cost optimization

Voice is billed per minute of audio (OpenAI Realtime is roughly 6 cents per minute of input + 24 cents per minute of output, at retail). On a high-volume site (a few thousand visitors per day, ~20% engagement) you’ll spend serious money if you don’t tune.
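To get a feel for the scale, here is a rough back-of-the-envelope estimate using the retail rates quoted above. The traffic figures and the per-call minutes are illustrative assumptions, not Spelo measurements; plug in your own numbers from the dashboard.

```python
# Retail rates quoted above, in dollars per minute of audio.
INPUT_RATE = 0.06
OUTPUT_RATE = 0.24


def monthly_voice_cost(visitors_per_day, engagement,
                       input_min_per_call, output_min_per_call, days=30):
    """Estimate monthly audio spend for a site.

    engagement is the fraction of visitors who start a call.
    All inputs are assumptions you supply; nothing here is read from Spelo.
    """
    calls = visitors_per_day * engagement * days
    per_call = (input_min_per_call * INPUT_RATE
                + output_min_per_call * OUTPUT_RATE)
    return calls * per_call


# Assumed profile: 3,000 visitors/day, 20% engagement,
# 2 minutes of open mic and 1 minute of agent speech per call.
estimate = monthly_voice_cost(3000, 0.20, 2.0, 1.0)
print(f"Estimated monthly audio cost: ${estimate:,.2f}")
```

Under those assumptions the output side dominates the per-call cost, which is worth keeping in mind when you read the per-lever savings below.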

Here’s how to cut cost without hurting UX.

1. Client-side VAD — silence gating

The single biggest lever. About 30% of typical call time is silence: the visitor thinking, looking around, reading. OpenAI’s Realtime API bills audio input per second whether or not the visitor is saying anything.

Spelo ships an optional Silero VAD (Voice Activity Detection) layer in the browser. When enabled, the widget mutes the mic track during silence so OpenAI doesn’t bill those seconds. Sub-200 ms latency, transparent to the visitor.

To enable: Dashboard → your site → Settings → Voice activity → Client-side VAD: ON.

Cost impact: typically a 20-30% reduction in input audio minutes. On $1,000/mo of input-audio spend that’s $200-300 saved; output audio is billed separately and is not affected by gating.
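The idea behind silence gating can be sketched in a few lines. This is not Spelo's implementation and it does not run Silero; it assumes you already have one speech/no-speech flag per audio frame from some VAD, and it only counts how many seconds would still be sent. The frame size and the "hangover" (how long the gate stays open after speech stops, so word endings are not clipped) are assumed values.

```python
def gated_input_seconds(speech_flags, frame_ms=30, hangover_frames=10):
    """Seconds of mic audio that pass through a silence gate.

    speech_flags: iterable of booleans, one per frame (True = speech).
    The gate stays open for hangover_frames after the last speech frame.
    """
    open_frames = 0
    remaining = 0  # hangover frames left before the gate closes
    for is_speech in speech_flags:
        if is_speech:
            remaining = hangover_frames
            open_frames += 1
        elif remaining > 0:
            remaining -= 1
            open_frames += 1
    return open_frames * frame_ms / 1000


# Illustrative call: 3 s of speech, 3 s of silence, 1.5 s of speech.
flags = [True] * 100 + [False] * 100 + [True] * 50
ungated = len(flags) * 30 / 1000
gated = gated_input_seconds(flags)
print(f"ungated: {ungated:.1f}s, gated: {gated:.1f}s")
```

A longer hangover is safer for audio quality but gates less silence, so the savings you see depend on how that trade-off is tuned.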

2. BYOK — bring your own OpenAI key

If you have an OpenAI account with negotiated rates or volume discounts, plug your own key into Spelo. You pay OpenAI directly (at their rate) and Spelo bills you only for platform usage (a flat per-site fee).

To enable: Dashboard → your site → Settings → Bring your own key → paste your OpenAI API key.

Cost impact: depends on your OpenAI rate. Standard accounts save nothing. Enterprise OpenAI customers with negotiated rates can save 20-40%.

See BYOK for the full architecture (how the key is stored, scope-restriction, revocation).

3. Restricted topics — block off-topic conversations

If your AI keeps getting dragged into long discussions about unrelated topics (visitors asking the dental clinic’s voice agent about football scores), those minutes still bill you.

Configure: Dashboard → your site → Settings → Restricted topics.

- Sports scores and statistics
- Political discussion
- Recipes and cooking
- Movie / TV recommendations
- Personal advice

The AI gets these in the system prompt as hard-blocks: “If the visitor asks about [topic], say ‘I’m here to help with [your business] — for that you’ll want to ask elsewhere’ and steer back.” See Restricted topics for full details.

Cost impact: depends on your visitor mix. Sites in industries that attract chatty visitors (lonely-hearts, hobby forums) see 10-20% reduction.
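If you want to preview roughly what those hard-blocks look like before saving them, the sketch below renders one instruction per topic. The function name and the exact wording are assumptions made for illustration; Spelo assembles the real system prompt for you, and its phrasing may differ.

```python
def render_topic_blocks(business, topics):
    """Render one hard-block instruction per restricted topic.

    Hypothetical helper: it only mirrors the pattern described above.
    """
    lines = []
    for topic in topics:
        lines.append(
            f"If the visitor asks about {topic}, say "
            f"\"I'm here to help with {business}. "
            f"For that you'll want to ask elsewhere\" and steer back."
        )
    return "\n".join(lines)


topics = ["sports scores and statistics", "political discussion"]
print(render_topic_blocks("Smile Dental Clinic", topics))
```

Each topic adds a line to the prompt, so keep the list to topics that actually show up in your transcripts.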

4. Conversation length caps

The longer a conversation runs, the larger the context window grows, and the more expensive each round-trip becomes, because the model bills for every input token it re-reads, cached or uncached.

Set a soft cap in your personality prompt:

After ~10 turns, naturally wrap up the conversation with a summary and call to action (book a call, fill the form, schedule a visit).
Don't keep the call open indefinitely.

The AI is good at honoring soft caps. For hard caps:

  • Idle timeout — Dashboard → Settings → Idle handling → auto-disconnect after N seconds of silence
  • Max session length — set in Settings → Session settings → Max minutes per session

Cost impact: typically 5-15%, depending on how chatty your visitors are.
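To see why length caps matter, here is a simplified model of how input tokens accumulate over a conversation. The system prompt size and the tokens added per turn are assumed figures, and the model ignores cache discounts, so treat the result as an upper-bound illustration rather than a bill.

```python
def total_input_tokens(turns, system_tokens=1500, tokens_per_turn=200):
    """Total input tokens read across a conversation.

    On turn k the model re-reads the system prompt plus all k turns so far,
    so the total grows roughly with the square of the turn count.
    """
    return sum(system_tokens + k * tokens_per_turn
               for k in range(1, turns + 1))


capped = total_input_tokens(10)
uncapped = total_input_tokens(20)
print(f"10 turns: {capped} tokens, 20 turns: {uncapped} tokens "
      f"({uncapped / capped:.2f}x)")
```

Doubling the number of turns more than doubles the tokens read, which is the effect a soft cap around 10 turns is meant to contain.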

5. Disabled pages — don’t load on irrelevant routes

If your site has high-traffic pages where voice doesn’t add value (about page, blog, terms, privacy), disable the orb on those routes. The widget never initializes there → zero cost.

Configure: Dashboard → your site → Settings → Enabled / disabled pages.

disabled_pages:
- /blog/*
- /press/*
- /careers/*
- /terms
- /privacy

See Enabled/disabled pages for pattern syntax.
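You can sanity-check a pattern list against your own URLs before saving it. The sketch below assumes shell-style globbing as implemented by Python's `fnmatch`, where `*` also matches `/`; Spelo's actual pattern syntax may differ, so confirm against the reference page.

```python
from fnmatch import fnmatchcase

# The example list from above.
DISABLED_PAGES = ["/blog/*", "/press/*", "/careers/*", "/terms", "/privacy"]


def orb_disabled(path, patterns=DISABLED_PAGES):
    """Return True if the path matches any disabled-page pattern.

    Assumes shell-style globs; this is not Spelo's matcher.
    """
    return any(fnmatchcase(path, pattern) for pattern in patterns)


for path in ["/blog/2026/launch", "/blog", "/pricing", "/terms"]:
    state = "disabled" if orb_disabled(path) else "enabled"
    print(f"{path}: {state}")
```

Note that under this assumption `/blog/*` does not match the bare `/blog` index page; add `/blog` as its own entry if you want that route covered too.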

Cost impact: indirect — fewer page loads means fewer accidental orb clicks. On content-heavy sites this can save 30-50% of total minutes by routing voice to high-intent pages only (pricing, contact, product detail).

6. Tighten the personality prompt

Long system prompts get cached after the first request (OpenAI’s prompt caching applies to prompts of 1,024 tokens or more and matches on the shared prefix), but any uncached portion is billed at the full input rate on every model call.

Symptoms of an over-long prompt:

  • Spelo dashboard → analytics → System prompt tokens > 3000
  • Visible latency on first response (~1s+ before the agent starts speaking)
  • High input-token bill on the OpenAI dashboard

Fixes:

  • Trim verbose personality descriptions
  • Move FAQ-style content out of the prompt and into your knowledge base (crawl it; the AI calls search_knowledge_base on demand instead of carrying it in context)
  • Move structured data out of the prompt and into a database adapter (callable via search_database on DFY tier or via REST /query)
  • Use pronunciation dictionary only for genuinely-mispronounced terms — every entry is in the prompt

Aim for a system prompt under 1500 tokens. The cached portion is billed at a steep discount, so the unique-per-call portion is what matters most.
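For a quick check outside the dashboard, you can estimate prompt size with the common rule of thumb of about 4 characters per token for English text. This is only an approximation (the real tokenizer will give different counts), and the helper names are made up for illustration; the 1500 and 3000 thresholds are the ones from this section.

```python
def estimate_tokens(text):
    """Rough token estimate: about 4 characters per token for English."""
    return len(text) // 4


def prompt_budget_status(prompt, target=1500, too_long=3000):
    """Classify a system prompt against the thresholds in this section."""
    tokens = estimate_tokens(prompt)
    if tokens <= target:
        return "ok"
    if tokens <= too_long:
        return "trim"
    return "over"


sample_prompt = "You are the voice assistant for a dental clinic. " * 200
print(estimate_tokens(sample_prompt), prompt_budget_status(sample_prompt))
```

Trust the dashboard's System prompt tokens figure over this estimate whenever the two disagree.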

7. Voice choice

On OpenAI Realtime today, every voice costs the same per minute regardless of which one you pick. Gemini voices, on Spelo’s voice-relay transport, can have different pricing per voice family.

If you’re on the voice-relay / Gemini path, check the voices reference for current per-voice pricing. Picking a cheaper voice family can save 10-20% with no UX impact for most use cases.

8. End the call cleanly

The end_call tool exists so the agent can terminate the WebRTC connection the instant the visitor says “bye.” Without it, the connection runs until the silence timeout fires (typically 30s) — that’s 30 wasted seconds per call.

Verify it’s wired: Dashboard → analytics → Calls ended by. If “tool-triggered end_call” is < 60% of your call closures, the system prompt or transport isn’t routing the tool correctly. See Knowledge & lifecycle tools — end_call for the architecture.

Cost impact: ~0.5-1.5% — small but free.
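The 60% check and the cost of the idle tails can be worked out from your closure counts. The dictionary keys below are hypothetical labels, not Spelo's analytics field names, and the cost figure assumes each idle tail is billed as input audio at the retail rate quoted at the top of this page.

```python
INPUT_RATE = 0.06  # dollars per minute of input audio (retail, assumed)


def end_call_share(closures):
    """Fraction of call closures triggered by the end_call tool."""
    total = sum(closures.values())
    return closures.get("tool_end_call", 0) / total if total else 0.0


def timeout_tail_cost(timeout_closures, timeout_seconds=30,
                      rate_per_min=INPUT_RATE):
    """Dollars spent on idle tails before the silence timeout fires."""
    return timeout_closures * (timeout_seconds / 60) * rate_per_min


# Hypothetical monthly closure counts.
closures = {"tool_end_call": 450, "silence_timeout": 400, "visitor_hangup": 150}
share = end_call_share(closures)
print(f"end_call share: {share:.0%}")
if share < 0.60:
    wasted = timeout_tail_cost(closures["silence_timeout"])
    print(f"Below 60%; idle tails cost about ${wasted:.2f}")
```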

What to do first

| Site profile | Start with |
| --- | --- |
| Any site, any volume | Client-side VAD (#1): universal 20-30% savings |
| High off-topic chatter | Restricted topics (#3) + tighter personality prompt (#6) |
| Long average conversations | Conversation length caps (#4) |
| Lots of low-intent page traffic | Disabled pages (#5) |
| You have OpenAI volume pricing | BYOK (#2) |
| Voice is on every page including blog/about | Disabled pages (#5) is the biggest win |

Tracking results

Use the analytics endpoint to compare cost-per-conversation before and after each change. Group by started_at week-by-week to see the trend:

curl "https://api.spelo.ai/v1/analytics?from=2026-03-01&to=2026-05-01&group_by=week" \
-H "Authorization: Bearer vk_live_..."

Returned fields include total_minutes, total_calls, avg_minutes_per_call, and (if BYOK is configured) estimated_openai_cost.
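If you'd rather script the comparison than read raw JSON, the same request can be made from Python's standard library. The URL, query parameters, and header come from the curl example; the response shape (a list of weekly rows keyed by `started_at`) is an assumption, so adjust `cost_per_conversation` to whatever the endpoint actually returns.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.spelo.ai/v1/analytics"


def analytics_url(date_from, date_to, group_by="week"):
    """Build the analytics URL with the same parameters as the curl call."""
    query = urllib.parse.urlencode(
        {"from": date_from, "to": date_to, "group_by": group_by})
    return f"{API_BASE}?{query}"


def fetch_analytics(api_key, date_from, date_to):
    """Call the endpoint and decode the JSON body (performs a network call)."""
    request = urllib.request.Request(
        analytics_url(date_from, date_to),
        headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def cost_per_conversation(rows):
    """Map each week to cost per call.

    Assumes rows is a list of dicts with started_at, total_calls and
    estimated_openai_cost (the last one is only present with BYOK).
    """
    result = {}
    for row in rows:
        calls = row["total_calls"]
        cost = row["estimated_openai_cost"]
        result[row["started_at"]] = cost / calls if calls else 0.0
    return result
```

Run it once before a change and again a few weeks after, then compare the per-week values to see whether the change moved cost-per-conversation.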

See also