Slow responses
Good voice UX feels instant. “Slow” means noticeable pauses after the user speaks, or long silences before the AI responds. The typical causes, in descending order of frequency:
1. Slow database queries
The AI calls search_database, which hits your backend. If your DB takes 5 seconds, the AI’s response is delayed 5 seconds.
Diagnose: Dashboard → Analytics → Query log. Look at the duration_ms column. Anything above 500ms is suspicious; above 1000ms is bad.
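If you export the query log, the two thresholds above can be applied in bulk. This is a minimal sketch, assuming each exported row is a dict with a duration_ms value and a query label; the row shape and the "query" key are hypothetical, only duration_ms comes from the dashboard column named above.

```python
from typing import Dict, List, Tuple

SUSPICIOUS_MS = 500   # above this, a query is worth a look
BAD_MS = 1000         # above this, a query is clearly hurting voice latency


def classify_duration(duration_ms: float) -> str:
    """Map a query duration to the buckets described in the text."""
    if duration_ms > BAD_MS:
        return "bad"
    if duration_ms > SUSPICIOUS_MS:
        return "suspicious"
    return "ok"


def flag_slow(rows: List[Dict]) -> List[Tuple[str, str]]:
    """Return (query, bucket) for every row above the suspicious threshold.

    The "query" key is an assumed label for illustration.
    """
    return [
        (row["query"], classify_duration(row["duration_ms"]))
        for row in rows
        if row["duration_ms"] > SUSPICIOUS_MS
    ]
```

Sorting the flagged rows by duration usually points at the one or two queries that need an index first.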
Fix:
- Add indexes on filter and sort columns (missing indexes are the most common cause of slow queries)
- Trim display_fields: returning 50 columns when you need 5 wastes transfer time
- Use a read replica if your primary DB is write-heavy
- Cache common queries in a warm edge cache (Upstash Redis, Cloudflare KV) via a webhook adapter
2. Slow webhook responses
If you use the webhook adapter, your endpoint is the critical path.
Diagnose: Time your endpoint with curl -w "Total: %{time_total}\n". Anything above 500ms is a problem.
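A single curl run can be skewed by one cold request, so it may help to take several samples and compare the median against the 500ms budget. The sketch below assumes a plain GET is enough to exercise your endpoint; a real webhook will likely need a POST body and auth headers, and the URL in the usage comment is a placeholder.

```python
import statistics
import time
import urllib.request
from typing import Callable, List

THRESHOLD_S = 0.5  # the 500ms budget from the text


def sample_latency(call: Callable[[], object], runs: int = 5) -> List[float]:
    """Time `call` several times and return per-run durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def is_too_slow(durations: List[float], threshold_s: float = THRESHOLD_S) -> bool:
    """Compare the median so a single cold request does not dominate."""
    return statistics.median(durations) > threshold_s


def fetch(url: str) -> bytes:
    """Plain GET with a timeout; adapt method and headers to your webhook."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


# Usage (placeholder URL):
# durations = sample_latency(lambda: fetch("https://example.com/your-webhook"))
# print(is_too_slow(durations))
```

Run it from a machine near the region your traffic comes from; timing from a laptop on another continent mostly measures the network.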
Fix:
- Move the endpoint to the same region as Spelo (us-east-1 by default)
- Cache aggressively — voice queries often repeat (“what’s on the menu?”)
- Use an in-memory index (e.g. pre-loaded search tree) if your data fits in RAM
- Return partial results sooner if you can; whether to trade completeness for speed is your call
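The caching advice above can be sketched as a small in-process cache with a time-to-live, keyed on a normalized form of the query so that repeated phrasings of the same question hit the same entry. The class name, the 60-second default, and the normalization rule are all assumptions for illustration; a shared store such as Redis would follow the same pattern across multiple instances.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache results per normalized query for a fixed number of seconds."""

    def __init__(self, ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Lowercase and collapse whitespace so trivial variations share an entry.
        return " ".join(query.lower().split())

    def get_or_compute(self, query: str, compute: Callable[[str], Any]) -> Any:
        """Return a fresh cached value, or call compute(query) and store it."""
        key = self._key(query)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]
        value = compute(query)
        self._entries[key] = (now, value)
        return value
```

A short TTL keeps answers reasonably fresh while still absorbing bursts of identical questions. Note that expired entries are only overwritten, never evicted, so a long-running process with many distinct queries would also need a size cap.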
3. Long system prompts
The OpenAI Realtime API has to parse the system prompt on every session start. Very long prompts (2,000+ words) delay the first utterance.
Diagnose: Dashboard → Voice → Preview prompt → check token count (we estimate and show it).
Fix:
- Trim the personality; verbose preamble doesn’t help the AI sound better
- Reduce the pronunciation dictionary — cap at the ~100 most common words
- Move reference data out of the prompt and into the database (searchable via search_database)
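One way to cap the pronunciation dictionary is to rank its entries by how often each word appears in past conversations and keep only the top 100. This is a sketch under assumptions: the dictionary is taken to be a plain word-to-pronunciation mapping and the transcripts a list of strings, neither of which reflects a documented Spelo export format.

```python
from collections import Counter
from typing import Dict, Iterable


def trim_pronunciations(dictionary: Dict[str, str],
                        transcripts: Iterable[str],
                        limit: int = 100) -> Dict[str, str]:
    """Keep the `limit` dictionary entries that occur most often in transcripts."""
    known = {word.lower() for word in dictionary}
    counts: Counter = Counter()
    for text in transcripts:
        for token in text.lower().split():
            token = token.strip(".,!?;:\"'()")  # drop surrounding punctuation
            if token in known:
                counts[token] += 1
    # sorted() is stable, so entries with equal counts keep their original order.
    ranked = sorted(dictionary, key=lambda word: -counts[word.lower()])
    return {word: dictionary[word] for word in ranked[:limit]}
```

Entries that never appear in the transcripts sort last, so they are the first to be dropped when the dictionary exceeds the cap.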
4. Network latency
Visitors on slow networks (3G, congested Wi-Fi) see higher round-trip times to OpenAI’s servers. The widget can’t compensate for this.
Diagnose: Ask the user to run speedtest.net or check their navigator.connection.effectiveType.
Fix: Not much. Real-time voice works best with round-trip latency around 100ms; slower networks degrade quality. You can tell users “a Wi-Fi connection works best.”
5. Too many function calls in sequence
If the AI decides to chain multiple function calls before speaking (e.g. navigate -> scroll_to -> search_database), each call adds latency. The total perceived delay is the sum of all calls in the chain.
Diagnose: Dashboard → Analytics → Function call chains. Look for sessions with 4+ consecutive function calls.
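To make the arithmetic concrete: because the calls run one after another, the delay before the AI speaks is the sum of the individual call durations. The sketch below assumes you have exported each session's chain as a list of (function name, duration in ms) pairs; that shape and the sample durations are illustrative, not a documented export format.

```python
from typing import Dict, List, Tuple

Call = Tuple[str, float]  # (function name, duration in ms)


def chain_delay_ms(chain: List[Call]) -> float:
    """Sequential calls add up, so the perceived delay is their sum."""
    return sum(duration for _, duration in chain)


def long_chains(sessions: Dict[str, List[Call]],
                min_calls: int = 4) -> Dict[str, float]:
    """Return total delay for sessions with at least `min_calls` chained calls."""
    return {
        session_id: chain_delay_ms(chain)
        for session_id, chain in sessions.items()
        if len(chain) >= min_calls
    }


# Illustrative numbers only: three chained calls already cost 650ms.
example = [("navigate", 120), ("scroll_to", 80), ("search_database", 450)]
assert chain_delay_ms(example) == 650
```

Even when every individual call is fast, a chain of four or more can push the total well past what feels responsive.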
Fix: Tighten the personality prompt to prefer fewer actions per turn:
Prefer speaking first, then acting. Only navigate or scroll if the user explicitly asks to go somewhere.
Call search_database at most once per turn. If you need more data, ask the user a follow-up instead.
6. Cold start of the adapter
The first query after an idle period against a Postgres adapter has to establish a connection pool. That adds 200–500ms once.
Diagnose: The first query after idle is slow; subsequent are fast.
Fix: The adapter keeps each site's connection pool open until it has been idle for 5 minutes. For 24/7 low-volume sites, add a synthetic heartbeat in your system (e.g. a cron that runs SELECT 1 every 4 minutes).
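If cron is not available, the heartbeat can also be a small loop in a long-running process. This is a minimal sketch: ping is a callable you supply that issues SELECT 1 through whatever path you want kept warm, and the function name and parameters are hypothetical. The 4-minute interval is chosen to stay under the 5-minute idle window.

```python
import time
from typing import Callable

HEARTBEAT_INTERVAL_S = 4 * 60  # stay under the 5-minute idle window


def run_heartbeat(ping: Callable[[], None],
                  beats: int,
                  interval_s: float = HEARTBEAT_INTERVAL_S,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Call ping() `beats` times, pausing between calls.

    A failed ping is logged and skipped so one error does not stop the loop.
    Returns the number of successful pings.
    """
    succeeded = 0
    for i in range(beats):
        try:
            ping()
            succeeded += 1
        except Exception as exc:  # keep the heartbeat alive on transient errors
            print(f"heartbeat failed: {exc}")
        if i < beats - 1:
            sleep(interval_s)
    return succeeded
```

The sleep function is injectable only so the loop can be exercised in tests without waiting four minutes per beat.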
7. OpenAI rate limiting
If your OpenAI account hits a rate limit, sessions queue or fail.
Diagnose: Dashboard → Analytics → look for openai_error events with rate_limit_exceeded.
Fix: Upgrade your OpenAI tier. Tier progression: platform.openai.com/docs/guides/rate-limits.
8. Slow DOM scraping on large pages
On very long pages (10,000+ DOM nodes), the widget’s initial scrape can take 300ms. It’s a one-time cost at page load, not per query.
Diagnose: Check the widget’s init log: [Spelo] scraped N nodes in Xms.
Fix: Add data-spelo-skip="true" attributes to large, AI-irrelevant sections (footers with 500 links, ads, long blog comments). The scraper will skip them.
What “fast” feels like
The Realtime API’s baseline latency is ~400ms from end-of-user-speech to start-of-AI-speech on a good network. We can’t improve that much. A well-tuned setup with local DB + short prompt + simple queries lands at 600–800ms user-perceived — which feels instant.
If you’re seeing 2+ seconds consistently, something above is wrong.
Async pre-warming
For high-value pages where you expect voice interaction (a product page for your flagship product), you can pre-initialize:
<script src="https://spelo.ai/spelo.js" data-site-id="abc123" data-prewarm="true" async></script>
With data-prewarm="true", the widget opens the WebRTC connection in advance (on page load, not on click). The user’s first click is instant. Costs: nothing on quiet sites, but triples session counts on active sites (sessions ≤ 5s are still discarded, but prewarm ones last longer). Use sparingly.
See also
- Query endpoint — performance knobs
- Custom adapter — how to build a low-latency adapter
- Analytics endpoint — instrument everything