Slow responses

Good voice UX feels instant. “Slow” means noticeable pauses after the user speaks, or long silences before the AI responds. The typical causes, most frequent first:

1. Slow database queries

The AI calls search_database, which hits your backend. If your DB takes 5 seconds, the AI’s response is delayed 5 seconds.

Diagnose: Dashboard → Analytics → Query log. Look at the duration_ms column. Anything above 500ms is suspicious; above 1000ms is bad.
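If you copy rows out of the query log, the thresholds above can be applied mechanically. This is a minimal sketch, assuming each row is a dict with a duration_ms key; the row shape and the function names are illustrative, not part of Spelo.

```python
from typing import Iterable

# Thresholds taken from the guideline above.
SUSPICIOUS_MS = 500
BAD_MS = 1000


def classify_duration(duration_ms: float) -> str:
    """Label a single query duration as 'ok', 'suspicious', or 'bad'."""
    if duration_ms > BAD_MS:
        return "bad"
    if duration_ms > SUSPICIOUS_MS:
        return "suspicious"
    return "ok"


def slow_queries(rows: Iterable[dict]) -> list[dict]:
    """Return the rows above the suspicious threshold, slowest first.

    Assumes each row is a dict with a 'duration_ms' key (hypothetical shape).
    """
    flagged = [r for r in rows if classify_duration(r["duration_ms"]) != "ok"]
    return sorted(flagged, key=lambda r: r["duration_ms"], reverse=True)
```

Sorting slowest-first puts the queries most worth indexing at the top of the list.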

Fix:

Add indexes on filter and sort columns (the most common cause of slow queries)
Trim display_fields — returning 50 columns when you need 5 wastes transfer time
Use a read replica if your primary DB is write-heavy
Cache common queries at the edge (Upstash Redis, Cloudflare KV) via a webhook adapter

2. Slow webhook responses

If you use the webhook adapter, your endpoint is the critical path.

Diagnose: Time your endpoint with curl -w "Total: %{time_total}\n". Anything above 500ms is a problem.

Fix:

Move the endpoint to the same region as Spelo (us-east-1 by default)
Cache aggressively — voice queries often repeat (“what’s on the menu?”)
Use an in-memory index (e.g. pre-loaded search tree) if your data fits in RAM
Return partial results sooner if you can; whether to trade completeness for speed is up to you
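Because spoken queries repeat with small variations in casing and punctuation, a cache works best when it normalizes the query text before using it as a key. The sketch below is one way to do that inside a webhook handler. TTLCache, its normalization rule, and the compute callback are assumptions for illustration, not part of the Spelo webhook contract.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-memory cache with a fixed time-to-live per entry.

    Illustrative only: it is unbounded, so cap or evict entries in production.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    @staticmethod
    def normalize(query: str) -> str:
        # Lowercase, collapse whitespace, drop trailing punctuation so that
        # "What's on the menu?" and " what's on the MENU " share one key.
        return " ".join(query.lower().split()).rstrip("?!. ")

    def get_or_compute(self, query: str, compute: Callable[[str], Any]) -> Any:
        key = self.normalize(query)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: skip the backend entirely
        value = compute(query)  # miss or expired: do the slow lookup
        self._entries[key] = (now, value)
        return value
```

A short TTL (tens of seconds to a few minutes) is usually enough to absorb repeated questions without serving stale data for long; the right value depends on how often your data changes.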

3. Long system prompts

The OpenAI Realtime API has to parse the system prompt on every session start. Very long prompts (2,000+ words) delay the first utterance.

Diagnose: Dashboard → Voice → Preview prompt → check token count (we estimate and show it).

Fix:

Trim the personality; verbose preamble doesn’t help the AI sound better
Reduce the pronunciation dictionary — cap at the ~100 most common words
Move reference data out of the prompt and into the database (searchable via search_database)
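To see which part of a prompt is worth trimming first, a rough per-section word count is often enough. This sketch assumes you keep the prompt as named sections in your own tooling; the section names and helper functions are hypothetical. It counts words, not tokens, so treat the dashboard's token estimate as the reference figure.

```python
# Threshold from the guideline above: prompts of 2,000+ words are "very long".
LONG_PROMPT_WORDS = 2000


def word_count(text: str) -> int:
    """Count whitespace-separated words (a rough proxy, not a token count)."""
    return len(text.split())


def prompt_report(sections: dict[str, str]) -> list[tuple[str, int]]:
    """Return (section name, word count) pairs, largest section first."""
    counts = ((name, word_count(text)) for name, text in sections.items())
    return sorted(counts, key=lambda item: item[1], reverse=True)


def is_long(sections: dict[str, str]) -> bool:
    """True when the combined sections reach the 2,000-word mark."""
    total = sum(count for _, count in prompt_report(sections))
    return total >= LONG_PROMPT_WORDS
```

If reference data dominates the report, that is the section to move into the database.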

4. Network latency

Visitors on slow networks (3G, congested Wi-Fi) see higher round-trip times to OpenAI’s servers. The widget can’t compensate for this.

Diagnose: Ask the user to run speedtest.net or check their navigator.connection.effectiveType.

Fix: Not much. Real-time voice needs latency in the ~100ms range; slower networks degrade quality. You can tell users that “a Wi-Fi connection works best.”

5. Too many function calls in sequence

If the AI chains multiple function calls before speaking (e.g. navigate -> scroll_to -> search_database), each call adds latency. The total perceived delay is the sum of all of them.

Diagnose: Dashboard → Analytics → Function call chains. Look for sessions with 4+ consecutive function calls.
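The same check can be run on a session's event list if you have one to hand. This is a sketch only: the event shape (a type of "function_call" or "speech", plus duration_ms) is an assumption, not the dashboard's format. It finds the longest run of consecutive function calls and adds up their durations, since the perceived delay is the sum.

```python
CHAIN_THRESHOLD = 4  # "4+ consecutive function calls" from the guideline above


def longest_chain(events: list[dict]) -> tuple[int, int]:
    """Return (length, total_ms) of the longest run of consecutive function calls.

    Assumes each event is a dict with 'type' and 'duration_ms' (hypothetical shape).
    Any non-function-call event, such as speech, ends the current run.
    """
    best = (0, 0)
    run_len = run_ms = 0
    for event in events:
        if event["type"] == "function_call":
            run_len += 1
            run_ms += event["duration_ms"]
            if run_len > best[0]:
                best = (run_len, run_ms)
        else:
            run_len = run_ms = 0
    return best


def has_long_chain(events: list[dict]) -> bool:
    """True when a session contains a chain at or above the threshold."""
    return longest_chain(events)[0] >= CHAIN_THRESHOLD
```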

Fix: Tighten the personality prompt to prefer fewer actions per turn:

Prefer speaking first, then acting. Only navigate or scroll if the user explicitly asks to go somewhere.
Call search_database at most once per turn. If you need more data, ask the user a follow-up instead.

6. Cold start of the adapter

The first query after an idle period against a Postgres adapter has to establish a connection pool. That adds a one-time 200–500ms.

Diagnose: The first query after idle is slow; subsequent are fast.

Fix: The adapter keeps each site’s connection pool cached until it has been idle for 5 minutes. For 24/7 low-volume sites, add a synthetic heartbeat in your system (e.g. a cron job that runs SELECT 1 every 4 minutes) so the pool never goes idle.
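If you would rather run the heartbeat as a small long-lived process than as a cron job, a loop like the one below does the same thing. It is a sketch under assumptions: run_query is a hypothetical callable that sends SQL along whichever path keeps the pool warm in your setup, and max_beats and sleep exist only so the loop can be tested or stopped.

```python
import time
from typing import Callable, Optional

# 4 minutes: safely inside the 5-minute idle window described above.
HEARTBEAT_INTERVAL_S = 4 * 60


def heartbeat(run_query: Callable[[str], object],
              interval_s: float = HEARTBEAT_INTERVAL_S,
              max_beats: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Send 'SELECT 1' every interval_s seconds; return the number of beats sent.

    run_query is a hypothetical callable supplied by the caller.
    With max_beats=None the loop runs until the process is stopped.
    """
    beats = 0
    while max_beats is None or beats < max_beats:
        try:
            run_query("SELECT 1")
        except Exception as exc:
            # A failed beat should not kill the loop; log it and try again later.
            print(f"heartbeat failed: {exc}")
        beats += 1
        sleep(interval_s)
    return beats
```

Under cron, the equivalent is a single run_query("SELECT 1") call per invocation, with the schedule providing the 4-minute interval.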

7. OpenAI rate limiting

Your OpenAI account hit a rate limit. Sessions queue or fail.

Diagnose: Dashboard → Analytics → look for openai_error events with rate_limit_exceeded.

Fix: Upgrade your OpenAI tier. Tier progression: platform.openai.com/docs/guides/rate-limits.

8. Slow DOM scraping on large pages

On very long pages (10,000+ DOM nodes), the widget’s initial scrape can take 300ms. It’s a one-time cost at page load, not per query.

Diagnose: Check the widget’s init log: [Spelo] scraped N nodes in Xms.

Fix: Add data-spelo-skip="true" attributes to large, AI-irrelevant sections (footers with 500 links, ads, long blog comments). The scraper will skip them.

What “fast” feels like

The Realtime API’s baseline latency is ~400ms from end-of-user-speech to start-of-AI-speech on a good network. We can’t improve that much. A well-tuned setup with local DB + short prompt + simple queries lands at 600–800ms user-perceived — which feels instant.

If you’re seeing 2+ seconds consistently, something above is wrong.

Async pre-warming

For high-value pages where you expect voice interaction (a product page for your flagship product), you can pre-initialize:

<script src="https://spelo.ai/spelo.js"
        data-site-id="abc123"
        data-prewarm="true"
        async></script>

With data-prewarm="true", the widget opens the WebRTC connection on page load instead of on click, so the user’s first click feels instant. Cost: nothing on quiet sites, but it triples session counts on active sites (sessions of 5s or less are still discarded, but prewarmed sessions last longer than that). Use sparingly.
