Slow responses

Good voice UX feels instant. “Slow” means noticeable pauses after the user speaks, or long silences before the AI responds. The typical causes, in descending order of frequency:

1. Slow database queries

The AI calls search_database, which hits your backend. If your DB takes 5 seconds, the AI’s response is delayed 5 seconds.

Diagnose: Dashboard → Analytics → Query log. Look at the duration_ms column. Anything above 500ms is suspicious; above 1000ms is bad.
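
If you export the query log, the thresholds above can be applied in bulk. The following is a minimal sketch that assumes each exported row is a dict with a duration_ms key; the export format and the function names are illustrative assumptions, not part of the dashboard.

```python
from typing import Iterable

SUSPICIOUS_MS = 500   # above 500ms is suspicious (threshold from the text above)
BAD_MS = 1000         # above 1000ms is bad


def classify_duration(duration_ms: float) -> str:
    """Label a single query duration as 'ok', 'suspicious', or 'bad'."""
    if duration_ms > BAD_MS:
        return "bad"
    if duration_ms > SUSPICIOUS_MS:
        return "suspicious"
    return "ok"


def slow_queries(rows: Iterable[dict]) -> list[dict]:
    """Return the rows that are not 'ok', slowest first.

    Assumes each row has a numeric 'duration_ms' key (hypothetical export shape).
    """
    flagged = [r for r in rows if classify_duration(r["duration_ms"]) != "ok"]
    return sorted(flagged, key=lambda r: r["duration_ms"], reverse=True)
```

Sorting slowest first puts the queries most worth indexing or trimming at the top of the list.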

Fix:

  • Add indexes on filter and sort columns (the most common cause of slow queries)
  • Trim display_fields — returning 50 columns when you need 5 wastes transfer time
  • Use a read replica if your primary DB is write-heavy
  • Cache common queries in a warm edge cache (Upstash Redis, Cloudflare KV) via a webhook adapter

2. Slow webhook responses

If you use the webhook adapter, your endpoint is the critical path.

Diagnose: Time your endpoint with curl -w "Total: %{time_total}\n". Anything above 500ms is a problem.

Fix:

  • Move the endpoint to the same region as Spelo (us-east-1 by default)
  • Cache aggressively — voice queries often repeat (“what’s on the menu?”)
  • Use an in-memory index (e.g. pre-loaded search tree) if your data fits in RAM
  • Return partial results faster if you can — up to you whether to trade completeness for speed
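
The caching advice above can be sketched as a small in-process cache with a time-to-live. This is only an illustration under assumptions: TTLCache, normalize, and the compute callback are hypothetical names, and a real webhook would more likely sit in front of a shared store such as Redis.

```python
import time
from typing import Any, Callable


def normalize(query: str) -> str:
    """Collapse case and whitespace so repeated questions share one cache key."""
    return " ".join(query.lower().split())


class TTLCache:
    """Tiny time-based cache; entries older than ttl_seconds are recomputed."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_compute(self, query: str, compute: Callable[[str], Any]) -> Any:
        key = normalize(query)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: skip the slow backend call
        value = compute(query)  # slow path: call the real backend
        self._entries[key] = (now, value)
        return value
```

Note that this sketch never evicts old keys, so a long-running process would need a size cap as well.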

3. Long system prompts

The OpenAI Realtime API has to parse the system prompt on every session start. Very long prompts (2,000+ words) delay the first utterance.

Diagnose: Dashboard → Voice → Preview prompt → check token count (we estimate and show it).

Fix:

  • Trim the personality; verbose preamble doesn’t help the AI sound better
  • Reduce the pronunciation dictionary — cap at the ~100 most common words
  • Move reference data out of the prompt and into the database (searchable via search_database)
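
As a quick check outside the dashboard, you can count words in a prompt before deploying it. The sketch below uses the 2,000-word figure mentioned above; the function names are illustrative, and a word count is only a rough stand-in for the token count the dashboard estimates.

```python
PROMPT_WORD_LIMIT = 2000  # the "very long" threshold mentioned above


def prompt_word_count(prompt: str) -> int:
    """Count whitespace-separated words; a rough proxy, not a token count."""
    return len(prompt.split())


def is_too_long(prompt: str, limit: int = PROMPT_WORD_LIMIT) -> bool:
    """True when the prompt reaches the word limit and is worth trimming."""
    return prompt_word_count(prompt) >= limit
```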

4. Network latency

Visitors on slow networks (3G, congested Wi-Fi) see higher round-trip times to OpenAI’s servers. The widget can’t compensate for this.

Diagnose: Ask the user to run speedtest.net or check their navigator.connection.effectiveType.

Fix: Not much. Voice is inherently latency-sensitive (it needs round trips of roughly 100ms), and slower networks degrade quality. You can tell users that “a Wi-Fi connection works best.”

5. Too many function calls in sequence

If the AI decides to chain multiple function calls before speaking (e.g. navigate -> scroll_to -> search_database), each call adds latency. The total perceived delay is the sum of all of them.

Diagnose: Dashboard → Analytics → Function call chains. Look for sessions with 4+ consecutive function calls.
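
If you have session events available outside the dashboard, the same check can be sketched in a few lines. The event type strings ("function_call", "speech") and the list-of-strings shape are assumptions made for illustration; adapt them to whatever your export actually contains.

```python
from typing import Sequence

CHAIN_THRESHOLD = 4  # "4+ consecutive function calls" from the text above


def longest_call_chain(event_types: Sequence[str]) -> int:
    """Length of the longest run of consecutive 'function_call' events."""
    longest = current = 0
    for kind in event_types:
        if kind == "function_call":
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # any other event (e.g. speech) breaks the chain
    return longest


def has_long_chain(event_types: Sequence[str],
                   threshold: int = CHAIN_THRESHOLD) -> bool:
    """True when a session contains a chain at or above the threshold."""
    return longest_call_chain(event_types) >= threshold
```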

Fix: Tighten the personality prompt to prefer fewer actions per turn:

Prefer speaking first, then acting. Only navigate or scroll if the user explicitly asks to go somewhere.
Call search_database at most once per turn. If you need more data, ask the user a follow-up instead.

6. Cold start of the adapter

The first query after an idle period against a Postgres adapter has to establish a connection pool. That adds 200–500ms, once per cold start.

Diagnose: The first query after idle is slow; subsequent queries are fast.

Fix: The adapter keeps each site's pool cached until it has been inactive for 5 minutes. For 24/7 low-volume sites, add a synthetic heartbeat in your system (e.g. a cron that runs SELECT 1 every 4 minutes).
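
To see why a 4-minute heartbeat helps, the sketch below marks which queries would hit a cold pool given a 5-minute idle timeout. It is a simplified model, not the adapter's actual logic: the function name is hypothetical, timestamps are plain seconds, and a gap of exactly 5 minutes is assumed to count as cold.

```python
from typing import Sequence

POOL_IDLE_TIMEOUT_S = 5 * 60  # pool is dropped after 5 minutes of inactivity


def cold_start_indexes(query_times_s: Sequence[float],
                       idle_timeout_s: float = POOL_IDLE_TIMEOUT_S) -> list[int]:
    """Indexes of queries that would have to re-establish the pool.

    Expects timestamps in ascending order, in seconds.
    """
    cold = []
    previous = None
    for i, t in enumerate(query_times_s):
        # The very first query, or one arriving after the idle timeout, is cold.
        if previous is None or t - previous >= idle_timeout_s:
            cold.append(i)
        previous = t
    return cold
```

With queries at 0s, 60s, 900s, and 930s, the first and third are cold. A heartbeat every 240 seconds leaves only the very first query cold.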

7. OpenAI rate limiting

When your OpenAI account hits a rate limit, sessions queue or fail.

Diagnose: Dashboard → Analytics → look for openai_error events with rate_limit_exceeded.

Fix: Upgrade your OpenAI tier. Tier progression: platform.openai.com/docs/guides/rate-limits.

8. Slow DOM scraping on large pages

On very long pages (10,000+ DOM nodes), the widget’s initial scrape can take 300ms. It’s a one-time cost at page load, not per query.

Diagnose: Check the widget’s init log: [Spelo] scraped N nodes in Xms.

Fix: Add data-spelo-skip="true" attributes to large, AI-irrelevant sections (footers with 500 links, ads, long blog comments). The scraper will skip them.

What “fast” feels like

The Realtime API’s baseline latency is ~400ms from end-of-user-speech to start-of-AI-speech on a good network. We can’t improve that much. A well-tuned setup with local DB + short prompt + simple queries lands at 600–800ms user-perceived — which feels instant.

If you’re seeing 2+ seconds consistently, something above is wrong.
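
The numbers above can be combined into a rough latency budget. This is a deliberately simplified model that assumes the baseline and the function-call durations simply add up, ignoring prompt length and network effects; the function names are illustrative.

```python
from typing import Sequence

BASELINE_MS = 400      # approximate Realtime API baseline quoted above
INVESTIGATE_MS = 2000  # 2+ seconds consistently suggests a problem


def perceived_latency_ms(call_durations_ms: Sequence[float],
                         baseline_ms: float = BASELINE_MS) -> float:
    """Baseline plus the sum of every function call made before the AI speaks."""
    return baseline_ms + sum(call_durations_ms)


def needs_investigation(call_durations_ms: Sequence[float]) -> bool:
    """True when the estimated delay reaches the 2-second mark."""
    return perceived_latency_ms(call_durations_ms) >= INVESTIGATE_MS
```

A single 250ms database query gives an estimate of 650ms, inside the 600–800ms range described above. Three chained calls of 300ms, 200ms, and 1200ms give 2100ms, which is worth investigating.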

Async pre-warming

For high-value pages where you expect voice interaction (a product page for your flagship product), you can pre-initialize:

<script src="https://spelo.ai/spelo.js"
        data-site-id="abc123"
        data-prewarm="true"
        async></script>

With data-prewarm="true", the widget opens the WebRTC connection in advance (on page load, not on click), so the user’s first click is instant. The cost is nothing on quiet sites, but on active sites it triples session counts: sessions of 5 seconds or less are still discarded, but prewarmed sessions last longer than that. Use sparingly.

See also