Action tools
After the AI knows what’s on the page (see perception tools), it can act on it through 9 action tools. Five are the modern v2 namespace (act.*); four are legacy flat tools kept as fallback.
Navigation
navigate({ url, external? })
Move to a different page.
{ "url": "/pricing" } // in-page navigation (same origin)
{ "url": "https://docs.example.com", "external": true } // opens in new tab
- Same-origin URLs use SPA-aware navigation (pushState if supported, full reload otherwise). The voice session continues seamlessly on the new page.
- external: true opens cross-origin URLs in a new tab. The voice session stays on the original page. Popup blockers are handled: if window.open returns null, the widget falls back to a synthesized anchor click (which inherits the user gesture from the voice command).
When called: visitor says “take me to pricing”, “go to the contact page”, “open the docs”.
Returns: "Navigated to /pricing" or "Opened https://... in a new tab".
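The routing described for navigate can be sketched as two small pure functions. This is a minimal sketch under stated assumptions, not the widget's actual handler: planNavigation, openInNewTab, NavPlan, and OpenDeps are hypothetical names, and the browser calls (window.open and the synthesized anchor click) are injected so the logic can be read in isolation. The sketch treats the external flag as the only switch; how the real widget handles a cross-origin URL without that flag is not stated here.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the widget's real API.
type NavPlan =
  | { kind: "in-page"; path: string; message: string }
  | { kind: "new-tab"; href: string; message: string };

// Decide how a navigate({ url, external? }) call should be carried out.
function planNavigation(url: string, currentHref: string, external = false): NavPlan {
  const target = new URL(url, currentHref); // resolves relative URLs such as "/pricing"
  if (external) {
    return { kind: "new-tab", href: target.href, message: `Opened ${target.href} in a new tab` };
  }
  const path = target.pathname + target.search + target.hash;
  return { kind: "in-page", path, message: `Navigated to ${path}` };
}

// Stand-ins for window.open and a synthesized <a target="_blank"> click.
interface OpenDeps {
  open: (href: string) => unknown;
  clickAnchor: (href: string) => void;
}

// Try window.open first; a null result is treated as "blocked by a popup blocker".
function openInNewTab(href: string, deps: OpenDeps): "window.open" | "anchor-click" {
  if (deps.open(href) !== null) return "window.open";
  deps.clickAnchor(href);
  return "anchor-click";
}
```

Keeping the decision separate from the side effects means the same plan can drive pushState, a full reload, or a new tab without changing the branching logic.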
scroll_to({ target })
Scroll a named section into view. The AI picks target from the page’s section ids (which read_page and site_intelligence expose).
{ "target": "pricing" }
When called: “scroll to the pricing section”, “go to the FAQ”, “take me to the contact form”.
Returns: "Scrolled to pricing" or "Could not find pricing".
Side effect: if a section heading is found, the widget mounts a green flash highlight on the section (or a persistent highlight + auto-scroll if the section is taller than the viewport — same handoff as read_section).
scroll_by({ direction, percent })
Vague directional scrolling: scroll up or down by a percentage of the viewport.
{ "direction": "down", "percent": 100 } // one full screen
{ "direction": "up", "percent": 50 } // half a screen
The AI maps natural-language phrasing to percentages:
| Visitor said | percent |
|---|---|
| “a little / a bit / slightly” | 25 |
| (bare) “scroll down” | 100 |
| “halfway” | 50 |
| “a lot / way down / far” | 200 |
| “down 30 percent” | 30 |
Capped at 500% per call.
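The percentage arithmetic and the 500% cap can be illustrated with a short helper. This is a sketch only: scrollDelta and MAX_PERCENT are hypothetical names, and clamping out-of-range values (rather than rejecting them) is an assumption.

```typescript
// Hypothetical sketch of the scroll_by arithmetic; names are assumptions.
type Direction = "up" | "down";

const MAX_PERCENT = 500; // documented cap per call

// Convert a viewport percentage into a signed pixel delta (negative = up).
function scrollDelta(direction: Direction, percent: number, viewportHeight: number): number {
  // Assumption: out-of-range values are clamped rather than rejected.
  const clamped = Math.min(Math.max(percent, 0), MAX_PERCENT);
  const pixels = Math.round((clamped / 100) * viewportHeight);
  return direction === "down" ? pixels : -pixels;
}

// In a browser the result could be applied with:
//   window.scrollBy({ top: scrollDelta("down", 100, window.innerHeight), behavior: "smooth" });
```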
act.scroll_to({ id })
v2: scroll to an element identified by snapshot id (from see.snapshot).
{ "id": "sp-42" }
Inherits the same sticky-header offset + tall-section live-reading handoff as the legacy scroll_to. Use when the visitor names a specific element rather than a section (“scroll to the third FAQ”, “show me the orange button”).
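The sticky-header offset and the tall-section check shared by both scroll tools might look like the following. This is a sketch under assumptions: Box, scrollTopFor, and highlightMode are hypothetical names, and the widget's real measurements and thresholds are not documented on this page.

```typescript
// Hypothetical sketch; the widget's real offset and handoff logic are not shown in this page.
interface Box {
  top: number;    // element top relative to the viewport, as in getBoundingClientRect().top
  height: number; // element height in pixels
}

// Absolute scroll position that leaves the element just below a sticky header.
function scrollTopFor(box: Box, currentScrollY: number, stickyHeaderHeight: number): number {
  return Math.max(0, currentScrollY + box.top - stickyHeaderHeight);
}

// Sections taller than the viewport get the persistent highlight + auto-scroll handoff;
// shorter ones get the brief flash.
function highlightMode(box: Box, viewportHeight: number): "flash" | "persistent" {
  return box.height > viewportHeight ? "persistent" : "flash";
}
```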
Click & fill
act.click({ id }) — v2 preferred
Click an element by snapshot id.
{ "id": "sp-12" }
Returns: "Clicked sp-12 (Book a table)" or [error: no element with snapshot id "sp-12" — page may have changed; call see.snapshot again].
The error message tells the AI exactly what to do on failure (re-snapshot). Stable element addressing is the main advantage over click_element — no fuzzy text matching, no ambiguity when there are multiple elements with the same visible label.
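To make the addressing model concrete, here is a minimal sketch that models a snapshot as a map from snapshot id to element. SnapshotEntry and clickById are hypothetical names and the real snapshot structure is not shown on this page; only the two return strings come from the documentation above.

```typescript
// Hypothetical sketch: a snapshot is modelled as a Map from snapshot id to a clickable entry.
interface SnapshotEntry {
  label: string;     // visible label captured at snapshot time
  click: () => void; // stands in for element.click()
}

function clickById(snapshot: Map<string, SnapshotEntry>, id: string): string {
  const entry = snapshot.get(id);
  if (!entry) {
    // \u2014 is the dash used in the documented error message.
    return `[error: no element with snapshot id "${id}" \u2014 page may have changed; call see.snapshot again]`;
  }
  entry.click();
  return `Clicked ${id} (${entry.label})`;
}
```

Because the lookup is by id rather than by text, two buttons with the same label never compete; a stale id simply misses and yields the re-snapshot hint.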
act.fill({ id, value }) — v2 preferred
Fill an input, textarea, or select by snapshot id.
{ "id": "sp-23", "value": "Saturday 7pm" }
Handles all field types:
- <input> / <textarea>: uses the prototype-level value setter so React-controlled fields don’t get out of sync. Fires input + change events so the page’s onChange handlers run.
- <select>: matches the option by visible text first ("Saturday 7pm" → finds the option with that text), then falls back to matching by option value. Fires change only.
Returns: "Filled sp-23 with \"Saturday 7pm\"" or "Selected \"Saturday 7pm\" in sp-23".
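The text-first, value-second lookup for select elements can be sketched as below. Option and pickOption are hypothetical names, and the case-insensitive comparison is an assumption; the source only specifies the order of the two passes.

```typescript
// Hypothetical sketch of the <select> lookup order; names are assumptions.
interface Option {
  text: string;  // visible option text
  value: string; // underlying option value
}

// Visible text wins; the option value is only consulted when no text matches.
function pickOption(options: Option[], wanted: string): Option | undefined {
  const norm = (s: string) => s.trim().toLowerCase(); // assumption: case-insensitive compare
  const target = norm(wanted);
  return (
    options.find((o) => norm(o.text) === target) ??
    options.find((o) => norm(o.value) === target)
  );
}
```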
click_element({ text }) — legacy fallback
Click a button or link by visible text. Used when no snapshot has been taken or the target is unambiguous.
{ "text": "Submit" }
Match strategy: exact match (score 100) → word-boundary match (50) → substring (10). Among ties, prefers visible elements.
Once form fields are filled with fill_field / act.fill, the system prompt instructs the AI to call click_element (or act.click) on the submit/save/confirm button to finish the task, so the form is never left half-filled.
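The exact / word-boundary / substring scoring described for click_element could be implemented along these lines. This is a sketch, not the shipped matcher: Candidate, scoreMatch, and pickBest are hypothetical names, and case-insensitive comparison is an assumption. Only the three scores and the visible-element tie-break come from the text above.

```typescript
// Hypothetical sketch of the documented scoring; case-insensitive comparison is an assumption.
interface Candidate {
  label: string;    // visible text of the button or link
  visible: boolean; // whether the element is currently visible
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function scoreMatch(label: string, query: string): number {
  const l = label.trim().toLowerCase();
  const q = query.trim().toLowerCase();
  if (q === "") return 0;
  if (l === q) return 100; // exact match
  if (new RegExp(`\\b${escapeRegExp(q)}\\b`).test(l)) return 50; // word-boundary match
  if (l.includes(q)) return 10; // substring
  return 0;
}

function pickBest(candidates: Candidate[], query: string): Candidate | undefined {
  let best: Candidate | undefined;
  let bestScore = 0;
  for (const c of candidates) {
    const score = scoreMatch(c.label, query);
    if (score === 0) continue;
    // On equal scores, a visible candidate replaces a hidden one.
    const winsTie = score === bestScore && c.visible && !best?.visible;
    if (score > bestScore || winsTie) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}
```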
fill_field({ field, value }) — legacy fallback
Fill a form field by label, placeholder, or name.
{ "field": "city", "value": "West Hollywood" }
Partial matches work: "city" will match <input placeholder="Filter by city"> even if the visitor said “set city to West Hollywood”. The handler delegates to fillField() in packages/spelo-system/src/navigate.ts, which also fires input + change events.
For dropdowns, pass the option text ("West Hollywood") rather than the value ("WH") — the handler does the lookup.
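As an illustration of partial matching on label, placeholder, or name, consider the sketch below. FieldInfo and findField are hypothetical names; the real lookup lives in fillField() and may use a different precedence. Case-insensitive substring matching in document order is an assumption.

```typescript
// Hypothetical sketch; the real lookup lives in fillField() and may differ.
interface FieldInfo {
  label?: string;
  placeholder?: string;
  name?: string;
}

// Return the first field whose label, placeholder, or name contains the query.
// Assumption: case-insensitive substring match, checked in document order.
function findField<T extends FieldInfo>(fields: T[], query: string): T | undefined {
  const q = query.trim().toLowerCase();
  if (q === "") return undefined;
  return fields.find((f) =>
    [f.label, f.placeholder, f.name].some((s) => s !== undefined && s.toLowerCase().includes(q)),
  );
}
```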
Destructive action gate
act.confirm({ summary })
Display a Yes/No overlay on the page with a one-sentence summary of the action the AI is about to take.
{ "summary": "Submit your contact info to the team?" }
Returns: "yes" or "no". The AI is required by its system prompt to branch on the result and abort if "no".
Visual: a small dark pill near the agent’s MorphPanel, two buttons. Auto-focuses the No button so an accidental Enter cancels rather than confirms.
When the system prompt requires it: any destructive or commit-to-it action — submit a form, complete a checkout step, send a message, place an order, schedule something, share contact info. The CONFIRMATION POLICY in the prompt enumerates these.
This is the safety rail that lets act.click exist at all. Without act.confirm the agent could submit a form mid-conversation without the visitor realizing — with act.confirm, the visitor always gets a final visual + audible “are you sure?” before anything irreversible happens.
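The confirm-then-act branch can be expressed as a two-step plan. This is a sketch of the control flow only: ToolCall, confirmCall, and afterConfirm are hypothetical names, and in practice the branching is performed by the model following its system prompt rather than by a helper like this.

```typescript
// Hypothetical sketch of the agent-side branch; ToolCall is an illustrative shape, not a real API type.
interface ToolCall {
  tool: string;
  args: Record<string, string>;
}

// Step 1: ask the visitor before a commit-to-it action.
function confirmCall(summary: string): ToolCall {
  return { tool: "act.confirm", args: { summary } };
}

// Step 2: only release the pending action on an explicit "yes"; anything else aborts.
function afterConfirm(answer: "yes" | "no", pending: ToolCall): ToolCall | null {
  return answer === "yes" ? pending : null;
}
```

For example, a pending act.click on the submit button is held back until act.confirm returns "yes", and dropped entirely on "no".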
Reliability notes
The legacy and v2 tools coexist by design. The system prompt instructs the AI:
Always prefer see.snapshot + act.* for actions on specific elements. Use the legacy click_element / fill_field only when:
- You haven’t taken a snapshot yet (e.g. the visitor’s first request on a new page)
- The target is unambiguously named (a single “Submit” button on a small form)
- The legacy tool is cheaper (e.g. scroll_by 50% is just a hint, so no snapshot is needed)
This dual-mode design keeps the bundle small (one set of handlers per tool) while letting the LLM pick the right precision/cost trade-off per task.
Performance and limits
| Tool | Wall-clock cost | Network |
|---|---|---|
| navigate (in-page) | ~20-50 ms | None |
| navigate (external) | ~50 ms (popup blocker fallback) | None |
| scroll_to, act.scroll_to | ~30-60 ms | None |
| scroll_by | <10 ms | None |
| act.click, click_element | ~10-30 ms | None |
| act.fill, fill_field | ~10-30 ms | None |
| act.confirm | wait for user click (1-5 s) | None |
All action handlers run in the browser. No server round-trips.
See also
- Page perception tools: see.snapshot is the prerequisite for every act.* call
- Knowledge & lifecycle tools: submit_lead, end_call, set_flow_state
- Customize → Personality: tune the CONFIRMATION POLICY for your domain