
Action tools

After the AI knows what’s on the page (see perception tools), it can act on it through 9 action tools. Four are in the modern v2 namespace (act.*); five are legacy flat tools kept as fallback.

navigate({ url, external })

Move to a different page.

{ "url": "/pricing" } // in-page navigation (same origin)
{ "url": "https://docs.example.com", "external": true } // opens in new tab
  • Same-origin URLs use SPA-aware navigation (pushState if supported, full reload otherwise). The voice session continues seamlessly on the new page.
  • external: true opens cross-origin URLs in a new tab. The voice session stays on the original page. Popup blockers are handled — if window.open returns null, the widget falls back to a synthesized anchor click (which inherits the user gesture from the voice command).
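The two branches above can be sketched as a small function. This is only an illustration under stated assumptions: `navigateSketch`, `NavigateDeps`, and the injected callbacks are hypothetical names standing in for `history.pushState`, `window.open`, and the synthesized anchor click, and the handling of a cross-origin URL without `external: true` is a guess, since the text does not cover it.

```typescript
// Hypothetical shapes for illustration; not the widget's real internals.
type NavigateArgs = { url: string; external?: boolean };

interface NavigateDeps {
  currentHref: string;                         // stands in for window.location.href
  pushState?: (path: string) => void;          // undefined when pushState is unsupported
  fullReload: (href: string) => void;          // stands in for a full page load
  openWindow: (href: string) => object | null; // stands in for window.open(href, "_blank")
  anchorClick: (href: string) => void;         // stands in for the synthesized <a> click
}

function navigateSketch(args: NavigateArgs, deps: NavigateDeps): string {
  const target = new URL(args.url, deps.currentHref);

  if (args.external) {
    // A null return from window.open signals a blocked popup: use the anchor fallback.
    if (deps.openWindow(target.href) === null) {
      deps.anchorClick(target.href);
    }
    return `Opened ${args.url} in a new tab`;
  }

  const sameOrigin = target.origin === new URL(deps.currentHref).origin;
  if (!sameOrigin) {
    // Assumption: the source does not say what happens here, so this sketch refuses.
    return `[error: ${args.url} is cross-origin; pass external: true]`;
  }

  const path = target.pathname + target.search + target.hash;
  if (deps.pushState) {
    deps.pushState(path); // SPA-aware navigation
  } else {
    deps.fullReload(target.href); // fallback: full reload
  }
  return `Navigated to ${path}`;
}
```

Injecting the browser calls keeps the decision logic testable without a DOM; a real handler would presumably call the browser APIs directly.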

When called: visitor says “take me to pricing”, “go to the contact page”, “open the docs”.

Returns: "Navigated to /pricing" or "Opened https://... in a new tab".

scroll_to({ target })

Scroll a named section into view. The AI picks target from the page’s section ids (which read_page and site_intelligence expose).

{ "target": "pricing" }

When called: “scroll to the pricing section”, “go to the FAQ”, “take me to the contact form”.

Returns: "Scrolled to pricing" or "Could not find pricing".

Side effect: if a section heading is found, the widget mounts a green flash highlight on the section (or a persistent highlight + auto-scroll if the section is taller than the viewport — same handoff as read_section).

scroll_by({ direction, percent })

Directional scrolling for vague requests: scroll up or down by a percentage of the viewport height.

{ "direction": "down", "percent": 100 } // one full screen
{ "direction": "up", "percent": 50 } // half a screen

The AI maps natural-language phrasing to percentages:

| Visitor said | percent |
| --- | --- |
| “a little / a bit / slightly” | 25 |
| (bare) “scroll down” | 100 |
| “halfway” | 50 |
| “a lot / way down / far” | 200 |
| “down 30 percent” | 30 |

Capped at 500% per call.
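As a rough illustration of how a percentage turns into a scroll distance, the sketch below converts the arguments into a signed pixel delta and applies the 500% cap. The function name `scrollDeltaPx`, the rounding, and the clamping of negative percentages to zero are assumptions made for the example, not documented behavior.

```typescript
type ScrollByArgs = { direction: "up" | "down"; percent: number };

const MAX_PERCENT = 500; // the documented per-call cap

// Returns a signed pixel delta: positive scrolls down, negative scrolls up.
function scrollDeltaPx(args: ScrollByArgs, viewportHeight: number): number {
  // Assumption: negative percentages are clamped to 0 rather than flipping direction.
  const percent = Math.min(Math.max(args.percent, 0), MAX_PERCENT);
  const sign = args.direction === "down" ? 1 : -1;
  return sign * Math.round((percent / 100) * viewportHeight);
}
```

With an 800 px viewport, `{ "direction": "down", "percent": 100 }` yields 800 and `{ "direction": "up", "percent": 50 }` yields -400; a request for 900% is treated as 500%.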

act.scroll_to({ id })

v2: scroll to an element identified by snapshot id (from see.snapshot).

{ "id": "sp-42" }

Inherits the same sticky-header offset + tall-section live-reading handoff as the legacy scroll_to. Use when the visitor names a specific element rather than a section (“scroll to the third FAQ”, “show me the orange button”).

Click & fill

act.click({ id }) — v2 preferred

Click an element by snapshot id.

{ "id": "sp-12" }

Returns: "Clicked sp-12 (Book a table)" or [error: no element with snapshot id "sp-12" — page may have changed; call see.snapshot again].

The error message tells the AI exactly what to do on failure (re-snapshot). Stable element addressing is the main advantage over click_element — no fuzzy text matching, no ambiguity when there are multiple elements with the same visible label.
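The lookup-or-error behavior can be sketched as follows. The `Map`-based snapshot registry and the `SnapshotEntry` shape are hypothetical; only the two return strings are taken from the text above.

```typescript
// Hypothetical registry entry produced by a snapshot; not the real data model.
type SnapshotEntry = { label: string; click: () => void };

function actClick(snapshot: Map<string, SnapshotEntry>, id: string): string {
  const entry = snapshot.get(id);
  if (!entry) {
    // A stale or unknown id returns an instruction the model can act on.
    return `[error: no element with snapshot id "${id}" — page may have changed; call see.snapshot again]`;
  }
  entry.click();
  return `Clicked ${id} (${entry.label})`;
}
```

Because ids are exact keys, two buttons that share a visible label cannot be confused with each other, which is the ambiguity text matching has to resolve by scoring.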

act.fill({ id, value }) — v2 preferred

Fill an input, textarea, or select by snapshot id.

{ "id": "sp-23", "value": "Saturday 7pm" }

Handles all field types:

  • <input> / <textarea>: uses the prototype-level value setter so React-controlled fields don’t get out of sync. Fires input + change events so the page’s onChange handlers run.
  • <select>: matches the option by visible text first ("Saturday 7pm" finds the option with that text), then falls back to matching by option value. Fires change only.

Returns: "Filled sp-23 with \"Saturday 7pm\"" or "Selected \"Saturday 7pm\" in sp-23".
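The text-first, value-second lookup for `<select>` elements might look like the sketch below. `SelectOption` and `pickOption` are hypothetical names, and exact comparison after trimming is an assumption; the source does not say whether matching is case-sensitive.

```typescript
// Plain-data stand-in for an <option>: its visible text and its value attribute.
type SelectOption = { text: string; value: string };

function pickOption(options: SelectOption[], wanted: string): SelectOption | undefined {
  const target = wanted.trim();
  // First pass: visible text. Assumption: exact match after trimming whitespace.
  const byText = options.find((o) => o.text.trim() === target);
  // Second pass: fall back to the option's value attribute.
  return byText ?? options.find((o) => o.value === target);
}
```

Checking text before value means a request phrased the way the visitor sees the dropdown wins over a coincidental match on an internal value.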

click_element({ text }) — legacy fallback

Click a button or link by visible text. Used when no snapshot has been taken or the target is unambiguous.

{ "text": "Submit" }

Match strategy: exact match (score 100) → word-boundary match (50) → substring (10). Among ties, prefers visible elements.
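A minimal sketch of that scoring ladder is shown below. The scores (100, 50, 10) and the visible-element tie-break come from the text; the function names, the case-insensitive comparison, and the regex-based word-boundary test are assumptions.

```typescript
type Candidate = { text: string; visible: boolean };

// Escape regex metacharacters so the query is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function scoreMatch(candidateText: string, query: string): number {
  // Assumption: comparison is case-insensitive and ignores surrounding whitespace.
  const text = candidateText.trim().toLowerCase();
  const q = query.trim().toLowerCase();
  if (q === "") return 0;
  if (text === q) return 100;                                        // exact
  if (new RegExp(`\\b${escapeRegExp(q)}\\b`).test(text)) return 50;  // word boundary
  if (text.includes(q)) return 10;                                   // substring
  return 0;
}

function pickBest(candidates: Candidate[], query: string): Candidate | undefined {
  let best: Candidate | undefined;
  let bestScore = 0;
  for (const c of candidates) {
    const s = scoreMatch(c.text, query);
    if (s === 0) continue;
    // Higher score wins; on a tie, a visible element replaces a hidden one.
    if (s > bestScore || (s === bestScore && c.visible && !best?.visible)) {
      best = c;
      bestScore = s;
    }
  }
  return best;
}
```

For the query "Submit", a button labeled "Submit" scores 100, "Submit order" scores 50, and "Resubmit" scores 10.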

After filling form fields with fill_field / act.fill, the system prompt always instructs the AI to call click_element (or act.click) on the submit/save/confirm button to finish the task — the AI doesn’t leave the form half-filled.

fill_field({ field, value }) — legacy fallback

Fill a form field by label, placeholder, or name.

{ "field": "city", "value": "West Hollywood" }

Partial matches work — "city" will match <input placeholder="Filter by city"> even if the visitor said “set city to West Hollywood”. The handler delegates to fillField() in packages/spelo-system/src/navigate.ts, which also fires input + change events.

For dropdowns, pass the option text ("West Hollywood") rather than the value ("WH") — the handler does the lookup.

Destructive action gate

act.confirm({ summary })

Display a Yes/No overlay on the page with a one-sentence summary of the action the AI is about to take.

{ "summary": "Submit your contact info to the team?" }

Returns: "yes" or "no" — the AI is required by its system prompt to branch on the result and abort if "no".

Visual: a small dark pill near the agent’s MorphPanel, two buttons. Auto-focuses the No button so an accidental Enter cancels rather than confirms.

When the system prompt requires it: any destructive or commit-to-it action — submit a form, complete a checkout step, send a message, place an order, schedule something, share contact info. The CONFIRMATION POLICY in the prompt enumerates these.

This is the safety rail that lets act.click exist at all. Without act.confirm the agent could submit a form mid-conversation without the visitor realizing — with act.confirm, the visitor always gets a final visual + audible “are you sure?” before anything irreversible happens.
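The branch the prompt requires can be modeled in code, even though in the product it is the AI that performs it across two tool calls. In this sketch `confirmThen`, the `Confirm` callback type, and the "Cancelled by visitor" string are all hypothetical; only the "yes" / "no" return values come from the text.

```typescript
type ConfirmResult = "yes" | "no";
// Stand-in for the act.confirm tool: shows the overlay and resolves with the visitor's choice.
type Confirm = (args: { summary: string }) => Promise<ConfirmResult>;

async function confirmThen(
  confirm: Confirm,
  summary: string,
  action: () => string, // e.g. a wrapper around act.click on the submit button
): Promise<string> {
  const answer = await confirm({ summary });
  if (answer !== "yes") {
    // Abort: the irreversible action is never invoked.
    return "Cancelled by visitor"; // hypothetical message
  }
  return action();
}
```

The property that matters is ordering: the action callback runs only after an explicit "yes", so a "no" (or an accidental Enter on the auto-focused No button) leaves the page untouched.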

Reliability notes

The legacy and v2 tools coexist by design. The system prompt instructs the AI:

Always prefer see.snapshot + act.* for actions on specific elements. Use the legacy click_element / fill_field only when:

  • You haven’t taken a snapshot yet (e.g. the visitor’s first request on a new page)
  • The target is unambiguously named (a single “Submit” button on a small form)
  • The legacy tool is cheaper (e.g. scroll_by 50% is just a hint — no snapshot needed)

This dual-mode design keeps the bundle small (one set of handlers per tool) while letting the LLM pick the right precision/cost trade-off per task.

Performance and limits

| Tool | Wall-clock cost | Network |
| --- | --- | --- |
| navigate (in-page) | ~20-50 ms | None |
| navigate (external) | ~50 ms (popup blocker fallback) | None |
| scroll_to, act.scroll_to | ~30-60 ms | None |
| scroll_by | <10 ms | None |
| act.click, click_element | ~10-30 ms | None |
| act.fill, fill_field | ~10-30 ms | None |
| act.confirm | wait for user click (1-5 s) | None |

All action handlers run in the browser. No server round-trips.

See also