Page perception tools

Four tools let the AI read what the visitor is currently looking at. The widget’s bundle exposes these so the model can answer “what does this say?” without preloading every page of your site into the prompt.

`read_page` — full page scrape

Returns the title, meta description, every heading, navigation links, visible images (alt text + nearest heading), products if any, CTAs, and up to 8 000 chars of main body text.

When the AI calls it: any time the visitor asks a content question about the current page — “what is this article about?”, “summarize this”, “tell me about this product”, “what’s on this page?”.

Parameters: none.

Returns: a structured text block:

PAGE: /menu
TITLE: Dinner Menu — Ember & Oak

HEADINGS:
H1: Dinner Menu
H2: Starters
H2: Entrees
H2: Desserts
H2: Wine list

SECTIONS:
[starters] Starters — Roasted carrot soup; Heirloom tomato salad; ...
[entrees] Entrees — Pan-seared trout; Braised short rib; ...

NAVIGATION:
/ Home
/menu Menu
/reservations Reservations

CTAs:
Book a table · Order online

MAIN: [up to 8 000 chars of body prose]

The AI uses this to ground every spoken answer in your real page content. Without it the model has no idea what page the visitor is on.
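As a rough illustration of how the block above could be consumed, the sketch below splits it into labelled fields. It assumes the labels shown in the example (`PAGE`, `TITLE`, `HEADINGS`, `SECTIONS`, `NAVIGATION`, `CTAs`, `MAIN`) are the complete set; the `parseReadPage` helper and its return shape are hypothetical and not part of the widget.

```typescript
// Hypothetical helper, not part of the widget: splits a read_page text block
// into labelled fields. The label list is assumed from the example output.
const LABELS = ["PAGE", "TITLE", "HEADINGS", "SECTIONS", "NAVIGATION", "CTAs", "MAIN"] as const;
type Label = (typeof LABELS)[number];
type ReadPageFields = Partial<Record<Label, string[]>>;

function parseReadPage(block: string): ReadPageFields {
  const fields: ReadPageFields = {};
  let current: Label | null = null;
  for (const raw of block.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    // MAIN is free-form prose, so stop looking for labels once it starts.
    const label = current === "MAIN" ? undefined : LABELS.find((l) => line.startsWith(l + ":"));
    if (label !== undefined) {
      current = label;
      fields[label] = [];
      // Keep any text that follows the label on the same line.
      const rest = line.slice(label.length + 1).trim();
      if (rest !== "") fields[label]!.push(rest);
    } else if (current !== null) {
      fields[current]!.push(line);
    }
  }
  return fields;
}
```

Lines such as `H1: Dinner Menu` are not in the label list, so they stay inside the `HEADINGS` field rather than starting a new one.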

`read_viewport` — what’s currently on screen

Returns prose that intersects the visitor’s current viewport — what they’re actually looking at right now. Sections scrolled past or below the fold are not included.

When the AI calls it: visitor says “read this”, “read what’s on screen”, “read it out”.

Parameters: none.

Returns: up to 2 400 chars of viewport-visible prose, joined with newlines. If the visible region exceeds the cap, the response ends with " …" so the AI knows to offer to continue.

The seasonal prix fixe menu changes weekly based on what our farms harvest. Each course is paired with a wine from our cellar — see the wine list on page 2. Dietary restrictions can be accommodated with 24h notice. …
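The capping behaviour described above can be sketched as follows. This is an illustration only: the names `capProse` and `isTruncated` are hypothetical, and it assumes the 2 400-char cap applies to the prose itself, with the " …" marker appended afterwards.

```typescript
const VIEWPORT_CHAR_CAP = 2400; // cap stated in this section
const CONTINUATION_MARK = " …"; // trailing marker stated in this section

// Hypothetical helper: joins visible prose blocks with newlines and caps the result.
function capProse(blocks: string[], cap: number = VIEWPORT_CHAR_CAP): string {
  const joined = blocks.join("\n");
  if (joined.length <= cap) return joined;
  // Assumption: the marker is added on top of the capped prose.
  return joined.slice(0, cap).trimEnd() + CONTINUATION_MARK;
}

// Hypothetical check a consumer could use to decide whether to offer to continue.
function isTruncated(response: string): boolean {
  return response.endsWith(CONTINUATION_MARK);
}
```

A consumer that sees `isTruncated(response)` return `true` would know more prose is available below the cut.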

`read_viewport` differs from `read_page` in scope, size cap, and trigger:

| | `read_page` | `read_viewport` |
| --- | --- | --- |
| Scope | Whole document | Currently visible only |
| Char cap | 8 000 | 2 400 |
| Use case | “What’s this page about?” | “Read this.” |
| When called | Once per page navigation | On explicit visitor request |

`read_section` — read one indexed section aloud

After `search_knowledge_base` returns a hit with `{url, section_id}`, the AI can ask to read that specific section in full. The call triggers a persistent highlight on the page and, for tall sections, auto-scroll while the agent reads the section aloud.

When the AI calls it: after a knowledge-base search, when the visitor says “read me that section”, “read the part about X”.

Parameters:

{
  "url": "/services/water-heater-installation",
  "section_id": "warranty"
}

Both fields are returned verbatim from the prior search_knowledge_base call.

Returns: up to 1 500 chars of section prose. If the section is truncated, the response ends with " …" and the AI offers to continue reading.
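Because both parameters must be passed through unchanged, the hand-off from a search hit to a `read_section` call can be sketched like this. The `SearchHit` shape beyond `url` and `section_id`, and the `toReadSectionParams` helper, are assumptions made for illustration.

```typescript
// Assumed shape of one search_knowledge_base hit; only url and section_id
// are documented here, any other fields are hypothetical.
interface SearchHit {
  url: string;
  section_id: string;
  [extra: string]: unknown;
}

interface ReadSectionParams {
  url: string;
  section_id: string;
}

// Hypothetical helper: builds read_section parameters from a prior hit.
function toReadSectionParams(hit: SearchHit): ReadSectionParams {
  if (hit.url === "" || hit.section_id === "") {
    throw new Error("search hit is missing url or section_id");
  }
  // Copy verbatim: no trimming, lowercasing, or URL normalization.
  return { url: hit.url, section_id: hit.section_id };
}
```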

Side effects (browser-side):

1. Fetches `/v1/<siteId>/read-section` to get the full section text.
2. Finds the element via `[data-spelo-section-id="..."]`, then `[data-section-id="..."]`, then `getElementById`.
3. Calls the `onSectionRead` callback. `VoiceWidget` then mounts a persistent highlight overlay and, if the section is taller than 90% of the viewport, slowly auto-scrolls during the read.

The visitor sees the highlight appear right where the agent is reading, even if they scroll away — the highlight tracks the actual section position.
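The lookup order and the 90% rule from the side effects above might look roughly like the sketch below. It assumes the three lookups are tried in the order listed and the first match wins; `findSectionElement`, `needsAutoScroll`, and the `SectionRoot` interface are hypothetical names, not the widget's own.

```typescript
// Minimal slice of the DOM Document API that the lookup needs, so the
// function can be exercised without a browser.
interface SectionRoot<E> {
  querySelector(selector: string): E | null;
  getElementById(id: string): E | null;
}

// Hypothetical helper: tries the two data attributes, then falls back to the id.
function findSectionElement<E>(root: SectionRoot<E>, sectionId: string): E | null {
  // Escape quotes and backslashes so the id is safe inside a quoted attribute selector.
  const quoted = sectionId.replace(/["\\]/g, "\\$&");
  return (
    root.querySelector(`[data-spelo-section-id="${quoted}"]`) ??
    root.querySelector(`[data-section-id="${quoted}"]`) ??
    root.getElementById(sectionId)
  );
}

// Hypothetical helper: auto-scroll only when the section is taller than 90% of the viewport.
function needsAutoScroll(sectionHeight: number, viewportHeight: number): boolean {
  return sectionHeight > 0.9 * viewportHeight;
}
```

In a browser, the global `document` should satisfy `SectionRoot<Element>`, and the two heights could come from `getBoundingClientRect().height` and `window.innerHeight`.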

`see.snapshot` — structured element grid

The keystone of the v2 action protocol. Returns an array of every interactive or textual element on the page with stable ids, roles, names, bounding boxes, and visibility.

When the AI calls it: before any action targeting a specific element — “click the third button”, “fill the email field”, “scroll to the next section”.

Parameters: none.

Returns: JSON array. Each entry:

{
  "id": "sp-12",
  "role": "button",
  "name": "Book a table",
  "bbox": [620, 480, 140, 44],
  "visible": true
}

| Field | Meaning |
| --- | --- |
| `id` | Stable element id — pass to `act.click`, `act.fill`, `act.scroll_to`. Persists across re-renders if the element keeps its `data-spelo-id`. |
| `role` | `button` · `link` · `checkbox` · `radio` · `textbox` · `select` · `heading` · `section` · `image` · `other` |
| `name` | Accessible name — `aria-label`, `label[for]`, placeholder, or visible text |
| `bbox` | `[x, y, width, height]` in viewport pixels |
| `visible` | Whether the element has a non-zero box and is at least partially in the viewport |
| `value` | (form fields only) current value |
| `type` | (input fields only) `text` / `email` / `tel` / etc. |

Each snapshot is capped at 150 elements to keep the LLM prompt under budget. On denser pages, scrolling triggers a fresh snapshot.
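For readers consuming the snapshot in code, the entry fields can be written down as a type, with a small lookup on top. This is a sketch based only on the field table; the `SnapshotEntry` and `findVisible` names are hypothetical, and `value` and `type` are assumed to be strings.

```typescript
type Role =
  | "button" | "link" | "checkbox" | "radio" | "textbox"
  | "select" | "heading" | "section" | "image" | "other";

interface SnapshotEntry {
  id: string;
  role: Role;
  name: string;
  bbox: [number, number, number, number]; // [x, y, width, height] in viewport pixels
  visible: boolean;
  value?: string; // form fields only (string type is an assumption)
  type?: string; // input fields only
}

// Hypothetical helper: visible entries with the given role and a
// case-insensitive match on the accessible name.
function findVisible(snapshot: SnapshotEntry[], role: Role, name: string): SnapshotEntry[] {
  const wanted = name.trim().toLowerCase();
  return snapshot.filter(
    (e) => e.visible && e.role === role && e.name.trim().toLowerCase() === wanted
  );
}
```

The `id` of a matching entry is what would then be passed to `act.click`, `act.fill`, or `act.scroll_to`.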

Why snapshot-based addressing matters

Compared to the legacy `click_element({ text: "Submit" })`, snapshot addressing handles several cases better:

- Icon buttons (heart, ×, ⋯ menu): legacy can’t find them; snapshot does, because it reads accessible names.
- Duplicate labels: with three “Add to cart” buttons on a category page, legacy picks the first; snapshot lets the AI pick the right one by bbox or surrounding context.
- Dynamic forms: after the AI fills a field that triggers more fields to appear, a fresh snapshot reflects the new state immediately.
- Shadow DOM components: snapshot crosses into shadow trees that text-matching can’t reach.

For these reasons, the system prompt instructs the AI to prefer the v2 path (see.snapshot → act.*) and use legacy tools only as fallback.
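One way the duplicate-label case could be resolved from `bbox` data is to pick the candidate closest to a reference element, such as the product heading the visitor mentioned. The sketch below is an illustration under that assumption; `nearestTo` is a hypothetical helper, and the docs do not specify how the AI actually chooses.

```typescript
type BBox = [number, number, number, number]; // [x, y, width, height]

interface Positioned {
  id: string;
  name: string;
  bbox: BBox;
}

// Centre point of a bounding box.
function center([x, y, w, h]: BBox): [number, number] {
  return [x + w / 2, y + h / 2];
}

// Hypothetical helper: returns the candidate whose centre is closest to the
// anchor's centre, or undefined when there are no candidates.
function nearestTo<T extends Positioned>(anchor: Positioned, candidates: T[]): T | undefined {
  const [ax, ay] = center(anchor.bbox);
  let best: T | undefined;
  let bestDist = Infinity;
  for (const c of candidates) {
    const [cx, cy] = center(c.bbox);
    const d = Math.hypot(cx - ax, cy - ay);
    if (d < bestDist) {
      best = c;
      bestDist = d;
    }
  }
  return best;
}
```

Straight-line distance is a crude proxy for "surrounding context"; a layout with stacked cards might need a rule such as "first button below the heading" instead.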

Performance and limits

| Tool | Cap | Wall-clock cost |
| --- | --- | --- |
| `read_page` | 8 000 chars | DOM walk ~20-50 ms |
| `read_viewport` | 2 400 chars | DOM walk ~10-20 ms |
| `read_section` | 1 500 chars | + ~80-200 ms HTTP round-trip to read-section endpoint |
| `see.snapshot` | 150 elements | DOM walk ~15-40 ms |

All four run in the visitor’s browser. Only `read_section` makes a round-trip to the Spelo server.

Other tool categories

Overview · Actions · Knowledge & lifecycle
