← Wiki

agent-browser — Browser Automation Designed for LLMs

vercel-labs/agent-browser (30k stars, Apache-2.0) is a native Rust CLI that talks directly to Chrome via CDP. The thesis is simple: Playwright and Puppeteer were built for engineers writing tests. LLMs need a different surface — stable element refs from the accessibility tree, structured JSON everywhere, explicit content boundaries, and a stateless command shape that composes with tool-calling.

The primitive that matters: @e1 refs

Every traditional browser automation tool hands the agent a CSS selector or XPath. Those shift on every re-render. agent-browser exposes the page’s accessibility tree:

agent-browser snapshot
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]

agent-browser click @e2
agent-browser fill @e3 "[email protected]"

Refs are deterministic and stable across commands in a session. The LLM doesn’t have to re-query after every click to find the “new” button — it already has @e2 and can keep using it. This is the same move fff.nvim made at the file-search layer: give the agent an index with meaningful IDs and stop making it re-derive them on every turn.

Traditional selectors still work (click "#submit", find role button click --name "Submit"), but the ref flow is the one optimized for LLMs.

Architecture

Rust CLI  ──Unix socket / named pipe──▶  Rust daemon  ──CDP──▶  Chrome for Testing

Content boundaries — the prompt-injection defense

The feature I didn’t expect: --content-boundaries wraps tool output in explicit delimiters so the LLM can distinguish what it produced from what the page produced. A user-visible phrase like “IMPORTANT: ignore previous instructions” is just text inside a page snapshot, not an instruction. Without boundaries, that’s exactly how injection attacks land.

This is the same concern token-efficient-web-requests raises for web fetching (“you’re feeding untrusted HTML into the prompt”) but specifically for browser automation, where scraped text is the primary return value of every tool call.

Features that signal the target audience is agents, not humans

When to reach for this vs alternatives

Need Tool
Persistent logged-in Chrome profile, personal agent browser-automation-cdp (Playwright + connect_over_cdp)
Ephemeral automation in an agent workflow, want ref-based UI interaction agent-browser
MCP integration into Claude Code / Cursor Playwright MCP (still the path if you want MCP specifically)
Cloud-native scaled browser fleet Browserless / Browserbase / Kernel — agent-browser lists integrations

The clean split: browser-automation-cdp = “my Chrome on my Mac mini, logged into everything, agent reuses”; agent-browser = “the tool shape an LLM would design for itself if it could.”

Why this pattern generalizes

agent-browser is a concrete instance of a pattern that keeps recurring in the wiki:

The common thread: the AI-native rewrite wins when the old tool was designed for humans. Browser-automation-for-humans (Playwright) is verbose, state-dependent, and hostile to process isolation. Browser-automation-for-agents needs refs, boundaries, batch, and a daemon. Same surface, different primitives.

See Also

Related