How I actually use it

AI can search the web.
But it can't log into your tools.

Your work lives behind logins. AI can't get there. Browser automation fixes that.

The problem

Screenshot. Analyze. Click. Screenshot. Analyze. Click.

Current AI browser tools take a picture of your screen at every step. A 30-second task becomes 3 minutes.

1. Capture (~1,500 tokens)
2. Analyze (every pixel)
3. Decide (pattern match)
4. Repeat (+1,500 tokens)

The approach

Skip the screenshots.
Talk directly to Chrome.

CDP (Chrome DevTools Protocol) — the same tech that powers Chrome's developer tools. Read text, click elements, extract data. No images.
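"Talk directly to Chrome" is less exotic than it sounds: every CDP command is a small JSON message with an id, a method, and params, sent over Chrome's DevTools WebSocket. A minimal sketch of the "give me the text" command (the `cdp_command` helper is hypothetical, not part of any library):

```python
import itertools
import json

_ids = itertools.count(1)

def cdp_command(method, **params):
    """Every CDP message is JSON: an id to match replies, a method, and params."""
    return {"id": next(_ids), "method": method, "params": params}

# "Give me the text" as a CDP command -- no screenshot involved.
get_text = cdp_command(
    "Runtime.evaluate",
    expression="document.body.innerText",
    returnByValue=True,
)

payload = json.dumps(get_text)  # this string goes over the DevTools WebSocket
```

Chrome replies with a JSON message carrying the same id and the page text as a plain string.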

"Give me the text"
not
"Here's a photo — what does it say?"

Compare

Three ways AI touches the web

AI Web Search

  • + Great for public information
  • + Built into ChatGPT, Gemini, Perplexity
  • − Can't access anything behind a login
  • − Can't interact with pages
  • − Can't fill forms or click buttons

Speed: Fast
Tokens / page: ~500

Screenshot-Based

  • + Can see any page
  • + Works behind logins
  • + Handles complex UIs
  • − Slow — processes images each step
  • − Expensive — high token usage
  • − Fragile — visual changes break it

Speed: Slow
Tokens / page: ~1,500

Direct Browser Control

  • + Reads actual page data
  • + Works behind logins
  • + Fast — text is cheap to process
  • + Precise — clicks exact elements
  • − Needs a small setup script

Speed: Fast
Tokens / page: ~200

Why it matters

Don't read the whole page.
Read only what you need.

Tokens = processing time = cost. Ask for exactly the data you need.

  • Screenshot: ~1,500 tokens — every pixel, layout, decorations
  • Full page text: ~800 tokens — all visible text on the page
  • Article only: ~300 tokens — just the main content, no nav/footer
  • Just links: ~150 tokens — only clickable elements
  • Accessibility tree: ~100 tokens — semantic structure, roles, interactive elements
  • Just forms: ~80 tokens — only input fields and labels

Same page. Six levels of precision. The accessibility tree gives you 90% of actionable info in 5% of the tokens.

Extraction

  axtree     Accessibility tree (most compact)
  readable   Main article content only
  content    All visible text on the page
  links      All clickable links and buttons
  forms      All input fields with labels
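The axtree output is roughly what you get by filtering CDP's Accessibility.getFullAXTree response down to the roles an agent can act on. A sketch of that filtering, assuming the node shape CDP returns (role and name wrapped in value objects); `flatten_axtree` is a hypothetical helper, not the actual script's code:

```python
def flatten_axtree(nodes):
    """Keep only actionable roles; emit compact [role "name"] tokens."""
    keep = {"button", "link", "heading", "textbox", "combobox",
            "checkbox", "navigation"}
    lines = []
    for node in nodes:
        role = node.get("role", {}).get("value", "")
        name = node.get("name", {}).get("value", "")
        if role in keep:
            lines.append(f'[{role} "{name}"]' if name else f"[{role}]")
    return " ".join(lines)

# Shape mimics Accessibility.getFullAXTree nodes (values wrapped in objects).
sample = [
    {"role": {"value": "navigation"}, "name": {"value": ""}},
    {"role": {"value": "button"}, "name": {"value": "7d"}},
    {"role": {"value": "StaticText"}, "name": {"value": "decorative filler"}},
    {"role": {"value": "heading"}, "name": {"value": "Performance"}},
]
print(flatten_axtree(sample))  # [navigation] [button "7d"] [heading "Performance"]
```

Static text and decoration drop out; what remains is the ~100-token view of the page.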

Interaction

  click      Click by selector or text content
  type       Type into an input field
  scroll     Scroll the page or element
  wait       Wait for element to appear
  navigate   Go to a URL
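CDP itself has no "click by text" primitive — it clicks by coordinates or dispatches DOM events — so clicking by text content is typically done by injecting a bit of JavaScript through Runtime.evaluate. A hedged sketch (`click_by_text` is a hypothetical helper):

```python
import json

def click_by_text(text, msg_id=1):
    """Inject JS that finds a clickable element by its visible text and clicks it."""
    js = (
        "(() => {"
        " const els = [...document.querySelectorAll('a, button, [role=\"button\"]')];"
        f" const el = els.find(e => e.innerText.trim() === {json.dumps(text)});"
        " if (el) el.click();"
        " return !!el; })()"
    )
    return {"id": msg_id, "method": "Runtime.evaluate",
            "params": {"expression": js, "userGesture": True}}
```

The JS returns true or false, so the caller knows whether the click actually landed; `userGesture` makes the click count as user-initiated for pages that care.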

Speed

  block          Skip images, CSS, fonts (60-80% faster)
  cookies_save   Save login sessions to file
  cookies_load   Restore sessions (skip re-login)
  unblock        Clear resource blocks
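These speed commands map onto real CDP calls: Network.setBlockedURLs for resource blocking, and Network.getAllCookies / Network.setCookies for session persistence. A sketch of what block and the cookies_* commands might wrap (helper names and the exact pattern list are assumptions, not the script's actual code):

```python
import json
from pathlib import Path

# Patterns for Network.setBlockedURLs -- Chrome fails matching requests,
# so pages load without images, stylesheets, or fonts.
BLOCK_PATTERNS = ["*.png", "*.jpg", "*.jpeg", "*.gif", "*.svg",
                  "*.css", "*.woff", "*.woff2"]

def block_command(msg_id=1):
    return {"id": msg_id, "method": "Network.setBlockedURLs",
            "params": {"urls": BLOCK_PATTERNS}}

def save_cookies(cookies, path):
    """Persist the cookie list returned by Network.getAllCookies."""
    Path(path).write_text(json.dumps(cookies, indent=2))

def load_cookies(path, msg_id=1):
    """Build a Network.setCookies command from a saved file -- skips re-login."""
    cookies = json.loads(Path(path).read_text())
    return {"id": msg_id, "method": "Network.setCookies",
            "params": {"cookies": cookies}}
```

Restoring cookies before the first navigate is what makes "0 re-logins" possible: Chrome presents the saved session as if you never left.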

How it works

One script between AI and Chrome

1,200 lines of Python. No framework.

1. You ask: 'Check my Search Console for this week's clicks'
2. AI plans: Decides which pages to visit and what data to extract
3. Script executes: A small Python script sends commands to Chrome via CDP
4. Chrome acts: Navigates, clicks, reads — using your existing login session
5. Text returns: Clean data comes back. AI processes text, not pictures.
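The script-to-Chrome link starts with endpoint discovery: a Chrome launched with --remote-debugging-port=9222 lists its open tabs as JSON at http://localhost:9222/json, each entry carrying a webSocketDebuggerUrl for CDP. A sketch of that discovery (helper names are hypothetical):

```python
import json
from urllib.request import urlopen

def list_targets(port=9222):
    """Fetch Chrome's open-tab list from the DevTools HTTP endpoint."""
    with urlopen(f"http://127.0.0.1:{port}/json") as resp:
        return json.load(resp)

def pick_page(targets):
    """Choose the first real page (not an extension or service worker)
    and return its CDP WebSocket endpoint."""
    pages = [t for t in targets if t.get("type") == "page"]
    return pages[0]["webSocketDebuggerUrl"] if pages else None
```

Everything after that — navigate, click, read — is JSON messages over that one WebSocket.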

What the script actually does

# 1. Launch Chrome + block images/CSS (60-80% faster loads)
cdp.py ensure
cdp.py block

# 2. Restore yesterday's login session
cdp.py cookies_load ~/.cookies/google.json

# 3. Navigate (smart wait — no fixed sleep)
cdp.py navigate "https://search.google.com/search-console"

# 4. What's on this page? (~100 tokens, not 1,500)
cdp.py axtree
→ [navigation] [button "7d"] [button "28d"] [heading "Performance"]

# 5. Click the 28-day range
cdp.py click "28d"

# 6. Read the data
cdp.py readable
→ "Performance: 4,820 clicks | 72,100 impressions | 6.7% CTR"

# 7. Save session for next time
cdp.py cookies_save ~/.cookies/google.json

If it's slower than your hands, you won't use it.

3 minutes for what takes you 30 seconds? You'll just do it yourself. Every time. The bottleneck is speed, not intelligence.

27x fewer tokens via accessibility tree

60-80% faster loads with resource blocking

0 re-logins with session persistence