How I actually use it

AI can search the web.
But it can't log into your tools.

Your work lives behind logins. AI can't get there. Browser automation fixes that.

The problem

Screenshot. Analyze. Click. Screenshot. Analyze. Click.

Current AI browser tools take a picture of your screen at every step. A 30-second task becomes 3 minutes.

1. Capture (~1,500 tokens)
2. Analyze (every pixel)
3. Decide (pattern match)
4. Repeat (+1,500 tokens)

The approach

Skip the screenshots.
Talk directly to Chrome.

CDP (Chrome DevTools Protocol) — the same tech that powers Chrome's developer tools. Read text, click elements, extract data. No images.
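"Talk directly to Chrome" is less exotic than it sounds: every CDP command is a small JSON message with an id, a method, and params, sent over Chrome's DevTools WebSocket. A minimal sketch of the "give me the text" command (the `cdp_command` helper is hypothetical, not part of any library):

```python
import itertools
import json

_ids = itertools.count(1)

def cdp_command(method, **params):
    """Every CDP message is JSON: an id to match replies, a method, and params."""
    return {"id": next(_ids), "method": method, "params": params}

# "Give me the text" as a CDP command -- no screenshot involved.
get_text = cdp_command(
    "Runtime.evaluate",
    expression="document.body.innerText",
    returnByValue=True,
)

payload = json.dumps(get_text)  # this string goes over the DevTools WebSocket
```

Chrome replies with a JSON message carrying the same id and the page text as a plain string.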

"Give me the text"
not
"Here's a photo — what does it say?"

Compare

Three ways AI touches the web

AI Web Search

  • + Great for public information
  • + Built into ChatGPT, Gemini, Perplexity
  • − Can't access anything behind a login
  • − Can't interact with pages
  • − Can't fill forms or click buttons

Speed: Fast
Tokens / page: ~500

Screenshot-Based

  • + Can see any page
  • + Works behind logins
  • + Handles complex UIs
  • − Slow — processes images each step
  • − Expensive — high token usage
  • − Fragile — visual changes break it

Speed: Slow
Tokens / page: ~1,500

Direct Browser Control

  • + Reads actual page data
  • + Works behind logins
  • + Fast — text is cheap to process
  • + Precise — clicks exact elements
  • − Needs a small setup script

Speed: Fast
Tokens / page: ~200

Why it matters

Don't read the whole page.
Read only what you need.

Tokens = processing time = cost. Ask for exactly the data you need.

  • Screenshot: ~1,500 tokens — every pixel, layout, decorations
  • Full page text: ~800 tokens — all visible text on the page
  • Article only: ~300 tokens — just the main content, no nav/footer
  • Just links: ~150 tokens — only clickable elements
  • Accessibility tree: ~100 tokens — semantic structure, roles, interactive elements
  • Just forms: ~80 tokens — only input fields and labels

Same page. Six levels of precision. The accessibility tree gives you 90% of actionable info in 5% of the tokens.

Extraction

  axtree     Accessibility tree (most compact)
  readable   Main article content only
  content    All visible text on the page
  links      All clickable links and buttons
  forms      All input fields with labels
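The axtree output is roughly what you get by filtering CDP's Accessibility.getFullAXTree response down to the roles an agent can act on. A sketch of that filtering, assuming the node shape CDP returns (role and name wrapped in value objects); `flatten_axtree` is a hypothetical helper, not the actual script's code:

```python
def flatten_axtree(nodes):
    """Keep only actionable roles; emit compact [role "name"] tokens."""
    keep = {"button", "link", "heading", "textbox", "combobox",
            "checkbox", "navigation"}
    lines = []
    for node in nodes:
        role = node.get("role", {}).get("value", "")
        name = node.get("name", {}).get("value", "")
        if role in keep:
            lines.append(f'[{role} "{name}"]' if name else f"[{role}]")
    return " ".join(lines)

# Shape mimics Accessibility.getFullAXTree nodes (values wrapped in objects).
sample = [
    {"role": {"value": "navigation"}, "name": {"value": ""}},
    {"role": {"value": "button"}, "name": {"value": "7d"}},
    {"role": {"value": "StaticText"}, "name": {"value": "decorative filler"}},
    {"role": {"value": "heading"}, "name": {"value": "Performance"}},
]
print(flatten_axtree(sample))  # [navigation] [button "7d"] [heading "Performance"]
```

Static text and decoration drop out; what remains is the ~100-token view of the page.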

Interaction

  click      Click by selector or text content
  type       Type into an input field
  scroll     Scroll the page or element
  wait       Wait for element to appear
  navigate   Go to a URL
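CDP itself has no "click by text" primitive — it clicks by coordinates or dispatches DOM events — so clicking by text content is typically done by injecting a bit of JavaScript through Runtime.evaluate. A hedged sketch (`click_by_text` is a hypothetical helper):

```python
import json

def click_by_text(text, msg_id=1):
    """Inject JS that finds a clickable element by its visible text and clicks it."""
    js = (
        "(() => {"
        " const els = [...document.querySelectorAll('a, button, [role=\"button\"]')];"
        f" const el = els.find(e => e.innerText.trim() === {json.dumps(text)});"
        " if (el) el.click();"
        " return !!el; })()"
    )
    return {"id": msg_id, "method": "Runtime.evaluate",
            "params": {"expression": js, "userGesture": True}}
```

The JS returns true or false, so the caller knows whether the click actually landed; `userGesture` makes the click count as user-initiated for pages that care.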

Speed

  block          Skip images, CSS, fonts (60-80% faster)
  cookies_save   Save login sessions to file
  cookies_load   Restore sessions (skip re-login)
  unblock        Clear resource blocks
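These speed commands map onto real CDP calls: Network.setBlockedURLs for resource blocking, and Network.getAllCookies / Network.setCookies for session persistence. A sketch of what block and the cookies_* commands might wrap (helper names and the exact pattern list are assumptions, not the script's actual code):

```python
import json
from pathlib import Path

# Patterns for Network.setBlockedURLs -- Chrome fails matching requests,
# so pages load without images, stylesheets, or fonts.
BLOCK_PATTERNS = ["*.png", "*.jpg", "*.jpeg", "*.gif", "*.svg",
                  "*.css", "*.woff", "*.woff2"]

def block_command(msg_id=1):
    return {"id": msg_id, "method": "Network.setBlockedURLs",
            "params": {"urls": BLOCK_PATTERNS}}

def save_cookies(cookies, path):
    """Persist the cookie list returned by Network.getAllCookies."""
    Path(path).write_text(json.dumps(cookies, indent=2))

def load_cookies(path, msg_id=1):
    """Build a Network.setCookies command from a saved file -- skips re-login."""
    cookies = json.loads(Path(path).read_text())
    return {"id": msg_id, "method": "Network.setCookies",
            "params": {"cookies": cookies}}
```

Restoring cookies before the first navigate is what makes "0 re-logins" possible: Chrome presents the saved session as if you never left.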

How it works

One script between AI and Chrome

1,200 lines of Python. No framework.

1. You ask: 'Check my Search Console for this week's clicks'
2. AI plans: Decides which pages to visit and what data to extract
3. Script executes: A small Python script sends commands to Chrome via CDP
4. Chrome acts: Navigates, clicks, reads — using your existing login session
5. Text returns: Clean data comes back. AI processes text, not pictures.
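The script-to-Chrome link starts with endpoint discovery: a Chrome launched with --remote-debugging-port=9222 lists its open tabs as JSON at http://localhost:9222/json, each entry carrying a webSocketDebuggerUrl for CDP. A sketch of that discovery (helper names are hypothetical):

```python
import json
from urllib.request import urlopen

def list_targets(port=9222):
    """Fetch Chrome's open-tab list from the DevTools HTTP endpoint."""
    with urlopen(f"http://127.0.0.1:{port}/json") as resp:
        return json.load(resp)

def pick_page(targets):
    """Choose the first real page (not an extension or service worker)
    and return its CDP WebSocket endpoint."""
    pages = [t for t in targets if t.get("type") == "page"]
    return pages[0]["webSocketDebuggerUrl"] if pages else None
```

Everything after that — navigate, click, read — is JSON messages over that one WebSocket.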

What the script actually does

# 1. Launch Chrome + block images/CSS (60-80% faster loads)
cdp.py ensure
cdp.py block

# 2. Restore yesterday's login session
cdp.py cookies_load ~/.cookies/google.json

# 3. Navigate (smart wait — no fixed sleep)
cdp.py navigate "https://search.google.com/search-console"

# 4. What's on this page? (~100 tokens, not 1,500)
cdp.py axtree
→ [navigation] [button "7d"] [button "28d"] [heading "Performance"]

# 5. Click the 28-day range
cdp.py click "28d"

# 6. Read the data
cdp.py readable
→ "Performance: 4,820 clicks | 72,100 impressions | 6.7% CTR"

# 7. Save session for next time
cdp.py cookies_save ~/.cookies/google.json

If it's slower than your hands, you won't use it.

3 minutes for what takes you 30 seconds? You'll just do it yourself. Every time. The bottleneck is speed, not intelligence.

27x fewer tokens via accessibility tree

60-80% faster loads with resource blocking

0 re-logins with session persistence