
Browser Automation in Claude Code: 5 Tools Compared (2026)

Browser-use vs Agent Browser vs Playwright CLI vs Playwright MCP vs DevTools MCP — real-world token usage differs by 10x+. Speed, cost, and stability compared with setup instructions.

Bruce


2026-01-28


Writing code with AI is old news. The real game-changer is having AI control a browser — opening pages, clicking buttons, filling forms, and scraping data — all from a single natural-language prompt.

The Claude Code ecosystem now offers five mainstream browser automation options: Browser-use (an AI-agent-native automation framework), Vercel’s Agent Browser, Microsoft’s Playwright CLI (new in 2026), Microsoft’s Playwright MCP, and Google’s DevTools MCP. Each excels in different scenarios, and picking the wrong one can cost you time and tokens.

This guide provides a deep comparison of all five so you can make the right choice for your workflow.

March 2026 update: Added Browser-use — an AI-agent-first browser automation framework with local, cloud, and real-browser modes plus persistent sessions and cloud parallelism.

February 2026 update: Added Playwright CLI — Microsoft’s officially recommended token-efficient approach, with 4-100x lower token usage than MCP in real-world tests.

Why Browser Automation Matters

The Pain Without It

Imagine asking your AI assistant to:

  • Screenshot a competitor’s new feature page
  • Log into an internal dashboard and export a report
  • Test whether a form you just built submits correctly
  • Inspect an API response on a live page to debug an issue

Without browser automation, you have to manually open the browser, take screenshots or copy content, and paste it back to the AI. That workflow breaks down entirely for dynamic content behind logins or rendered by JavaScript.

What Browser Automation Enables

With browser automation, a single instruction does it all:

You say: "Open Amazon, search for 'mechanical keyboard', and list the top 5 prices."

The AI:
1. Launches a browser
2. Navigates to amazon.com
3. Types "mechanical keyboard" in the search box
4. Clicks search
5. Reads the top 5 product prices
6. Returns a formatted table

One sentence from you. Fully automated execution.

Quick Comparison of All Five Tools

Before diving deep, here is the overview:

| Dimension | Browser-use | Agent Browser | Playwright CLI | Playwright MCP | DevTools MCP |
| --- | --- | --- | --- | --- | --- |
| Developer | Browser-use team | Vercel Labs | Microsoft | Microsoft | Google |
| Purpose | AI-agent-native automation | Lightweight AI agent tool | Token-efficient agent automation | General browser automation | Chrome debugging protocol wrapper |
| Integration | Bash CLI / Skill | Bash CLI / Skill | Shell command / Skill | MCP Server | MCP Server + extension |
| Token usage | Very low | Very low (93% reduction) | Very low (75-99% reduction) | High | Medium |
| Browser support | Chromium / real Chrome / cloud | Chromium | Chrome/Firefox/WebKit | Chrome/Firefox/WebKit | Chrome only |
| Key strength | Multi-mode + persistent sessions + cloud parallelism | Fast, low tokens | Low tokens + cross-browser | Stable, full-featured | Deep debugging |

One-line summaries:

  • Browser-use: The all-in-one solution — local, cloud, and real-browser modes with session persistence
  • Agent Browser: Lightweight and fast — best for everyday browsing
  • Playwright CLI: Token-efficient with professional capabilities — the new default for coding agents
  • Playwright MCP: Most feature-complete — the stable choice for non-CLI environments
  • DevTools MCP: The debugging specialist — best for inspecting and troubleshooting

Deep Dive: What Makes Each Tool Unique

Browser-use: The All-in-One Powerhouse

Browser-use is a browser automation framework built specifically for AI agents. Its core philosophy: let AI agents operate browsers like humans do, but 100x more efficiently.

Unlike the other tools, Browser-use is not just a utility — it is a complete AI agent browser automation platform supporting three distinct modes: isolated local browsers, real Chrome profiles, and cloud-hosted remote browsers.

Core Mechanism: State + Index

Browser-use takes a minimalist approach to page representation. The state command returns a list of interactive elements, each assigned a numeric index. Operations reference these indexes directly:

# Get page state
browser-use state

# Example output:
# [0] link "Home"
# [1] link "Products"
# [2] input "Search..."
# [3] button "Sign In"

# Operate by index
browser-use click 3          # Click "Sign In"
browser-use input 2 "iPhone"  # Type in the search box

Similar to Agent Browser’s ref system, but Browser-use goes further — sessions persist across commands, so you never need to relaunch the browser.

Three Browser Modes

This is what sets Browser-use apart:

| Mode | Flag | Characteristics | Best for |
| --- | --- | --- | --- |
| chromium | -b chromium | Fast, isolated, headless by default | Automated testing, data scraping |
| real | -b real | Uses your actual Chrome with profiles | Sites requiring login state, extensions, cookies |
| remote | -b remote | Cloud-hosted browser with built-in proxy | Anti-bot bypass, parallel tasks, no local dependencies |

# Isolated mode: fast and clean
browser-use -b chromium open https://example.com

# Real browser: uses your Chrome profile
browser-use -b real --profile "Default" open https://example.com

# Cloud browser: no local resources needed, built-in proxy
browser-use -b remote open https://example.com

Real browser mode means the AI can use your existing login sessions, installed extensions, and saved passwords — no extra authentication setup required.

Cloud Parallelism: Sub-Agent Capability

Browser-use’s cloud mode supports async tasks and parallel agents — a capability no other tool offers:

# Launch a cloud task (async execution)
browser-use -b remote run "Open example.com and extract all product prices"

# Launch multiple tasks simultaneously
browser-use -b remote run "Check competitor A pricing" --session task-a
browser-use -b remote run "Check competitor B pricing" --session task-b
browser-use -b remote run "Check competitor C pricing" --session task-c

# View all task statuses
browser-use task list

# Get a specific task's result
browser-use task status --id <task-id>

Imagine having the AI open 10 competitor websites in parallel, scraping pricing data from all of them in seconds. Serial approaches simply cannot match this.
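
As a rough illustration of fanning out tasks, the sketch below prints one cloud task command per competitor instead of running it, so the commands can be reviewed first. It reuses only the `run` and `--session` forms shown above; `build_tasks` is a hypothetical helper, and it assumes the names are simple tokens with no spaces or quotes.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) one cloud task command per name.
# Assumes each name is a simple token such as "a" or "acme".
build_tasks() {
  for name in "$@"; do
    printf 'browser-use -b remote run "Check competitor %s pricing" --session task-%s\n' "$name" "$name"
  done
}

# Review the generated commands before running anything.
build_tasks a b c
```

Once the output looks right, piping it to `sh` would launch the tasks, assuming browser-use is installed and cloud access is configured.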

Advanced Features

Beyond basic browser operations, Browser-use offers several standout capabilities:

| Feature | Description |
| --- | --- |
| Python execution | Built-in Python session with cross-command state; direct access to the browser object |
| Profile sync | Sync cookies/profiles between local and cloud |
| Tunnel | Expose localhost:3000 to cloud browsers (browser-use tunnel 3000) |
| Data extraction | get text / get html / eval for direct page data retrieval |
| Smart waits | wait selector / wait text to wait for specific elements or text |
| Session management | Named sessions for parallel multi-browser operation |

Diagnostics

Not sure if your setup is correct? Browser-use includes a dedicated diagnostic command:

browser-use doctor
# Auto-checks: browser installation, dependency versions, network connectivity, cloud API availability

Best Use Cases

| Scenario | Example |
| --- | --- |
| Automation requiring login state | “Open our internal dashboard using my Chrome profile” |
| Parallel data collection | “Scrape pricing from 10 competitor sites simultaneously” |
| Anti-bot scenarios | “Use a cloud browser with proxy to access this site” |
| Local dev tunneling | “Tunnel localhost:3000 to a cloud browser for testing” |
| Complex Python scripts | “Process page data in batch using Python” |

Installation

# Install (requires Python)
pip install browser-use

# Diagnose your environment
browser-use doctor

# Start using
browser-use open https://example.com --headed

In Claude Code, Browser-use integrates as a Skill, supporting natural-language commands:

"Use browser-use to open example.com, get the page state, and click the sign-in button"

Agent Browser: The Lightweight Speed Demon

Agent Browser is Vercel’s purpose-built browser automation tool for AI agents. Its design philosophy: give the AI the minimum information needed to understand a webpage.

Core Mechanism: Snapshot + Refs

Traditional approaches send the entire DOM tree or accessibility tree to the AI, often consuming tens of thousands of tokens. Agent Browser takes a different approach — it sends a compact “snapshot” with short reference IDs (refs) for each interactive element.

# Agent Browser snapshot format
- button "Sign In" [ref=e1]
- input "Username" [ref=e2]
- input "Password" [ref=e3]
- link "Forgot Password" [ref=e4]

The AI sees this clean structure. To click “Sign In,” it simply says “click e1” — no CSS selectors or XPath needed.
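
To make the ref idea concrete, here is a minimal sketch that looks up the ref for a labeled element in snapshot text. It assumes the exact line format shown above; `ref_for` is a hypothetical helper for illustration, not part of Agent Browser.

```shell
#!/bin/sh
# Hypothetical helper: print the ref ID of the element whose quoted label is $1.
# Assumes snapshot lines look like:  - button "Sign In" [ref=e1]
ref_for() {
  label=$1
  # grep -F matches the quoted label literally; sed keeps only the ID inside [ref=...].
  grep -F "\"$label\"" | sed -n 's/.*\[ref=\([^]]*\)].*/\1/p' | head -n 1
}

snapshot='- button "Sign In" [ref=e1]
- input "Username" [ref=e2]
- input "Password" [ref=e3]
- link "Forgot Password" [ref=e4]'

# Look up the ref for the password field.
printf '%s\n' "$snapshot" | ref_for "Password"
```

Matching on the label with its surrounding quotes keeps "Password" from also matching "Forgot Password".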

Token Usage Comparison

| Operation | Traditional approach | Agent Browser |
| --- | --- | --- |
| Open a moderately complex page | ~15,000 tokens | ~1,000 tokens |
| Fill out a form | ~8,000 tokens | ~500 tokens |
| Execute a 10-step workflow | ~100,000 tokens | ~7,000 tokens |

A 93% reduction in token usage means:

  • Faster responses (less data for the AI to process)
  • Lower costs (if billed by token)
  • Less risk of hitting context window limits
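
The 93% figure can be checked against the table above. This small sketch computes the percentage reduction for each row; `savings_pct` is a hypothetical helper name, and the inputs are the approximate token counts from the table.

```shell
#!/bin/sh
# Percentage reduction when going from $1 tokens down to $2 tokens.
savings_pct() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

savings_pct 15000 1000    # open a moderately complex page
savings_pct 8000 500      # fill out a form
savings_pct 100000 7000   # execute a 10-step workflow
```

All three rows land at roughly 93%, which is where the headline number comes from.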

Best Use Cases

| Scenario | Example |
| --- | --- |
| Browse a webpage | “Open the competitor’s homepage and check it out” |
| Screenshot comparison | “Take a screenshot to see the updated design” |
| Fill forms | “Enter the test data into the form” |
| Information gathering | “Check the pricing on this page” |
| Simple interactions | “Click that button” |

Installation

# Global install (recommended for best performance)
npm install -g agent-browser

# Install Chromium (required on first setup)
agent-browser install

# Start using
agent-browser open https://example.com

# Or try without installing globally (slower)
npx agent-browser open https://example.com

In Claude Code, Agent Browser typically integrates as a Skill:

"Use Agent Browser to open https://example.com and take a screenshot"

Playwright CLI: The Token-Efficient Specialist (New in 2026)

Playwright CLI is Microsoft’s next-generation browser automation approach, launched in early 2026. If Playwright MCP is the “heavy infantry,” the CLI is a “special forces unit” built specifically for coding agents like Claude Code, Cursor, and Copilot — same firepower, dramatically lower supply costs.

Microsoft explicitly recommends this approach in the Playwright MCP repository:

“Modern coding agents increasingly favor CLI-based workflows exposed as SKILLs over MCP because CLI invocations are more token-efficient.”

Core Mechanism: Data on Disk, Not in Context

The fundamental difference between Playwright CLI and MCP is where data lives:

Playwright MCP approach:
  Page snapshot → returned in full to the AI → consumes many tokens
  Screenshot → encoded as data in response → consumes even more tokens
  Console logs → attached every time → ongoing token cost

Playwright CLI approach:
  Page snapshot → saved as YAML file → AI reads only when needed
  Screenshot → saved as PNG file → AI views only when needed
  Console logs → written to log file → retrieved on demand

Think of it this way: MCP is a verbose assistant who dumps every detail into every report. CLI is an efficient assistant who says “the report is on your desk” and lets you read it when you need to.
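
The pattern itself is simple enough to sketch in a few lines: write the large payload to a file and hand back only a pointer. The example below is an illustration of that idea, not Playwright CLI's actual implementation; `save_snapshot`, the fake payload, and the directory layout are all assumptions.

```shell
#!/bin/sh
# Sketch of the "data on disk" pattern: store the payload, return only a pointer.
# save_snapshot and the snapshots/ layout are hypothetical.
save_snapshot() {
  dir=$1
  mkdir -p "$dir"
  out="$dir/page-001.yaml"
  cat > "$out"                     # the full payload goes to disk
  echo "Snapshot saved to $out"    # only this short line reaches the caller
}

tmp=$(mktemp -d)
# Fake a large snapshot: 2,000 element lines.
payload=$(awk 'BEGIN { for (i = 1; i <= 2000; i++) printf "- button \"Item %d\" [ref=e%d]\n", i, i }')
reply=$(printf '%s\n' "$payload" | save_snapshot "$tmp/snapshots")

echo "$reply"
echo "payload bytes: $(wc -c < "$tmp/snapshots/page-001.yaml")"
echo "reply bytes:   $(printf '%s' "$reply" | wc -c)"
```

The reply stays the same size no matter how large the page is, which is why the per-step context cost stays flat.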

Token Usage: Real-World Benchmarks

| Scenario | Playwright MCP | Playwright CLI | Savings |
| --- | --- | --- | --- |
| Single page snapshot | ~15,000 tokens | ~200 tokens (file path) | 98.7% |
| 10-step automation | ~114,000 tokens | ~27,000 tokens | 76.3% |
| Test flow with screenshots | ~150,000 tokens | ~5,000 tokens | 96.7% |
| Long sessions (50+ steps) | Context overflow risk | Runs stably | Qualitative leap |

Benchmark data from TestCollab and SupaTest independent reviews.

Why such a dramatic difference? MCP stuffs the full accessibility tree and console messages into context at every step. CLI returns only a file path and a short confirmation. Tokens saved, context window preserved.

Workflow Example

# 1. Open a page
playwright-cli open https://example.com --headed

# 2. Take a page snapshot (saved as YAML, not stuffed into context)
playwright-cli snapshot
# Output: Snapshot saved to .playwright/snapshots/page-001.yaml
# Each element has a ref ID (e.g., e8, e21, e35)

# 3. Operate elements by ref ID (extremely concise)
playwright-cli fill e8 "user@example.com"
playwright-cli fill e12 "password123"
playwright-cli click e15

# 4. Take a screenshot (saved as file, not converted to tokens)
playwright-cli screenshot
# Output: Screenshot saved to .playwright/screenshots/page-001.png

# 5. Save login state (reusable next time)
playwright-cli state-save login-state.json

Notice how every command response is just a short file path — not thousands of tokens of DOM tree. That is the secret to CLI’s efficiency.

50+ Commands, Full Coverage

Playwright CLI is not a stripped-down MCP. It has complete automation capabilities:

| Category | Commands | Purpose |
| --- | --- | --- |
| Navigation | open, goto, go-back, reload | Page navigation |
| Interaction | click, fill, type, drag, hover | Element operations |
| Snapshots | snapshot | Get compact page structure |
| Screenshots | screenshot, pdf | Visual verification and export |
| State | state-save, state-load, cookie | Login state management |
| Debugging | console, network, tracing, video | Dev debugging |
| Sessions | Named sessions | Parallel multi-browser operation |

Best Use Cases

| Scenario | Example |
| --- | --- |
| Long automation tasks | “Run screenshot comparisons across 50 pages” |
| In-code test flows | “Test the login → checkout → payment flow end to end” |
| Token budget constraints | “Complete the browser task with minimal token usage” |
| Extending Playwright tests | “Add AI-driven tests on top of the existing test suite” |

Installation

# Install
npm install -g @playwright/cli@latest

# Initialize (auto-installs browsers)
playwright-cli install

# Start using
playwright-cli open https://example.com --headed

In Claude Code, Playwright CLI integrates as a Skill rather than an MCP Server — the approach Microsoft recommends.

CLI vs MCP: Which One?

Microsoft’s guidance is straightforward:

| Condition | Choice |
| --- | --- |
| Using Claude Code / Cursor / Copilot or similar coding agents | CLI (preferred) |
| Agent has filesystem and shell access | CLI |
| Long-running automation tasks | CLI |
| Sandboxed environment (no shell access) | MCP |
| Need MCP protocol standard for generic agent workflows | MCP |

Bottom line: If you are using Claude Code, CLI should be your default choice in most scenarios.

Playwright MCP: The Battle-Tested Workhorse

Playwright is Microsoft’s established browser automation framework, used by countless companies worldwide for E2E testing. Playwright MCP is its AI extension, purpose-built for tools like Claude Code.

Core Mechanism: Accessibility Tree

Playwright sends the full accessibility tree of a webpage to the AI. This tree contains detailed information about every element: role, name, state, hierarchy, and more.

# Playwright accessibility tree excerpt
- document
  - navigation
    - link "Home"
    - link "Products"
    - link "About Us"
  - main
    - heading "Welcome" [level=1]
    - form
      - textbox "Username" [required]
      - textbox "Password" [required] [type=password]
      - button "Sign In"

More comprehensive information, but higher token consumption.

Unique Strengths: Cross-Browser + Professional Testing

Playwright supports three browser engines:

  • Chromium (Chrome, Edge)
  • Firefox
  • WebKit (Safari)

This means you can test your site across different browsers using the same set of commands.

Additional professional testing features include:

  • Auto-wait: Only interacts with elements once they are ready — no race conditions
  • Network interception: Mock API responses on the fly
  • Multi-tab management: Control multiple pages simultaneously
  • Video recording: Automatically record the entire operation sequence

Best Use Cases

| Scenario | Example |
| --- | --- |
| Feature testing | “Test the login flow” |
| User journey validation | “Run through the checkout process” |
| Regression testing | “Verify the fix did not break other features” |
| Multi-step automation | “Sign up → log in → post → log out” |
| Long-running stable execution | “This script needs to run for a while” |

Installation

Add this to claude_desktop_config.json or settings.json:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

DevTools MCP: The Debugging Expert

DevTools MCP is Google’s official offering, wrapping the Chrome DevTools Protocol (CDP) directly. If you have ever used Chrome’s developer tools (F12), you know how powerful they are.

Core Mechanism: Chrome DevTools Protocol

CDP is Chrome’s internal protocol, providing access to nearly every aspect of the browser’s internals:

  • Console output
  • Network requests and responses
  • DOM structure and styles
  • JavaScript execution environment
  • Performance metrics
  • And more

DevTools MCP exposes all of this to the AI, turning it into your advanced debugging assistant.

Unique Strength: Unmatched Debugging

While other tools focus on operating the browser, DevTools MCP focuses on understanding what is happening inside it.

You say: "The page is blank. Help me figure out why."

DevTools MCP will:
1. Check the console for errors
2. Inspect network requests for failures
3. Analyze JavaScript execution for exceptions
4. Check whether key elements rendered correctly
5. Provide a diagnostic conclusion

None of the other tools goes this deep on diagnosis.

Best Use Cases

| Scenario | Example |
| --- | --- |
| Console error inspection | “The page is blank: investigate” |
| Network request debugging | “What did the API return?” |
| Performance analysis | “The page loads too slowly” |
| CSS/DOM inspection | “Why does the layout look wrong?” |
| Variable inspection | “Show me the value of this variable” |

Installation

DevTools MCP requires a Chrome extension:

  1. Install the MCP Server:
{
  "mcpServers": {
    "devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}
  2. Install the companion extension from the Chrome Web Store

  3. Launch Chrome with remote debugging enabled:

# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

# Windows
chrome.exe --remote-debugging-port=9222

Practical Selection Guide

Scenario 1: Quick Page Inspection

Recommended: Agent Browser

You just want the AI to open a page, take a screenshot, or read some content — nothing complex. Agent Browser is the fastest and most token-efficient option.

"Open competitor.com and show me their pricing page"
"Take a screenshot of the homepage"
"Check if this form renders correctly"

Scenario 2: Authenticated Automation or Parallel Scraping

Recommended: Browser-use

Need to use a real Chrome profile (with existing logins and extensions), or run tasks across multiple sites simultaneously? Browser-use is the only option with three browser modes and cloud parallelism.

"Use my Chrome profile to open the internal dashboard and export the monthly report"
"Open 10 competitor sites in parallel and scrape their pricing"
"Use a cloud browser with proxy to access this geo-restricted site"

Scenario 3: Complex User Flow Testing

Recommended: Playwright CLI (in Claude Code) / Playwright MCP (in sandboxed environments)

Sign up, log in, place an order, pay, log out: multi-step flows need stable, reliable execution. The 2026 recommendation is Playwright CLI. It runs the same Playwright engine under the hood but used roughly 4x fewer tokens in the 10-step benchmark (~27,000 vs ~114,000), which helps avoid context overflow during long flows.

"Test the user registration flow: fill form → verify email → complete profile → redirect to dashboard"
"Run the full checkout flow and save screenshots of each step"

If your agent lacks shell access (e.g., a browser-based AI assistant), stick with Playwright MCP.

Scenario 4: Debugging Page Issues

Recommended: DevTools MCP

Blank pages, API errors, broken layouts: these require deep access to the browser’s internals. Of the tools compared here, DevTools MCP offers the deepest direct access to the Console, Network, and DOM.

"The page keeps showing a spinner. Find out which API call is hanging."
"This button does nothing when clicked. Check for JavaScript errors."

Scenario 5: Long-Running, High-Volume Browser Operations

Recommended: Playwright CLI

If your task involves 50+ browser operations (batch testing, large-scale scraping), Playwright MCP’s context will gradually bloat until it overflows. CLI’s “data on disk” architecture is naturally suited for long-running tasks.
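
A back-of-the-envelope model shows why step count matters. The sketch below assumes each step adds a fixed number of tokens, taken from the 10-step benchmark (~114,000 tokens for MCP, ~27,000 for CLI), and a 200,000-token context window. The window size and the linear growth are assumptions for illustration; real steps vary and other context also takes space.

```shell
#!/bin/sh
# Largest number of steps that fits in a context window, assuming every step
# adds the same number of tokens (a simplification).
max_steps() {
  window=$1
  per_step=$2
  echo $(( window / per_step ))
}

WINDOW=200000               # assumed context window, not a benchmark figure
max_steps "$WINDOW" 11400   # Playwright MCP: ~114,000 tokens / 10 steps
max_steps "$WINDOW" 2700    # Playwright CLI: ~27,000 tokens / 10 steps
```

Under these assumptions MCP fills the window after about 17 steps while CLI lasts about 74, which lines up with the 50+ step threshold mentioned above.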

"Open these 100 URLs one by one, take a snapshot, inspect elements, and save screenshots to the results directory"

Scenario 6: Multiple Capabilities at Once

You can combine them.

These tools are not mutually exclusive. Configure all of them and let the AI pick the best tool for each task.

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}

Add Agent Browser, Playwright CLI, and Browser-use as Skills, and you have a complete browser automation toolkit. Recommended combo: Agent Browser for everyday browsing, Browser-use for authenticated/parallel tasks, Playwright CLI for testing, DevTools MCP for debugging.

Pro Tips

1. Persist Login State

Many sites require authentication. Instead of logging in every time, have the AI save cookies:

"Open site.com with Agent Browser, let me log in, then save the login state"

Next time, the AI loads the saved cookies automatically — no re-authentication needed.

2. Headless Mode

If you do not need to see the browser window (e.g., running on a server), use headless mode:

"Run the login test flow with Playwright in headless mode"

3. Screenshot Comparison

When developing frontend features, compare before and after:

"Save a screenshot as before.png"
# Make code changes
"Take another screenshot as after.png and compare the differences"

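For a quick first check before asking the AI for a visual comparison, a byte-level test tells you whether the two files differ at all. This is only a sketch: `screenshots_differ` is a hypothetical helper, the demo files contain stand-in bytes rather than real PNG data, and `cmp` says nothing about how large the visual change is.

```shell
#!/bin/sh
# Byte-level check: do two screenshot files differ at all?
# This does not measure how much they differ visually.
screenshots_differ() {
  if cmp -s "$1" "$2"; then
    echo "identical"
  else
    echo "different"
  fi
}

tmp=$(mktemp -d)
printf 'old pixels' > "$tmp/before.png"   # stand-in bytes, not real PNG data
printf 'new pixels' > "$tmp/after.png"
screenshots_differ "$tmp/before.png" "$tmp/after.png"
```

If the files are identical, the code change had no visible effect on that page and the comparison step can be skipped.
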
4. Batch Operations

When you need the same operation on multiple pages:

"Open these 10 URLs one by one and save screenshots to the screenshots folder"

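Batch runs are easier to review when each screenshot has a predictable name. The sketch below derives a filename from a URL; `url_to_filename` and the `screenshots/` folder are assumptions for illustration, and none of the tools above is required to run it.

```shell
#!/bin/sh
# Hypothetical helper: turn a URL into a safe screenshot filename.
url_to_filename() {
  # Drop the scheme, replace unsafe characters with "_", trim trailing "_".
  name=$(printf '%s\n' "$1" | sed -e 's|^[A-Za-z]*://||' -e 's|[^A-Za-z0-9._-]|_|g' -e 's|_*$||')
  printf '%s.png\n' "$name"
}

# Show the planned output path for each URL.
for url in "https://example.com/" "https://example.com/pricing?plan=pro"; do
  printf '%s -> screenshots/%s\n' "$url" "$(url_to_filename "$url")"
done
```

Giving the AI the naming rule up front keeps the results folder consistent across runs.
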
FAQ

Q1: Why can’t my Playwright MCP connect?

Check these common issues:

  1. Ensure Node.js 18+ is installed
  2. Verify the MCP Server configuration is correct
  3. Try running npx @playwright/mcp@latest manually to see errors
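
The first item is quick to script. This sketch parses the output of `node --version` and checks the major version against 18; `node_major_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds if a version string like "v20.11.1" has a major version of 18 or higher.
node_major_ok() {
  major=${1#v}          # drop the leading "v"
  major=${major%%.*}    # keep everything before the first dot
  [ "$major" -ge 18 ]
}

if command -v node >/dev/null 2>&1; then
  if node_major_ok "$(node --version)"; then
    echo "Node.js version OK"
  else
    echo "Node.js is older than 18"
  fi
else
  echo "Node.js not found on PATH"
fi
```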

Q2: DevTools MCP says “Cannot connect to Chrome”?

Make sure:

  1. Chrome is running with remote debugging enabled on port 9222
  2. No other process is using that port
  3. The Chrome extension is installed and active
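
To confirm the first two points from a terminal, you can probe the debugging port directly. When Chrome runs with `--remote-debugging-port`, it serves browser metadata over HTTP at `/json/version`. The sketch below assumes `curl` is available; `check_cdp` is a hypothetical helper name.

```shell
#!/bin/sh
# Report whether a Chrome DevTools endpoint answers on the given port (default 9222).
check_cdp() {
  port=${1:-9222}
  if curl -fsS --max-time 2 "http://127.0.0.1:$port/json/version" >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

check_cdp 9222
```

If this prints "unreachable", Chrome was most likely started without the flag, or the flag was ignored because another Chrome instance was already running.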

Q3: Agent Browser screenshots are blank?

The page likely has not finished loading. Try:

"Wait 3 seconds after opening the page, then take a screenshot"

Q4: Which tool is the most stable?

For stability, Playwright MCP is the best choice. It has robust wait mechanisms and error handling — a production-grade automation framework.

Q5: Is the token difference really that large?

Yes. In real-world tests running the same 10-step operation:

  • Playwright MCP: ~114,000 tokens
  • DevTools MCP: ~50,000 tokens
  • Playwright CLI: ~27,000 tokens
  • Agent Browser: ~7,000 tokens

The gap is substantial. Playwright CLI uses roughly 4x fewer tokens than its MCP counterpart — a qualitative difference in long-running tasks where MCP may overflow the context window while CLI runs to completion without issues.
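
The multiples behind these numbers are easy to reproduce. This sketch divides each figure from the list above by a baseline; `ratio` is a hypothetical helper name.

```shell
#!/bin/sh
# How many times more tokens does one tool use than another?
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 114000 7000    # Playwright MCP vs Agent Browser
ratio 50000 7000     # DevTools MCP vs Agent Browser
ratio 27000 7000     # Playwright CLI vs Agent Browser
ratio 114000 27000   # Playwright MCP vs Playwright CLI
```

The spread between the heaviest and lightest tool is about 16x, and Playwright MCP versus Playwright CLI comes out at about 4.2x.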

Q6: Can I install Playwright CLI and MCP at the same time?

Yes. CLI operates via shell commands while MCP runs as an MCP Server — they do not conflict. You can even have the AI use CLI for simple operations (saving tokens) and switch to MCP when full accessibility tree analysis is needed.

Summary

| If you need… | Choose |
| --- | --- |
| Quick browsing, screenshots, simple interactions | Agent Browser |
| Authenticated sessions / parallel scraping / anti-bot bypass | Browser-use |
| Testing and automation in Claude Code | Playwright CLI (2026 top pick) |
| Browser automation in sandboxed environments | Playwright MCP |
| Debugging, performance analysis, network inspection | DevTools MCP |
| All of the above | Configure all five and let the AI pick the right tool |

Quick reference:

  • Browse and fill forms → Agent Browser
  • Login state, parallelism, anti-bot → Browser-use
  • Test and automate (with shell access) → Playwright CLI
  • Test and automate (sandboxed) → Playwright MCP
  • Debug and inspect → DevTools MCP

2026 recommendation: If you only install one, choose Browser-use — it covers three browser modes, persistent sessions, and cloud parallelism, making it the most versatile option for AI agent browser automation. If you focus on testing workflows, go with Playwright CLI. For the most token-efficient everyday browsing, add Agent Browser.

Now go let your AI assistant truly take the wheel.

