Visible Browser Skill (Model Context Protocol)

Restores the headed "visible browser" capability in Google Antigravity 2.0. Primary method uses direct Python/Playwright scripts for speed and reliability. MCP server tools available as backup.

This skill restores the headed "visible browser" capability that was removed in the transition to Antigravity 2.0. It supports two control methods, direct Python scripts (primary) and MCP server tools (backup), which let the agent launch Chrome, navigate pages, click elements, capture screenshots, and shut down the browser process tree.

I. Architecture & Setup

  • Launch Trigger: The MCP server launches the browser by programmatically connecting to the IDE's fixed Electron debugging port 9000 via CDP, finding the Chrome button a.codicon-chrome in the workbench DOM (workbench.html), and clicking it. This starts headed Chrome on debug port 9222.
  • Steering & Control: Playwright connects over CDP to port 9222 to steer the browser pages.
  • Direct Python Method (Primary): For maximum speed and reliability, launch the browser by running open_browser_cdp9000.py directly via run_command, then write inline Playwright scripts to steer it. This avoids the MCP server's frequent EOF crashes and preserves persistent cookies correctly.
  • MCP Server Method (Backup): If the direct method fails, fall back to the call_mcp_tool wrapper targeting ServerName: "visible_browser" (see Section II).
  • Redundancy / Fallbacks: For deeper troubleshooting or alternative architectures, refer to the detailed development and fallback guide in the history archive: visible-browser_skill.md
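
Because Playwright can only attach once Chrome is listening on port 9222, it can help to wait for the port before connecting. The following is a minimal sketch of such a check, not the skill's own implementation; the function name wait_for_port and its timeout values are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 15.0, interval: float = 0.25) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the name and default timings are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("127.0.0.1", 9222) after running the launch script and before connect_over_cdp would separate "Chrome never opened the port" from later Playwright errors.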

II. Available Tools

All tools are called via the call_mcp_tool wrapper targeting ServerName: "visible_browser":

  1. launch_browser
    • Purpose: Programmatically clicks the IDE Chrome button to launch Chrome, waits for CDP port 9222 to start, and navigates to the starting URL.
    • Arguments: {"url": "<URL>"} (optional, defaults to about:blank).
  2. navigate
    • Purpose: Navigates the active page to a new URL.
    • Arguments: {"url": "<URL>"}.
  3. click_element
    • Purpose: Clicks a visible DOM element using Playwright selector logic.
    • Arguments: {"selector": "<SELECTOR>"} (e.g., text="I understand" or button.submit).
  4. capture_screenshot
    • Purpose: Captures a screenshot of the active page and saves it directly to the project directory.
    • Arguments: {"filename": "<FILENAME>"} (e.g., proof_screenshot.png).
  5. shutdown_browser
    • Purpose: Closes the Playwright connections and forcefully kills all running chrome.exe process trees.
    • Arguments: {}.
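
All five tools share the same payload shape, so the request can be assembled and validated in one place. The sketch below is illustrative only: build_tool_call and TOOL_SPECS are hypothetical names, and treating every argument except launch_browser's url as required is an assumption read from the list above.

```python
SERVER_NAME = "visible_browser"

# tool name -> (required argument names, optional argument names).
# Assumption: only launch_browser's "url" is optional, as stated in Section II.
TOOL_SPECS = {
    "launch_browser": (set(), {"url"}),
    "navigate": ({"url"}, set()),
    "click_element": ({"selector"}, set()),
    "capture_screenshot": ({"filename"}, set()),
    "shutdown_browser": (set(), set()),
}


def build_tool_call(tool_name, **arguments):
    """Return the payload dict for call_mcp_tool, rejecting malformed requests."""
    if tool_name not in TOOL_SPECS:
        raise ValueError(f"unknown tool: {tool_name}")
    required, optional = TOOL_SPECS[tool_name]
    missing = required - set(arguments)
    unexpected = set(arguments) - required - optional
    if missing or unexpected:
        raise ValueError(
            f"{tool_name}: missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return {"ServerName": SERVER_NAME, "ToolName": tool_name, "Arguments": dict(arguments)}
```

For example, build_tool_call("click_element", selector="button.submit") yields a payload in the same shape as the JSON requests shown in Section III.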

III. Standard Usage Workflow

Whenever you need to perform headed web automation or visual validation, follow this sequence.

Step 1: Launch the Browser

Primary method (direct script): Run the launch script directly via run_command:

python open_browser_cdp9000.py

Then write a small inline Python script to navigate, and execute it via run_command:

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
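        # Attach to the already-running headed Chrome over CDP
        browser = await p.chromium.connect_over_cdp("http://127.0.0.1:9222")
        context = browser.contexts[0] if browser.contexts else await browser.new_context()
        # Reuse the most recent tab, or open one if the context has none
        page = context.pages[-1] if context.pages else await context.new_page()
        await page.goto("https://www.wikipedia.org")
        print(await page.title())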

asyncio.run(main())

Backup method (MCP server): If the direct method fails, use the MCP tool:

{
  "ServerName": "visible_browser",
  "ToolName": "launch_browser",
  "Arguments": {
    "url": "https://www.wikipedia.org"
  }
}

Step 2: Handle Cookie Consent & Pop-ups

Identify any obstructing overlays, then target the dismiss button by its specific text or selector rather than clicking generic elements:

{
  "ServerName": "visible_browser",
  "ToolName": "click_element",
  "Arguments": {
    "selector": "button:has-text('Accept All')"
  }
}
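
With the direct Python method, the same step can be done against a Playwright page object. This is a sketch under assumptions: consent_selectors and dismiss_consent are hypothetical helpers, the default labels are just the two examples used in this document, and labels containing double quotes are not handled.

```python
def consent_selectors(labels):
    """Build one button selector per candidate label, in priority order."""
    return [f'button:has-text("{label}")' for label in labels]


async def dismiss_consent(page, labels=("Accept All", "I understand")):
    """Click the first visible consent button on a Playwright page.

    Returns the selector that was clicked, or None if no candidate is visible.
    """
    for selector in consent_selectors(labels):
        button = page.locator(selector).first
        # Only click buttons that are actually rendered, never a blind click.
        if await button.is_visible():
            await button.click()
            return selector
    return None
```

Inside an inline script like the one in Step 1, this would be called as await dismiss_consent(page) after page.goto(...). A None result means no matching overlay was found, which is worth confirming with a screenshot.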

Step 3: Visual Verification (Screenshot Proofs)

Per the strict anti-cheating policy, always capture a screenshot after major actions or transitions to confirm success:

{
  "ServerName": "visible_browser",
  "ToolName": "capture_screenshot",
  "Arguments": {
    "filename": "wikipedia_homepage.png"
  }
}

Step 3.5: Full-Page ("Long") Screenshots

If you need to capture a "long picture of the entire website" (a full-page top-to-bottom screenshot) because the content extends beyond the fold, you MUST bypass the basic MCP tool and instead write a short Python script using Playwright to connect to the browser and capture it with full_page=True.

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        # Connect to the visible browser CDP port
        browser = await p.chromium.connect_over_cdp("http://127.0.0.1:9222")
        context = browser.contexts[0] if browser.contexts else await browser.new_context()
        # Target the most recent tab, or open one if the context has none
        page = context.pages[-1] if context.pages else await context.new_page()
        
        # Capture the entire document, top to bottom
        await page.screenshot(path="full_website_capture.png", full_page=True)

asyncio.run(main())

(Note: If using raw Chrome DevTools Protocol commands instead of Playwright, use Page.captureScreenshot with the "captureBeyondViewport": true parameter).
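
For the raw CDP route, the request and the reply handling might look like the sketch below. It only builds the JSON message and decodes the base64 image from the reply; sending the message over the target's WebSocket is left out, and the helper names are hypothetical. A true full-page capture may additionally need a clip sized from Page.getLayoutMetrics, which is not shown here.

```python
import base64
import json


def build_capture_command(message_id: int = 1) -> str:
    """Serialize a Page.captureScreenshot request that captures beyond the viewport."""
    return json.dumps({
        "id": message_id,
        "method": "Page.captureScreenshot",
        "params": {"format": "png", "captureBeyondViewport": True},
    })


def save_capture_response(raw_response: str, path: str) -> int:
    """Decode the base64 `data` field of a CDP reply, write it to path, return the byte count."""
    reply = json.loads(raw_response)
    if "error" in reply:
        # CDP reports failures as {"id": ..., "error": {"code": ..., "message": ...}}
        raise RuntimeError(f"CDP error: {reply['error']}")
    image = base64.b64decode(reply["result"]["data"])
    with open(path, "wb") as handle:
        handle.write(image)
    return len(image)
```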

Step 4: Page Navigation

Change URLs on the active browser:

{
  "ServerName": "visible_browser",
  "ToolName": "navigate",
  "Arguments": {
    "url": "https://en.wikipedia.org/wiki/Main_Page"
  }
}

Step 5: Clean Shutdown

Once all tasks are completed, or if you encounter a critical automation error, always shut down the browser to free ports and resources:

{
  "ServerName": "visible_browser",
  "ToolName": "shutdown_browser",
  "Arguments": {}
}

IV. Environmental Troubleshooting

  • WinNat Port Conflicts: If the browser launches but the tool times out attempting to connect to CDP port 9222, check for Windows NAT port exclusions. Run net stop winnat in an elevated shell to release port reservations.
  • Zombie Locks: If the browser fails to start and you need to clear locks, NEVER run taskkill /IM chrome.exe as this will kill the user's personal browser. Instead, selectively kill only the automated instance using PowerShell:
    Get-CimInstance Win32_Process -Filter "Name = 'chrome.exe'" | Where-Object CommandLine -match "remote-debugging-port=9222" | Invoke-CimMethod -MethodName Terminate
    
  • Crash Screen Freeze ("Restore Pages"): If the browser launches but fails to navigate because it's frozen on a Chrome crash popup, Playwright cannot attach to it. To bypass this, manually inject a new blank page via the CDP endpoint to give Playwright a valid target: python -c "import urllib.request; urllib.request.urlopen(urllib.request.Request('http://127.0.0.1:9222/json/new', method='PUT'))". (Note: The MCP server's launch_browser tool now does this automatically).
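
The crash-screen workaround can be made conditional by first asking the CDP HTTP endpoint which targets exist. The sketch below is illustrative: the helper names are hypothetical, and treating "no target of type page" as the sign of a frozen crash screen is an assumption rather than something this skill documents.

```python
import json
import urllib.request

CDP_HTTP = "http://127.0.0.1:9222"


def list_targets(base_url: str = CDP_HTTP):
    """Fetch the list of debuggable targets from the CDP HTTP endpoint."""
    with urllib.request.urlopen(f"{base_url}/json/list", timeout=5) as response:
        return json.load(response)


def has_page_target(targets) -> bool:
    """Return True if at least one target is a regular page Playwright can steer."""
    return any(target.get("type") == "page" for target in targets)


def ensure_page_target(base_url: str = CDP_HTTP) -> bool:
    """Open a blank page via /json/new if none exists; return True if one was created."""
    if has_page_target(list_targets(base_url)):
        return False
    # Same PUT request as the one-liner above, split out for readability.
    request = urllib.request.Request(f"{base_url}/json/new", method="PUT")
    urllib.request.urlopen(request, timeout=5).close()
    return True
```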

V. Behavioral Guardrails

  • Do not prematurely declare the browser broken: NEVER tell the user that the visible browser is broken or instruct them to manually perform actions (like copy-pasting URLs) just because a single command failed. You must first make a genuine, exhaustive attempt to debug the issue (e.g., verifying you actually called launch_browser, checking for Zombie Locks, and actually trying to reopen the browser). Do not take lazy shortcuts (like running Start-Process) to bypass this tool without actually trying to fix it properly.
