Gemini (gemini.google.com) Browsing Skill

Guides headed browser automation on gemini.google.com, focusing on Deep Think, Deep Research, and screenshot verification.

[!CAUTION] LIBERALLY CAPTURE SCREENSHOTS. Capture a screenshot after every click, input, or transition to verify that the browser is in the expected state. Do not rely solely on DOM checks: visual confirmation catches backend errors and unexpected UI states that DOM checks can miss.

This skill defines the strict mechanical rules and UI interactions for automating gemini.google.com (specifically Deep Think and Deep Research) using the headed visible_browser MCP server.

Core Rules

[!IMPORTANT] REUSE EXISTING CHROME INSTANCE. Before attempting to launch a new browser process, always check if port 9222 is already open and listening. If it is active, reuse the existing Chrome window rather than clicking the IDE launch button again. This prevents Chrome from repeatedly closing and reopening.
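
The port check above can be sketched with the standard library alone. This is a minimal sketch, assuming Chrome is listening on the local loopback interface; the function name `is_port_listening`, the host, and the timeout are illustrative choices, not part of the skill.

```python
import socket


def is_port_listening(port: int = 9222, host: str = "127.0.0.1",
                      timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # A successful connect means a process is already listening.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable: treat as not listening.
        return False
```

If the function returns True, reuse the existing Chrome window; only when it returns False is launching a new browser process appropriate.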

[!CAUTION] MANDATORY PRE-FLIGHT STATE CHECK. Before submitting any prompt, take a screenshot and assess the current state. Determine: (1) Is there already a prompt submitted? (2) Is generation in progress? (3) Is there already a completed response? (4) Is the chat state clean? If a response to your prompt is already visible, DO NOT resubmit. Switch to extraction mode.
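
One way to make the pre-flight assessment explicit is a small decision function. The sketch below is an assumption about how the four questions could map to actions: only the "do not resubmit, switch to extraction" rule comes from the text above, while the `Action` names and the `WAIT` and `INSPECT` outcomes are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SUBMIT = "submit"    # clean chat: safe to send the prompt
    WAIT = "wait"        # generation in progress: do not type or send
    EXTRACT = "extract"  # response to our prompt is visible: copy it out
    INSPECT = "inspect"  # ambiguous state: review the screenshot first


def preflight_action(prompt_submitted: bool, generating: bool,
                     response_to_prompt_visible: bool) -> Action:
    """Map the screenshot assessment to the next step."""
    if response_to_prompt_visible:
        return Action.EXTRACT  # never resubmit when the answer already exists
    if generating:
        return Action.WAIT
    if prompt_submitted:
        # Submitted, but neither generating nor answered (possibly an error).
        return Action.INSPECT
    return Action.SUBMIT
```

The three booleans are meant to be filled in from the screenshot taken before submission, not from DOM checks alone.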

[!CAUTION] ONE PROMPT = ONE SUBMISSION. ALWAYS. A prompt is submitted as a single message via clipboard paste (Ctrl+V) or page.fill + Send. There is NO prompt length limit. NEVER break a prompt into multiple messages. NEVER submit additional text while Gemini is generating, as this will cancel the in-progress run.
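
The single-submission rule can be sketched as a guard around `page.fill` and Send. This assumes `page` is a Playwright-style page object (its `fill` and `click` methods are used as in the Playwright Python sync API); the helper name `submit_once` and the `already_submitted` flag are hypothetical, and the selectors are the ones listed later in this skill.

```python
TEXTBOX = 'div[role="textbox"]'
SEND = 'button[aria-label="Send message"]'


def submit_once(page, prompt: str, already_submitted: bool = False) -> bool:
    """Send the whole prompt as one message; return False if nothing was sent."""
    if already_submitted:
        # The pre-flight check found our prompt in the chat: do not resend.
        return False
    page.fill(TEXTBOX, prompt)  # entire prompt in one fill, never chunked
    page.click(SEND)
    return True
```

Passing the result of the pre-flight check as `already_submitted` keeps a retry loop from sending the same prompt twice.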

Specific UI Mechanics & Selectors

1. Model & Feature Selectors

  • Mode Picker Button: button.input-area-switch, button[aria-label*="mode picker"], or button[aria-label*="Model picker"]
  • Upload & Tools Button: button[aria-label="Upload & tools"]
  • Deep Research Activation: gem-menu-item or button containing text "Deep research" or button[aria-label="Deep research"]
  • Deep Research Active Badge (Verification): button[aria-label="Deselect Deep research"]
  • Deep Think Path:
    1. Click Mode Picker: button.input-area-switch
    2. Click Thinking Level Submenu: gem-menu-item containing text "Thinking level"
    3. Click Deep Think option: gem-menu-item containing text "Deep Think" (Verify picker text changes to Pro\nDeep Think or similar)
  • Textbox Editor: div[role="textbox"] (with class .ql-editor.textarea)
  • Send Message Button: button[aria-label="Send message"]
  • Start Research Button: button:has-text("Start research")

2. Deep Research Submission Flow

  1. Ensure the model is set to 3.1 Pro first (the Tools menu is disabled when Deep Think is active). If needed, click the Mode Picker and select 3.1 Pro.
  2. Click Upload & tools (button[aria-label="Upload & tools"]).
  3. Click Deep research (button:has-text("Deep research") or the Deep Research option).
  4. Verify activation: Ensure the Deselect Deep research badge is visible.
  5. Focus the Textbox Editor, enter the prompt, and click Send message (button[aria-label="Send message"]).
  6. Wait for the research plan to generate. The plan is ready when the Start research button appears.
  7. Click Start research (button:has-text("Start research")) to kick off the background research.
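
Steps 2 through 7 above can be sketched as a single function. This is a rough sketch, assuming `page` is a Playwright-style page object (`click`, `fill`, `wait_for_selector`, and `screenshot` are used as in the Playwright Python sync API). The function name, the 5-minute plan timeout, and the screenshot file naming are assumptions; step 1 (selecting 3.1 Pro) is left to the caller.

```python
UPLOAD_TOOLS = 'button[aria-label="Upload & tools"]'
DEEP_RESEARCH = 'button:has-text("Deep research")'
ACTIVE_BADGE = 'button[aria-label="Deselect Deep research"]'
TEXTBOX = 'div[role="textbox"]'
SEND = 'button[aria-label="Send message"]'
START = 'button:has-text("Start research")'


def run_deep_research(page, prompt: str, plan_timeout_ms: int = 300_000,
                      shot_prefix: str = "deep_research") -> None:
    """Run steps 2-7 of the flow, saving a screenshot after each action."""
    step = 0

    def snap() -> None:
        nonlocal step
        step += 1
        page.screenshot(path=f"{shot_prefix}_{step:02d}.png")

    page.click(UPLOAD_TOOLS)
    snap()
    page.click(DEEP_RESEARCH)
    snap()
    page.wait_for_selector(ACTIVE_BADGE)  # verify the activation badge
    page.fill(TEXTBOX, prompt)            # one prompt, one submission
    snap()
    page.click(SEND)
    snap()
    # The plan is ready once the Start research button appears.
    page.wait_for_selector(START, timeout=plan_timeout_ms)
    snap()
    page.click(START)
    snap()
```

The screenshots still need to be reviewed after each step; the function only captures them.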

3. Deep Think Activation Flow

  1. Click Mode Picker: button.input-area-switch.
  2. Click gem-menu-item containing text "Thinking level".
  3. Click gem-menu-item containing text "Deep Think".
  4. Verify picker button text indicates Deep Think is active (e.g. Pro\nDeep Think).
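
The four steps above translate fairly directly into code. The sketch below assumes a Playwright-style `page` object (`click` and `inner_text` as in the Playwright Python sync API) and expresses "containing text" with Playwright's `:has-text()` pseudo-class; the function name `activate_deep_think` is hypothetical.

```python
MODE_PICKER = "button.input-area-switch"


def activate_deep_think(page) -> bool:
    """Select Deep Think and report whether the picker label confirms it."""
    page.click(MODE_PICKER)
    page.click('gem-menu-item:has-text("Thinking level")')
    page.click('gem-menu-item:has-text("Deep Think")')
    # The picker text is expected to read something like "Pro\nDeep Think".
    label = page.inner_text(MODE_PICKER)
    return "Deep Think" in label
```

A False return means the picker label did not change, so a screenshot should be taken and reviewed before any prompt is submitted.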

4. Resubmitting / Redoing on Error

If Gemini encounters an error (e.g., "I encountered an error doing what you asked. Could you try again?"):

  1. Locate the circular Redo button (button[aria-label="Redo"]) and click it.
  2. Wait briefly for the dropdown menu to open before clicking anything in it.
  3. Click the dropdown menu option with the exact text "Try again" (e.g. gem-menu-item or [role="menuitem"] with text "Try again"). This will resubmit the query and increment the generation counter (e.g., < 2/2 >).
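
The Redo sequence can be sketched as follows, again assuming a Playwright-style `page` object. Detecting the error by searching the page body text is an assumption (an older error message earlier in the chat would also match), and the 2-second menu timeout and function name are illustrative. Note that `:has-text()` matches substrings, so it is looser than the exact-text match described above.

```python
ERROR_TEXT = "I encountered an error doing what you asked"
REDO = 'button[aria-label="Redo"]'
TRY_AGAIN = '[role="menuitem"]:has-text("Try again")'


def retry_on_error(page, menu_timeout_ms: int = 2000) -> bool:
    """Click Redo, then Try again, if the error message is on the page."""
    if ERROR_TEXT not in page.inner_text("body"):
        return False
    page.click(REDO)
    # Wait for the dropdown to render instead of sleeping a fixed interval.
    page.wait_for_selector(TRY_AGAIN, timeout=menu_timeout_ms)
    page.click(TRY_AGAIN)
    return True
```

Waiting on the menu item rather than sleeping avoids clicking before the dropdown has opened.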

5. Extraction

  • For Deep Research Reports: Click the main report body (container #extended-response-message-content), press Ctrl+A then Ctrl+C to copy, and save the result.
  • For Multi-Turn/Chat Responses: Scroll to the bottom of the message, then click the Copy response icon (overlapping pages) under the newest response.
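
For Deep Research reports, the select-all-and-copy sequence might look like the sketch below. It assumes a Playwright-style `page` object and reads the clipboard back through `navigator.clipboard.readText()`, which typically requires the browser context to have been granted clipboard-read permission; that step, the function name `save_report`, and the output path are assumptions rather than part of the skill.

```python
from pathlib import Path

REPORT = "#extended-response-message-content"


def save_report(page, out_path: str) -> str:
    """Copy the report body via Ctrl+A / Ctrl+C and write it to out_path."""
    page.click(REPORT)                # focus the main report body
    page.keyboard.press("Control+A")  # select all
    page.keyboard.press("Control+C")  # copy to the clipboard
    # Assumption: clipboard-read permission is available in this context.
    text = page.evaluate("navigator.clipboard.readText()")
    Path(out_path).write_text(text, encoding="utf-8")
    return text
```

Checking that the saved text is non-empty, alongside a screenshot, helps confirm the copy actually captured the report.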

Troubleshooting & Stagnant Generations

If Deep Research or Deep Think appears stuck in "Analyzing results..." for a prolonged period (e.g., more than 3-5 minutes):

  • Refresh/Reload the page: Run page.reload(wait_until="domcontentloaded") and sleep for 5-10 seconds to allow the UI to re-render. In some cases, the backend has completed the generation, but the websocket connection has desynced. Refreshing forces the page to reload the chat history and display the completed report/response.
