Windows Fast Automation MCP Skill

Integrates winapp-mcp to interact directly with the Windows UI Automation Tree for native desktop application control.

Skill: Windows Fast UI Automation (winapp-mcp)

When the user asks you to interact with a native Windows application (e.g., Excel, File Explorer, Word, Settings, Calculator, Fidelity ATP), you must bypass the slow screenshot/coordinate loop and use the Windows UI Automation Tree via winapp-mcp.

Location

The winapp-mcp server executable is located at: global_workflows/winapp-mcp/server/WinAppMCP.exe

If it's not there, install it globally with npm install -g winapp-mcp (see the new_machine_setup workflow).
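The following is a small sketch for resolving the server path before launching it. It assumes the skill-relative path above, resolved against the current working directory. It also assumes that the npm package puts a command named `winapp-mcp` on PATH. That command name is an assumption, not something this skill confirms. `find_winapp_mcp` is a hypothetical helper.

```python
import os
import shutil

# Skill-relative location from this document; resolved against the current working directory.
DEFAULT_CANDIDATES = (os.path.join("global_workflows", "winapp-mcp", "server", "WinAppMCP.exe"),)


def find_winapp_mcp(candidates=DEFAULT_CANDIDATES, command="winapp-mcp"):
    """Return the first existing candidate path, else what `command` resolves to on PATH, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return os.path.abspath(path)
    # Assumption: a global npm install exposes a `winapp-mcp` command on PATH
    return shutil.which(command)
```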

How it Works

WinAppMCP.exe is a .NET server that speaks the Model Context Protocol (MCP), which consists of JSON-RPC 2.0 messages exchanged over stdin/stdout. It exposes 54 tools that hook into the Windows UI Automation tree to read and control native application UI elements directly, without screenshots.

Critical Limitation: This server CANNOT read inside opaque web browser canvases (Chrome/Edge page content) or hardware-accelerated 3D viewports. For web page content, use the Edge DOM Bridge Extension instead. For Java apps (like Interactive Brokers TWS), first run jabswitch -enable to activate the Java Access Bridge.
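The routing rules in this limitation can be captured in a tiny helper. This is a hypothetical sketch: `pick_backend`, its flags, and the returned labels are illustrative names. It encodes only the cases listed above: browser page content, Java apps, and everything else. The `jabswitch` tool ships in the Java runtime's bin directory and must be on PATH, or called by its full path.

```python
import subprocess

BROWSER_PROCESSES = {"msedge", "chrome"}


def pick_backend(process_name, wants_page_content=False, is_java=False):
    """Hypothetical router following the limitation notes of this skill."""
    name = process_name.lower()
    if name.endswith(".exe"):
        name = name[:-4]
    if name in BROWSER_PROCESSES and wants_page_content:
        return "dom_bridge"           # UIA can't see page content; use the Edge DOM Bridge Extension
    if is_java:
        return "jab_then_winapp_mcp"  # enable the Java Access Bridge before attaching
    return "winapp_mcp"               # native controls, browser tabs and URL bar


def enable_java_access_bridge():
    # Windows only; runs the documented `jabswitch -enable` command
    subprocess.run(["jabswitch", "-enable"], check=True)
```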

WinApp-MCP vs. COM Automation: WinApp-MCP navigates the UI. It clicks buttons, reads menus, and fills forms like a user would. For programmatic data operations on Office apps, use COM automation via pywin32 instead. Examples include reading or writing thousands of Excel cells, evaluating formulas, generating Word documents, and sending Outlook emails. COM automation is orders of magnitude faster because it talks directly to the application engine and bypasses the UI. See the com_automation skill.
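For comparison, here is a minimal COM sketch with pywin32. It assumes Excel is installed on a Windows machine. It writes a whole 2D block with a single `Range(...).Value` assignment instead of touching cells one by one, which is the usual way to avoid per-cell round trips. `col_letter`, `a1_range`, and `write_rows_to_new_workbook` are hypothetical helper names. Only the last function touches COM, and it runs only where pywin32 and Excel are available.

```python
def col_letter(n):
    """Convert a 1-based column number to Excel column letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def a1_range(n_rows, n_cols):
    """Return the A1-style address of an n_rows x n_cols block anchored at A1."""
    return f"A1:{col_letter(n_cols)}{n_rows}"


def write_rows_to_new_workbook(rows):
    """Write a rectangular list of rows into a new Excel workbook with one COM assignment."""
    import win32com.client  # pywin32; Windows with Excel installed only

    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = True
    workbook = excel.Workbooks.Add()
    sheet = workbook.Worksheets(1)
    # One assignment moves the whole block instead of one COM call per cell
    sheet.Range(a1_range(len(rows), len(rows[0]))).Value = tuple(tuple(r) for r in rows)
    return workbook
```

For example, `write_rows_to_new_workbook([["Ticker", "Qty"], ["MSFT", 10]])` fills `A1:B2` in a new workbook.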

The Correct Attach Pattern

IMPORTANT: Do NOT launch apps via launch_app and assume the returned PID is correct. Many Win11 apps, especially UWP/WinUI 3 apps like Notepad, Calculator, and Settings, re-parent to an existing system process. The process you start hands off to another one, so the launched PID is stale.

The reliable workflow is:

  1. Launch the app yourself (via Start-Process or subprocess.Popen)
  2. Wait 2-3 seconds for it to render
  3. Call list_desktop_windows (no arguments) — returns every visible window with Title, PID, ProcessName
  4. Find your target by matching the window title
  5. Call attach_to_pid with the PID of the window you matched in step 4
  6. Now call get_snapshot, type_text, invoke_element, etc. — all require the appId parameter returned by attach
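The six steps can be sketched as one function. This sketch rests on three assumptions:

- `call_tool(name, args)` is any wrapper that returns a tool's result text, for example one built on the bridge template.
- `parse_windows` turns the list_desktop_windows text into dicts with `Title`/`PID`/`ProcessName` keys. The exact output format is not documented here.
- The `pid` argument name for attach_to_pid is a guess. Confirm it against the server's tool schema.

```python
import json
import re
import subprocess
import time


def pick_window(windows, title_part):
    """Return the single PID whose window title contains title_part (case-insensitive)."""
    needle = title_part.lower()
    pids = {int(w["PID"]) for w in windows if needle in str(w.get("Title", "")).lower()}
    if len(pids) != 1:
        raise LookupError(f"expected exactly one PID for {title_part!r}, got {sorted(pids)}")
    return pids.pop()


def attach_by_title(call_tool, parse_windows, title_part, launch_cmd=None, wait_s=2.5):
    """Hypothetical helper: launch (optional), find the real PID by title, attach, return appId."""
    if launch_cmd:
        subprocess.Popen(launch_cmd)                                 # 1. launch the app ourselves
        time.sleep(wait_s)                                           # 2. let it render
    windows = parse_windows(call_tool("list_desktop_windows", {}))   # 3. enumerate windows
    pid = pick_window(windows, title_part)                           # 4. match by title
    reply = call_tool("attach_to_pid", {"pid": pid})                 # 5. attach ("pid" name assumed)
    match = re.search(r"(app_\d+)", reply)
    if match is None:
        raise RuntimeError(f"no appId in attach reply: {reply[:200]!r}")
    return match.group(1)                                            # 6. pass as "appId" afterwards
```

If the server returns the window list as JSON text, `parse_windows` can simply be `json.loads`. Otherwise, write a parser for whatever format it actually emits.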

Common App Process Names

| Application | Process Name | Executable / Launch | Notes |
|---|---|---|---|
| Notepad | notepad | C:\Windows\notepad.exe | Win11 UWP; must use the real PID from list_desktop_windows |
| File Explorer | explorer | explorer.exe | Always running; attach to the existing process |
| Microsoft Edge | msedge | msedge.exe | Use for the outer shell (tabs, URL bar); use the DOM Bridge for page content |
| Google Chrome | chrome | chrome.exe | Same as Edge: outer shell only |
| Excel | EXCEL | Office install path | Full native UIA support |
| Word | WINWORD | Office install path | Full native UIA support |
| Settings | SystemSettings | ms-settings: URI | Win11 UWP |
| Calculator | CalculatorApp | calc.exe | Win11 UWP; re-parents, so the launched PID is stale |
| Paint | mspaint | mspaint.exe | Classic Win32 |
| Fidelity ATP | AtpInvestor | Custom install path | Native .NET; full UIA support except the chart canvas |
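Matching on the process name can be more robust than matching on a window title that changes with the open document. The hypothetical helper below is built from the table. It assumes windows are dicts with a `ProcessName` key, as listed in step 3 of the attach pattern. The comparison is case-insensitive because the table mixes `EXCEL` and `explorer`, and it ignores a trailing `.exe`.

```python
# Process names from the table above; keys are the table's application names.
PROCESS_NAMES = {
    "Notepad": "notepad",
    "File Explorer": "explorer",
    "Microsoft Edge": "msedge",
    "Google Chrome": "chrome",
    "Excel": "EXCEL",
    "Word": "WINWORD",
    "Settings": "SystemSettings",
    "Calculator": "CalculatorApp",
    "Paint": "mspaint",
    "Fidelity ATP": "AtpInvestor",
}


def _normalize(process_name):
    """Lowercase a process name and drop a trailing .exe."""
    name = str(process_name).lower()
    return name[:-4] if name.endswith(".exe") else name


def windows_for_app(app, windows):
    """Filter window dicts (assumed keys: Title, PID, ProcessName) down to one application."""
    want = _normalize(PROCESS_NAMES[app])
    return [w for w in windows if _normalize(w.get("ProcessName", "")) == want]
```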

Key Tools (54 total)

| Tool | What It Does |
|---|---|
| list_desktop_windows | Lists all visible windows with PID and title (use this FIRST) |
| attach_to_pid / attach_to_app | Connects the server to a running app and returns an appId |
| get_snapshot | Returns the full UI tree (buttons, text fields, menus) |
| invoke_element | Clicks a button or menu item programmatically |
| type_text | Types text into the focused/active text field |
| click_element | Clicks an element by name or AutomationId |
| get_grid_item | Reads a specific DataGrid cell by row/column |
| fill_form | Fills multiple form fields in one call |
| get_all_values | Reads all editable field values at once |
| find_elements | Searches for elements by type, id, or name |
| press_key_combo | Sends keyboard shortcuts (e.g., Ctrl+S, Alt+F4) |
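The argument names for these tools are not listed here. MCP servers describe them through the standard `tools/list` method, whose result carries each tool's `name` and JSON Schema `inputSchema`. The sketch below assumes a server process `p` that was started and initialized as in the Python Bridge Template. `list_tools` and `summarize_tools` are hypothetical helpers. A server may also paginate this list with a `nextCursor` field, which the sketch ignores.

```python
import json


def list_tools(p, req_id):
    """Send an MCP tools/list request on p's stdin and return the matching result dict."""
    p.stdin.write(json.dumps({"jsonrpc": "2.0", "id": req_id,
                              "method": "tools/list", "params": {}}) + "\n")
    p.stdin.flush()
    while True:
        line = p.stdout.readline()
        if not line:
            raise RuntimeError("server closed stdout")
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON output
        if isinstance(msg, dict) and msg.get("id") == req_id:
            return msg.get("result", {})


def summarize_tools(result):
    """Map each tool name to the sorted argument names its inputSchema marks as required."""
    return {t["name"]: sorted(t.get("inputSchema", {}).get("required", []))
            for t in result.get("tools", [])}
```

For example, `summarize_tools(list_tools(proc, 2))["attach_to_pid"]` shows which arguments attach_to_pid actually requires.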

Python Bridge Template

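import subprocess, json, re

EXE = r"path\to\WinAppMCP.exe"

def send(p, msg):
    # MCP stdio framing: one JSON-RPC message per line
    p.stdin.write(json.dumps(msg) + "\n")
    p.stdin.flush()

def read_msg(p, want_id=None):
    # Return the next JSON message (or the one whose "id" == want_id),
    # skipping blank lines, non-JSON output, and notifications
    while True:
        line = p.stdout.readline()
        if not line: return None          # server exited / closed stdout
        line = line.strip()
        if not line: continue
        try: msg = json.loads(line)
        except json.JSONDecodeError: continue
        if want_id is None or (isinstance(msg, dict) and msg.get("id") == want_id):
            return msg

def call(p, req_id, tool, args):
    # req_id must be unique per request (named req_id so the builtin id() isn't shadowed)
    send(p, {"jsonrpc": "2.0", "id": req_id, "method": "tools/call",
             "params": {"name": tool, "arguments": args}})
    return read_msg(p, req_id)

def result_text(resp):
    # Join the text items of a tools/call result into one string
    content = (resp or {}).get("result", {}).get("content", [])
    return "\n".join(c.get("text", "") for c in content)

# Start server. stderr is discarded: an unread PIPE can fill up and stall the server.
proc = subprocess.Popen([EXE], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        stderr=subprocess.DEVNULL, text=True, encoding='utf-8')

# Initialize handshake
send(proc, {"jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {"protocolVersion": "2024-11-05", "capabilities": {},
    "clientInfo": {"name": "Antigravity", "version": "1.0"}}})
read_msg(proc, 1)
send(proc, {"jsonrpc": "2.0", "method": "notifications/initialized"})

# Now use call(proc, req_id, "tool_name", {args}) for any tool (req_id = 2, 3, ...)
# IMPORTANT: Every tool call requires "appId" (except list_desktop_windows)
# Extract appId from the attach response:
#   app_id = re.search(r'(app_\d+)', result_text(resp)).group(1)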

Verified Working (2026-04-24)

  • list_desktop_windows — instantly lists all windows
  • attach_to_pid — attaches to running process
  • get_snapshot — returns full UI tree (Notepad: menus, tabs, text editor)
  • type_text — typed "HELLO FROM CLAUDE!" into Notepad successfully
  • ⚠️ launch_app — works but returned PID may be stale for UWP apps
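A quick smoke check for a new machine or a server update compares the tool names from a `tools/list` call against the tools verified above and the documented count of 54. `check_server_tools` is a hypothetical helper. A count mismatch only means the server version differs from the one documented here, not that anything is broken.

```python
# Tools listed as verified in this section (2026-04-24)
VERIFIED_TOOLS = {"list_desktop_windows", "attach_to_pid", "get_snapshot", "type_text", "launch_app"}


def check_server_tools(tool_names, expected_count=54):
    """Compare tool names (e.g. from tools/list) with the verified set and the documented count."""
    names = set(tool_names)
    return {
        "missing_verified": sorted(VERIFIED_TOOLS - names),
        "count": len(names),
        "count_matches": len(names) == expected_count,
    }
```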
