Windows Fast Automation MCP Skill
Integrates winapp-mcp to interact directly with the Windows UI Automation Tree for native desktop application control.
Skill: Windows Fast UI Automation (winapp-mcp)
When the user asks you to interact with a Native Windows Application (e.g. Excel, File Explorer, Word, Settings, Calculator, Fidelity ATP), you must bypass the slow screenshot/coordinate loop and use the Windows UI Automation Tree via winapp-mcp.
Location
The winapp-mcp package lives in:
global_workflows/winapp-mcp/server/WinAppMCP.exe
If it's not there, install globally via npm install -g winapp-mcp (see new_machine_setup workflow).
How it Works
WinAppMCP.exe is a .NET server that communicates via JSON-RPC over stdin/stdout (the Model Context Protocol). It exposes 54 tools that hook into the Windows UI Automation tree to instantly read and control native application UI elements.
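As a rough illustration of the wire format, the sketch below assumes one JSON object per line, which is how the Python Bridge Template in this skill talks to the server. The helper names `encode_request` and `decode_line` are hypothetical; only the `jsonrpc`, `id`, `method`, and `params` fields come from JSON-RPC 2.0, and `tools/call` is the MCP method for invoking a tool.

```python
import json

def encode_request(req_id, method, params=None):
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"

def decode_line(line):
    """Parse one line of server output; return None for blank or non-JSON lines."""
    line = line.strip()
    if not line:
        return None
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None

# A tool invocation is a "tools/call" request naming the tool and its arguments.
wire = encode_request(2, "tools/call",
                      {"name": "list_desktop_windows", "arguments": {}})
```

Treating non-JSON lines as `None` lets the caller skip stray log output without aborting the session.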
Critical Limitation: This server CANNOT read inside opaque web browser canvases (Chrome/Edge page content) or hardware-accelerated 3D viewports. For web page content, use the Edge DOM Bridge Extension instead. For Java apps (like Interactive Brokers TWS), first run jabswitch -enable to activate the Java Access Bridge.
WinApp-MCP vs. COM Automation: WinApp-MCP navigates the UI. It clicks buttons, reads menus, and fills forms like a user would. For programmatic data operations on Office apps (reading or writing thousands of Excel cells, evaluating formulas, generating Word documents, sending Outlook emails), use COM automation via pywin32 instead. It is orders of magnitude faster because it talks directly to the application engine, bypassing the UI. See the com_automation skill.
The Correct Attach Pattern
IMPORTANT: Do NOT launch apps via launch_app and assume the returned PID is correct. Many Win11 apps (especially UWP/WinUI3 apps like Notepad, Calculator, Settings) re-parent to an existing system process, making the launched PID stale.
The reliable workflow is:
1. Launch the app yourself (via Start-Process or subprocess.Popen)
2. Wait 2-3 seconds for it to render
3. Call list_desktop_windows (no arguments); it returns every visible window with Title, PID, ProcessName
4. Find your target by matching the window title
5. Call attach_to_pid with the PID from step 3
6. Now call get_snapshot, type_text, invoke_element, etc.; all require the appId parameter returned by attach
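The title-matching step can be sketched as follows. This is only an illustration: it assumes the list_desktop_windows output has already been parsed into dicts with `Title`, `PID`, and `ProcessName` keys (the field names come from the list above; the dict shape, the sample values, and the helper name `pick_window` are assumptions).

```python
def pick_window(windows, title_fragment):
    """Return the PID of the single process whose window title contains title_fragment."""
    needle = title_fragment.lower()
    matches = [w for w in windows if needle in w["Title"].lower()]
    if not matches:
        raise LookupError(f"no window title contains {title_fragment!r}")
    pids = {w["PID"] for w in matches}
    if len(pids) > 1:
        # Refuse to guess: attaching to the wrong process is worse than failing.
        raise LookupError(f"{title_fragment!r} is ambiguous: PIDs {sorted(pids)}")
    return matches[0]["PID"]

# Hypothetical parsed output of list_desktop_windows (made-up values).
windows = [
    {"Title": "Untitled - Notepad", "PID": 4120, "ProcessName": "Notepad"},
    {"Title": "Calculator", "PID": 9876, "ProcessName": "CalculatorApp"},
]
pid = pick_window(windows, "notepad")  # pass this PID to attach_to_pid
```

Failing loudly on zero or multiple matches keeps a stale or wrong PID from reaching the attach step.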
Common App Process Names
| Application | Process Name | Exe Path | Notes |
|---|---|---|---|
| Notepad | notepad | C:\Windows\notepad.exe | Win11 UWP — must use real PID from list_desktop_windows |
| File Explorer | explorer | explorer.exe | Always running; attach to existing |
| Microsoft Edge | msedge | msedge.exe | Use for outer shell (tabs, URL bar); DOM Bridge for page content |
| Google Chrome | chrome | chrome.exe | Same as Edge — outer shell only |
| Excel | EXCEL | via Office path | Full native UIA support |
| Word | WINWORD | via Office path | Full native UIA support |
| Settings | SystemSettings | ms-settings: URI | Win11 UWP |
| Calculator | CalculatorApp | calc.exe | Win11 UWP — re-parents PID |
| Paint | mspaint | mspaint.exe | Classic Win32 |
| Fidelity ATP | AtpInvestor | Custom install path | Native .NET — full UIA support (except chart canvas) |
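When titles are unreliable, the table above can also be used as a lookup for filtering by process name. The sketch below is a minimal example under the same assumption that window records are dicts with a `ProcessName` key; `PROCESS_NAMES`, `normalize`, and `windows_for_app` are hypothetical names, and the case-insensitive, `.exe`-tolerant comparison is a defensive guess about how the names may be reported.

```python
# Process names copied from the table above.
PROCESS_NAMES = {
    "Notepad": "notepad",
    "File Explorer": "explorer",
    "Microsoft Edge": "msedge",
    "Google Chrome": "chrome",
    "Excel": "EXCEL",
    "Word": "WINWORD",
    "Settings": "SystemSettings",
    "Calculator": "CalculatorApp",
    "Paint": "mspaint",
    "Fidelity ATP": "AtpInvestor",
}

def normalize(process_name):
    """Lowercase a process name and drop a trailing .exe so comparisons are forgiving."""
    name = process_name.strip().lower()
    return name[:-4] if name.endswith(".exe") else name

def windows_for_app(app, windows):
    """Keep only the window records owned by the given application's process."""
    wanted = normalize(PROCESS_NAMES[app])
    return [w for w in windows if normalize(w["ProcessName"]) == wanted]
```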
Key Tools (54 total)
| Tool | What It Does |
|---|---|
| list_desktop_windows | Lists all visible windows with PID + title (use this FIRST) |
| attach_to_pid / attach_to_app | Connects the server to a running app; returns appId |
| get_snapshot | Returns the full UI tree (buttons, text fields, menus) |
| invoke_element | Clicks a button or menu item programmatically |
| type_text | Types text into the focused/active text field |
| click_element | Clicks an element by name or AutomationId |
| get_grid_item | Reads a specific DataGrid cell by row/column |
| fill_form | Fills multiple form fields in one call |
| get_all_values | Reads all editable field values at once |
| find_elements | Search for elements by type, id, or name |
| press_key_combo | Sends keyboard shortcuts (Ctrl+S, Alt+F4) |
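The table lists only a subset of the 54 tools, and it does not give their argument names. MCP servers normally describe their tools in response to a `tools/list` request, so one way to discover the rest is sketched below. It assumes WinAppMCP.exe answers `tools/list` in the standard MCP shape (`result.tools[]` with `name` and `inputSchema`); the helper name `summarize_tools` and the sample response are made up for illustration.

```python
def summarize_tools(response):
    """Map each tool name to the list of its required argument names."""
    tools = response.get("result", {}).get("tools", [])
    return {
        t["name"]: t.get("inputSchema", {}).get("required", [])
        for t in tools
    }

# Request to send, using the same one-JSON-object-per-line framing as any call.
list_request = {"jsonrpc": "2.0", "id": 3, "method": "tools/list"}

# Made-up response fragment shaped like an MCP tools/list result.
sample = {
    "jsonrpc": "2.0",
    "id": 3,
    "result": {"tools": [
        {"name": "list_desktop_windows",
         "inputSchema": {"type": "object", "properties": {}}},
        {"name": "get_snapshot",
         "inputSchema": {"type": "object",
                         "properties": {"appId": {"type": "string"}},
                         "required": ["appId"]}},
    ]},
}
summary = summarize_tools(sample)
```

Reading the schemas from the server avoids guessing at parameter names for tools such as fill_form or get_grid_item.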
Python Bridge Template
import json
import re
import subprocess

EXE = r"path\to\WinAppMCP.exe"

def read_msg(p):
    """Return the next JSON message from the server, or None at EOF."""
    while True:
        line = p.stdout.readline()
        if not line:
            return None
        line = line.strip()
        if not line:
            continue
        try:
            return json.loads(line)
        except json.JSONDecodeError:
            pass  # skip non-JSON output such as log lines

def send(p, msg):
    """Write one JSON-RPC message as a single line."""
    p.stdin.write(json.dumps(msg) + "\n")
    p.stdin.flush()

def call(p, req_id, tool, args):
    """Invoke a tool and return the response whose id matches req_id."""
    send(p, {"jsonrpc": "2.0", "id": req_id, "method": "tools/call",
             "params": {"name": tool, "arguments": args}})
    while True:
        msg = read_msg(p)
        # Skip notifications and replies to other requests; stop at EOF.
        if msg is None or msg.get("id") == req_id:
            return msg

# Start server. stderr is discarded so an unread pipe cannot fill up and block it.
proc = subprocess.Popen([EXE], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        stderr=subprocess.DEVNULL, text=True, encoding="utf-8")

# Initialize handshake
send(proc, {"jsonrpc": "2.0", "id": 1, "method": "initialize",
            "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                       "clientInfo": {"name": "Antigravity", "version": "1.0"}}})
read_msg(proc)
send(proc, {"jsonrpc": "2.0", "method": "notifications/initialized"})

# Now use call(proc, req_id, "tool_name", {args}) for any tool
# IMPORTANT: Every tool call requires "appId" (except list_desktop_windows
# and the attach calls that return it)
# Extract appId from attach response: re.search(r'(app_\d+)', text).group(1)
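The template leaves response handling to the caller. A minimal sketch of that step follows, assuming the server returns standard MCP tool results (`result.content` as a list of `{"type": "text", "text": ...}` parts, plus an optional `isError` flag). The helper names `result_text` and `extract_app_id` and the sample reply text are hypothetical; the `app_\d+` pattern is the one given in the template's comments.

```python
import re

def result_text(response):
    """Join the text parts of a tools/call response, raising on any failure."""
    if response is None:
        raise RuntimeError("server closed the pipe before replying")
    if "error" in response:
        # Protocol-level failure (unknown method, bad params, ...).
        raise RuntimeError(f"JSON-RPC error: {response['error'].get('message')}")
    result = response.get("result", {})
    text = "\n".join(part.get("text", "")
                     for part in result.get("content", [])
                     if part.get("type") == "text")
    if result.get("isError"):
        # Tool-level failure reported inside an otherwise valid result.
        raise RuntimeError(f"tool reported an error: {text}")
    return text

def extract_app_id(text):
    """Pull the appId (for example app_1) out of an attach response's text."""
    m = re.search(r"(app_\d+)", text)
    if m is None:
        raise ValueError(f"no appId found in: {text!r}")
    return m.group(1)

# Made-up attach reply; the real wording of the text may differ.
reply = {"jsonrpc": "2.0", "id": 5,
         "result": {"content": [{"type": "text", "text": "Attached as app_1"}]}}
app_id = extract_app_id(result_text(reply))
```

Checking for a missing match before calling `.group(1)` turns a failed attach into a readable error instead of an `AttributeError`.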
Verified Working (2026-04-24)
- ✅ list_desktop_windows: instantly lists all windows
- ✅ attach_to_pid: attaches to a running process
- ✅ get_snapshot: returns the full UI tree (Notepad: menus, tabs, text editor)
- ✅ type_text: typed "HELLO FROM CLAUDE!" into Notepad successfully
- ⚠️ launch_app: works, but the returned PID may be stale for UWP apps