A CLI MCP server for ad-hoc UI testing of Android & iOS apps — a tight act+observe loop, device-condition control, log/crash & network capture, accessibility audits, OCR targeting, and recording that turns an exploratory session into a replayable regression flow.
Manos drives Android emulators/devices (via adb) and iOS simulators
(via xcrun simctl, with idb for native UI interaction) over the
Model Context Protocol. It's purpose-built for the
exploratory, test-free loop where an LLM agent pokes at an app and reacts to what it sees: atomic
act+observe actions, control over the device conditions real users hit, crash/log and network
capture for when something breaks, an accessibility audit, and one-call recording that promotes an
ad-hoc session into a replayable regression test. See the
design notes & roadmap for the rationale.
Every tap/type/swipe optionally returns the new screen state or a diff — one round-trip instead of act → inspect → repeat.
Hierarchy nodes get content-based ids that survive counters/clocks, so screen diffs mean something.
Dark mode, locale, GPS, network, orientation, font scale, status bar — test the states real users hit.
logcat / unified log with automatic crash & ANR detection surfaced on every fetch.
Touch-target size, unlabeled controls, duplicate labels — graded against per-platform guidelines.
When the a11y tree is thin (styled buttons, canvas/Flutter, WebViews), find_text OCRs the screenshot to find & tap on-screen text.
Decrypted HTTP from debug builds, filtered to your endpoints — Frida/OkHttp on Android, mitmproxy on iOS.
Record an ad-hoc session and export it as a replayable Maestro flow + HTML report. Promote exploration into a regression test.
Manos is a stdio MCP server published to npm. Most AI clients can launch it on demand with npx, with no install step, so the snippets below use that. Prefer a resident binary? Run npm install -g manos-mcp and use a bare manos serve instead.
# run on demand, nothing to install:
npx -y manos-mcp serve
# …or install the CLI globally:
npm install -g manos-mcp
Requires Node 20+. Working on manos itself, or pinning a local build? Clone the repo, run npm install (it builds via the prepare script), and point your client at node /ABS/PATH/to/Manos-MCP/dist/cli.js serve.
In Claude Code, one command registers the server (add -s user to share it across all your projects):
claude mcp add manos -- npx -y manos-mcp serve
# globally installed binary instead:
claude mcp add manos -- manos serve
Verify with claude mcp list, then ask Claude to “list mobile devices.”
Edit claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/, Windows: %APPDATA%\Claude\) and restart the app:
{
"mcpServers": {
"manos": {
"command": "npx",
"args": ["-y", "manos-mcp", "serve"]
}
}
}
Project-scoped .cursor/mcp.json (or global ~/.cursor/mcp.json):
{
"mcpServers": {
"manos": {
"command": "npx",
"args": ["-y", "manos-mcp", "serve"]
}
}
}
Then enable manos under Settings → MCP.
VS Code agent mode reads .vscode/mcp.json (note the servers key and explicit type):
{
"servers": {
"manos": {
"type": "stdio",
"command": "npx",
"args": ["-y", "manos-mcp", "serve"]
}
}
}
Open the Copilot Chat Agent picker and toggle the manos tools on.
~/.codeium/windsurf/mcp_config.json:
{
"mcpServers": {
"manos": {
"command": "npx",
"args": ["-y", "manos-mcp", "serve"]
}
}
}
Then hit Refresh in the Windsurf MCP panel.
Any MCP-capable client speaks the same stdio protocol. Configure a server that runs:
command: npx
args: ["-y", "manos-mcp", "serve"]
transport: stdio
The most common config shape is an mcpServers object keyed by server name with command + args, as shown in the client configs above.
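For clients that use this shape, registering Manos amounts to merging one entry into a JSON file. The following is a minimal sketch, not part of Manos: `register_manos` is a hypothetical helper name, and it assumes the config file is plain JSON. The `vscode_style` flag covers the `servers` + `type` variant.

```python
import json
from pathlib import Path

# Launch command taken from the client snippets in this document.
MANOS_ENTRY = {"command": "npx", "args": ["-y", "manos-mcp", "serve"]}


def register_manos(config_path, vscode_style=False):
    """Merge a 'manos' server entry into an MCP client config file.

    Hypothetical helper: other servers already in the file are preserved,
    and the file (plus parent directories) is created if missing.
    """
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}

    if vscode_style:
        # .vscode/mcp.json uses a "servers" key and an explicit transport type.
        key, entry = "servers", {"type": "stdio", **MANOS_ENTRY}
    else:
        key, entry = "mcpServers", dict(MANOS_ENTRY)

    config.setdefault(key, {})["manos"] = entry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Merging rather than overwriting matters here because these files usually hold entries for other servers as well.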
npx -y manos-mcp doctor # or: manos doctor
doctor reports which backends it found (adb, xcrun, idb, maestro, OCR engine), lists connected devices, and prints each device's live capabilities — so you know what works before you rely on it. Android needs adb (auto-detected from $ANDROID_HOME / the default SDK path); iOS needs Xcode's xcrun. Everything else is optional and only gates the advanced features below.
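The adb auto-detection described above can be pictured as a short search over a few well-known locations. This is a sketch under stated assumptions, not Manos's confirmed logic: the search order (PATH, then $ANDROID_HOME, then the default SDK directory used by Android Studio) and the `find_adb` name are illustrative.

```python
import os
import shutil
import sys
from pathlib import Path


def find_adb(env=None, home=None, which=shutil.which):
    """Return a path to adb, or None if it cannot be found.

    Assumed search order: PATH, $ANDROID_HOME, default SDK location.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    exe = "adb.exe" if sys.platform == "win32" else "adb"

    on_path = which("adb")
    if on_path:
        return on_path

    candidates = []
    if env.get("ANDROID_HOME"):
        candidates.append(Path(env["ANDROID_HOME"]))
    # Default SDK locations created by Android Studio.
    if sys.platform == "darwin":
        candidates.append(home / "Library" / "Android" / "sdk")
    elif sys.platform == "win32":
        if env.get("LOCALAPPDATA"):
            candidates.append(Path(env["LOCALAPPDATA"]) / "Android" / "Sdk")
    else:
        candidates.append(home / "Android" / "Sdk")

    # adb lives in the SDK's platform-tools directory.
    for sdk in candidates:
        adb = sdk / "platform-tools" / exe
        if adb.is_file():
            return str(adb)
    return None
```

If this returns None, running the doctor command is the authoritative way to see what Manos itself detected.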
The everyday unit of work is act → observe: most sessions cycle through a handful of
tools, and each action can return the resulting screen so you rarely call inspect_screen twice in a row.
| Tool | What it does |
|---|---|
| list_devices | List local Android & iOS devices with id, platform, OS, state. Start here for a device_id. |
| inspect_screen | Compact JSON hierarchy with stable element ids; target by id / text / resource-id / accessibility. |
| tap / long_press | Tap by selector or coordinates; returns the resulting screen (act + observe). |
| input_text | Type into the focused field (optionally focus a field first). |
| press_key | Press a hardware/system key — back, enter, home, … |
| swipe | Swipe by direction or between two points (scroll lists, dismiss sheets). |
| wait_for | Poll until an element is visible / not-visible, or timeout — replaces fixed sleeps. |
| assert | Clean pass/fail that an element is visible / not visible. |
| find_elements / find_text | Search the current screen by query; find_text OCRs for text the a11y tree misses. |
| take_screenshot | PNG of the current screen when you need pixels. |
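The wait_for row above describes polling in place of fixed sleeps. As a rough illustration of that idea (not the server's actual code), the loop below polls a check until it matches or a deadline passes. `is_visible` is a hypothetical callable standing in for "inspect the screen and look for the element", and the 250 ms interval is an arbitrary choice.

```python
import time


def wait_for(is_visible, want_visible=True, timeout_ms=8000, interval_ms=250,
             clock=time.monotonic, sleep=time.sleep):
    """Poll is_visible() until it equals want_visible or the timeout expires.

    Returns (matched, elapsed_ms). clock and sleep are injectable so the
    loop can be exercised without real waiting.
    """
    start = clock()
    deadline = start + timeout_ms / 1000
    while True:
        if bool(is_visible()) == want_visible:
            return True, int((clock() - start) * 1000)
        if clock() >= deadline:
            return False, int((clock() - start) * 1000)
        sleep(interval_ms / 1000)
```

Compared with a fixed sleep, this returns as soon as the condition holds and still bounds the worst case by the timeout.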
Sign in, then wait for the landing screen. Six calls, each act returning what changed:
list_devices # → pick a device_id
inspect_screen { device_id } # compact tree, stable ids
tap { device_id, text: "Sign In", observe: "diff" }
input_text { device_id, text: "user@example.com" }
tap { device_id, text: "Continue", observe: "screen" }
wait_for { device_id, text: "Welcome", timeout_ms: 8000 }
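On the wire, each of the calls above is an MCP `tools/call` request: a JSON-RPC 2.0 message sent as one line over stdio. The sketch below only builds those messages to show their shape. The `tool_call` helper and the `emulator-5554` device id are placeholders, and a real client would first complete the MCP initialize handshake, which an MCP SDK normally handles.

```python
import itertools
import json

_ids = itertools.count(1)


def tool_call(name, **arguments):
    """Build one MCP tools/call request as a newline-terminated JSON string."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


# The sign-in sequence from this section; the device id is a placeholder.
DEVICE = "emulator-5554"
session = [
    tool_call("list_devices"),
    tool_call("inspect_screen", device_id=DEVICE),
    tool_call("tap", device_id=DEVICE, text="Sign In", observe="diff"),
    tool_call("input_text", device_id=DEVICE, text="user@example.com"),
    tool_call("tap", device_id=DEVICE, text="Continue", observe="screen"),
    tool_call("wait_for", device_id=DEVICE, text="Welcome", timeout_ms=8000),
]
```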
Selectors. Target elements by id (from inspect_screen), text, resource_id, or accessibility; fall back to x/y only when nothing else matches. If a text selector finds nothing in the tree, Manos automatically retries with OCR (or force OCR with tap { text, ocr: true }).
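The selector precedence described above can be sketched as a small resolver. The node fields (`center` in particular), the `ocr` callable, and the function name are assumptions made for illustration; only the ordering (tree selectors, then OCR for text, then raw coordinates) comes from the text.

```python
def resolve_target(selector, nodes, ocr=None):
    """Resolve a selector to an (x, y) tap point, or None if nothing matches.

    nodes: list of dicts with hypothetical keys id, text, resource_id,
    accessibility, and center (an (x, y) tuple).
    ocr: optional callable mapping text to a point, standing in for the
    find_text fallback.
    """
    text = selector.get("text")

    # ocr=True in the selector skips the tree and goes straight to OCR.
    if not selector.get("ocr"):
        for key in ("id", "text", "resource_id", "accessibility"):
            wanted = selector.get(key)
            if wanted is None:
                continue
            for node in nodes:
                if node.get(key) == wanted:
                    return node["center"]

    # A text selector that missed the tree (or forced OCR) tries OCR next.
    if text is not None and ocr is not None:
        point = ocr(text)
        if point is not None:
            return point

    # Raw coordinates are the last resort.
    if "x" in selector and "y" in selector:
        return (selector["x"], selector["y"])
    return None
```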
observe modes. "diff" (added/removed/changed nodes — cheapest
for a step-by-step loop), "screen" (the full new tree), "screenshot"
(a PNG), or "none" (just do it). Stable ids are what make diff meaningful.
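To see why stable ids make the diff mode useful, consider a sketch in which each node's id is a hash of its content with digits masked, so a ticking clock keeps the same id and shows up as changed rather than as one node removed and another added. The hashing scheme and field names here are assumptions for illustration, not Manos's actual id algorithm.

```python
import hashlib
import re


def stable_id(node):
    """Content-based id: hash class, resource id, and digit-masked text.

    Masking digits is an assumed way to survive counters and clocks:
    "12:04" and "12:05" produce the same id.
    """
    text = re.sub(r"\d+", "#", node.get("text", ""))
    key = "|".join([node.get("class", ""), node.get("resource_id", ""), text])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]


def diff_screens(before, after):
    """Return added / removed / changed nodes between two flat node lists."""
    old = {stable_id(n): n for n in before}
    new = {stable_id(n): n for n in after}
    return {
        "added": [new[i] for i in new if i not in old],
        "removed": [old[i] for i in old if i not in new],
        "changed": [new[i] for i in new if i in old and new[i] != old[i]],
    }
```

Two nodes with identical content would collide in this sketch; a real implementation would also need something like a sibling index to tell them apart.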
| Tool | Android | iOS |
|---|---|---|
| Inspection & interaction | ||
| inspect_screen | FULL: uiautomator → warm maestro session → cold CLI | FULL: idb, or warm maestro session |
| take_screenshot | FULL: screencap | FULL: simctl io |
| tap / long_press | FULL: adb input | PARTIAL: idb (full); maestro needs launch_app first |
| input_text | FULL: adb input text | PARTIAL: idb (full); maestro fallback |
| swipe | FULL: adb input swipe | PARTIAL: idb (full); maestro fallback |
| press_key | FULL: rich keyevents | PARTIAL: limited keys (enter/delete/tab/home/lock/siri) |
| find_text (OCR) | FULL: Apple Vision / Tesseract | FULL: Apple Vision / Tesseract |
| assert / wait_for / find_elements | FULL: built on inspect | FULL: built on inspect |
| App lifecycle & state | ||
| launch_app / stop_app | FULL: am / monkey | FULL: simctl launch/terminate |
| clear_app_state | FULL: pm clear (true data wipe) | PARTIAL: resets permissions; full wipe needs reinstall |
| open_deeplink | FULL: am start VIEW | FULL: simctl openurl |
| set_permission | FULL: pm grant/revoke | FULL: simctl privacy |
| Device conditions | ||
| set_appearance | FULL: cmd uimode night | FULL: simctl ui appearance |
| set_orientation | FULL: settings user_rotation | N/A: no CLI; rotate via Simulator menu (Cmd+←/→) |
| set_locale | PARTIAL: per-app, Android 13+ (API 33) | PARTIAL: applied on relaunch via launch args |
| set_network | FULL: svc wifi/data, airplane-mode | N/A: sim shares host net; use Network Link Conditioner |
| set_location | FULL: emu geo fix (emulator only) | FULL: simctl location set |
| set_font_scale | FULL: system font_scale | FULL: Dynamic Type content size |
| set_status_bar | PARTIAL: SystemUI demo mode (clock/battery/signal) | FULL: rich override (time/battery/carrier/signal) |
| push_notification | N/A: needs FCM server key + token | FULL: simctl push (inject APNs payload) |
| set_conditions (presets) | FULL: applies the above, gated per condition | FULL: applies the above, gated per condition |
| Diagnostics | ||
| get_logs (+ crash/ANR) | FULL: logcat | FULL: simctl spawn log |
| a11y_audit | FULL: heuristic over hierarchy | FULL: heuristic over hierarchy |
| network capture (start/requests/stop) | PARTIAL: Frida OkHttp hook (debuggable + frida-server) | PARTIAL: mitmproxy + simctl-trusted CA + macOS proxy (pinning blocks; NSURLSession hook planned) |
| Authored flows | ||
| run_flow | FULL: local flow execution | FULL: local flow execution |
| cheat_sheet | FULL: flow-syntax guidance | FULL: flow-syntax guidance |
| Session recording | ||
| start/stop_recording, export_flow, export_report | FULL: platform-agnostic | FULL: platform-agnostic |
Capabilities are reported live per device by the device_capabilities tool and the doctor command, including the active backend — the agent never has to guess what a platform supports.
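One way an agent might use those live capabilities is to plan its calls up front and skip what the device cannot do. The sketch below assumes a simple mapping from tool name to "full", "partial", or "unavailable"; the real device_capabilities response shape may differ, and `plan_calls` is a hypothetical helper.

```python
def plan_calls(capabilities, wanted):
    """Split wanted tool names into runnable and skipped (with a reason).

    capabilities: dict of tool name -> "full" | "partial" | "unavailable"
    (an assumed shape). Unknown tools are treated as unavailable.
    """
    runnable, skipped = [], []
    for tool in wanted:
        level = capabilities.get(tool, "unavailable")
        if level == "unavailable":
            skipped.append({"tool": tool, "reason": "not supported on this device"})
        else:
            runnable.append({"tool": tool, "level": level})
    return runnable, skipped


# Support levels for an iOS simulator, as listed in the support matrix.
ios_sim = {
    "set_appearance": "full",
    "set_locale": "partial",
    "set_orientation": "unavailable",
    "set_network": "unavailable",
    "push_notification": "full",
}
runnable, skipped = plan_calls(ios_sim, ["set_appearance", "set_network", "set_locale"])
```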
These go beyond tap/inspect. Each is capability-gated and degrades gracefully — Manos tells you what's missing instead of failing opaquely. Where a feature needs something special on the developer's machine or build, the setup is called out below.
Capture the real HTTP an app makes — above TLS, post-decryption — filtered to the endpoints you care about so the agent isn't flooded. Bodies are decompressed and readable.
network_start · network_requests · network_clear · network_stop
Android: requires a debuggable build and frida-server running on the device (matching the host Frida version).
Host setup: pip install "frida==16.7.19" frida-tools (Frida 16.7.x keeps the global Java bridge the OkHttp hook relies on).
Usage: network_start { device_id, app_id, filter: "yourapi\\.com" } → interact → network_requests.
iOS: brew install mitmproxy. Manos installs a simctl-trusted CA and manages the macOS proxy automatically (saved and restored on stop, no sudo). Apps using NSURLSession certificate pinning are not yet captured (an NSURLSession Frida hook is planned).
Full walkthrough and troubleshooting: NETWORK.md.
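The filter argument is a regular expression matched against request URLs; in the JSON call the backslash is doubled (`"yourapi\\.com"`), which is the regex `yourapi\.com`. A small sketch of that filtering follows, assuming captured requests are dicts with a `url` key (a hypothetical shape for what network_requests returns).

```python
import re


def filter_requests(requests, pattern):
    """Keep only captured requests whose URL matches the regex pattern."""
    regex = re.compile(pattern)
    return [r for r in requests if regex.search(r["url"])]


captured = [
    {"method": "GET", "url": "https://yourapi.com/v1/me", "status": 200},
    {"method": "POST", "url": "https://analytics.example.net/collect", "status": 204},
    {"method": "GET", "url": "https://yourapiXcom.example.test/", "status": 200},
]
# The escaped dot matches a literal "." only, so "yourapiXcom" is excluded.
kept = filter_requests(captured, r"yourapi\.com")
```

Leaving the dot unescaped would also match the third URL, which is why the escaped form appears in the usage line.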
Styled <div> buttons, canvas/Flutter/game UIs, and poor-a11y WebViews expose little hierarchy. find_text OCRs the screenshot and returns pixel-accurate boxes; targeting falls back to OCR automatically, or force it with tap { text, ocr: true }.
find_text · tap / input_text { ocr: true }
OCR uses Apple Vision or Tesseract (brew install tesseract / apt install tesseract-ocr). doctor prints the active engine.
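Turning an OCR hit into a tap is mostly a matter of picking the matching box and aiming at its center. The sketch below assumes find_text results look like dicts with `text`, `x`, `y`, `width`, and `height` in screenshot pixels; that shape and the case-insensitive substring match are illustrative choices, not the documented response format.

```python
def tap_point_for_text(boxes, query):
    """Return the center (x, y) of the first OCR box containing query.

    boxes: list of dicts with text, x, y, width, height (assumed shape,
    in screenshot pixels). Returns None when no box matches.
    """
    needle = query.casefold()
    for box in boxes:
        if needle in box["text"].casefold():
            center_x = box["x"] + box["width"] // 2
            center_y = box["y"] + box["height"] // 2
            return (center_x, center_y)
    return None
```

This ignores any difference between screenshot pixels and the coordinate space taps use (for example a display scale factor), which a real implementation would have to account for.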
When adb's uiautomator dump can't reach UI-idle (apps with constant animation/watermarks), Manos keeps one long-lived warm hierarchy engine resident instead of cold-starting the JVM per call: steady-state inspect drops from ~9–14s to ~175ms. The choice is cached per device; the child is process-tree-killed on exit.
The warm engine runs maestro under the hood, so it must be on PATH.
Record an exploratory session and promote it to CI in one call. export_flow emits a valid Maestro flow (verified with maestro check-syntax) using resilient selectors; export_report writes a self-contained HTML timeline: a screenshot after every action, plus an appendix bundling the flow YAML, captured network, and recent logs/crashes.
start_recording { report: true } · export_flow · export_report
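As a rough picture of what an exported flow looks like, the sketch below renders a list of recorded steps as Maestro YAML using a handful of standard Maestro commands (launchApp, tapOn, inputText, assertVisible). The `(action, value)` recording format and the `to_maestro_flow` name are hypothetical; the real export_flow chooses resilient selectors and covers far more than this.

```python
import json


def to_maestro_flow(app_id, steps):
    """Render recorded steps as a Maestro flow (YAML text).

    steps: list of (action, value) tuples, an assumed recording format.
    Values are emitted as double-quoted strings, which are valid YAML.
    """
    commands = {
        "launch": lambda v: "- launchApp",
        "tap": lambda v: f"- tapOn: {json.dumps(v)}",
        "type": lambda v: f"- inputText: {json.dumps(v)}",
        "assert_visible": lambda v: f"- assertVisible: {json.dumps(v)}",
    }
    lines = [f"appId: {app_id}", "---"]
    for action, value in steps:
        lines.append(commands[action](value))
    return "\n".join(lines) + "\n"


flow = to_maestro_flow("com.example.app", [
    ("launch", None),
    ("tap", "Sign In"),
    ("type", "user@example.com"),
    ("assert_visible", "Welcome"),
])
```

Running maestro check-syntax on the resulting file is a sensible way to validate output like this before committing it.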
Pass report:true to start_recording to capture per-step screenshots for the HTML report.
Apply many device conditions in one call, or use a named preset, with per-condition applied / skipped (with reason) / failed reporting.
set_conditions { preset: "accessibility" | "offline" | "screenshot" | "dark" | "international" | "reset", ...overrides }
Some conditions are platform-gated: set_network is emulator-only on Android, push_notification is iOS-only.
Execute a declarative flow locally (inline YAML, a file, or a directory of flows) against a connected device. Sessions captured with export_flow run here unchanged, so an exploratory session becomes a repeatable check.
run_flow · cheat_sheet
Requires maestro on PATH (the same engine used for the warm session), which runs the flow against your local emulator/simulator.
iOS UI interaction works out of the box via the Maestro fallback (call launch_app first), but idb makes tap/type/inspect noticeably faster and unlocks the full key set.
Install idb with brew install idb-companion && pipx install fb-idb. Manos auto-detects it and reports the active backend per device.
| Tool | Description |
|---|---|
| list_devices | List local Android & iOS devices with id, platform, OS, state. |
| device_capabilities | Report which actions a device supports (full/partial/unavailable) + backend. |
| inspect_screen | Compact JSON hierarchy with stable element ids; target by id/text/resource-id. |
| take_screenshot | PNG of the current screen. |
| tap / long_press | Tap by element selector or coordinates; returns the resulting screen (act+observe). |
| input_text | Type into the focused field; optionally focus a field first. |
| press_key | Press a hardware/system key (back, enter, home, …). |
| swipe | Swipe by direction or between two points. |
| assert | Assert an element is visible / not visible (no polling). |
| wait_for | Poll until an element is visible / not visible, or timeout — replaces fixed sleeps. |
| find_elements | Search the current screen for elements matching a query. |
| find_text | OCR the screenshot to locate on-screen text (pixel boxes) the a11y tree misses — styled buttons, canvas/Flutter/games, WebViews. Tap via tap{text, ocr:true}. |
| launch_app / stop_app | Launch (optionally clearing state) or terminate an app. |
| clear_app_state | Reset app state. |
| open_deeplink | Open a deep / universal link URL. |
| set_permission | Grant or revoke a runtime permission. |
| set_appearance | Light / dark mode. |
| set_orientation | Portrait / landscape / upside-down. |
| set_locale | App locale (BCP-47). |
| set_network | Toggle wifi / cellular / airplane mode. |
| set_location | Simulated GPS coordinates. |
| set_font_scale | Text size for accessibility testing. |
| set_status_bar | Override status bar (time/battery/signal) for clean screenshots. |
| set_conditions | Apply many device conditions at once / a named preset (offline, accessibility, screenshot, dark, international, reset). |
| push_notification | Deliver a push notification (iOS APNs payload). |
| get_logs | Recent device logs with automatic crash & ANR detection. |
| a11y_audit | Accessibility audit of the current screen. |
| network_start / network_requests / network_clear / network_stop | Capture decrypted HTTP from a debug app (Android: Frida + OkHttp hook; iOS: mitmproxy + trusted CA), filtered to specific endpoints. See NETWORK.md. |
| run_flow | Run an authored flow locally (inline YAML, files, or directory). |
| cheat_sheet | Flow-syntax guidance for writing / exporting flows. |
| start_recording / stop_recording | Record an ad-hoc session (set report:true for per-step screenshots). |
| export_flow | Export the recorded session as a replayable Maestro flow + report. |
| export_report | Self-contained HTML report: screenshot timeline + flow + logs + captured network. |
The roadmap, drawn from IMPROVEMENTS.md. Status reflects priority, not a commitment date.
| Feature | Status | What it adds |
|---|---|---|
| Loop & inspection | ||
| Idle / animation settling | PLANNED | Wait for the UI to stop animating before returning state, so you never act on a mid-transition screen. |
| Token-budgeted inspect | EXPLORING | max_depth / interactive-only filter for very dense screens. |
| Semantic targeting | EXPLORING | Rank elements by an embedding/LLM match to a natural-language description ("the checkout button"). |
| Assertions & state | ||
| Richer assertions + network-idle waits | PLANNED | text-equals, element-count, enabled/checked/selected, "wait for no spinners". |
| Process-death / restoration testing | PLANNED | Background, kill the process, relaunch, assert state restored — a classic Android bug class. |
| Time / clipboard / data seeding | PLANNED | Clock control for time-dependent UI; clipboard get/set; prefill defaults / content providers. |
| Diagnostics & quality | ||
| Color-contrast & text-size a11y | NEXT | Sample screenshot pixels at element bounds to compute WCAG contrast — the one a11y check that needs pixels. |
| Visual regression | NEXT | Capture/approve baseline screenshots and perceptual-diff against them (status-bar overrides already help stabilize). |
| Performance signals | PLANNED | Cold/warm start time, frame jank (gfxinfo), memory (meminfo) captured alongside a session. |
| Video capture | PLANNED | Record the session (screenrecord / simctl recordVideo) and attach to the report for bug tickets. |
| Network & coverage | ||
| Response stubbing | PLANNED | Rewrite/stub responses in flight to drive error and edge-case states. |
| iOS NSURLSession hook + Cronet | PLANNED | Frida hook for pinned iOS apps and Android Cronet stacks, beyond OkHttp/mitmproxy. |
| Icon / template matching | PLANNED | Locate non-text controls (icons) by template, extending the OCR fallback. |
| Multi-device / matrix runs | EXPLORING | Drive the same session across OS versions / form factors in parallel and diff outcomes. |
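For the color-contrast item on the roadmap, the underlying math is the WCAG 2.x contrast ratio between a foreground and a background color. The sketch below implements just that formula for two already-sampled RGB values; how pixels would be sampled from a screenshot at element bounds is left open, since that feature is not yet built.

```python
def _linear(channel):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lum_fg, lum_bg = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_fg, lum_bg), min(lum_fg, lum_bg)
    return (lighter + 0.05) / (darker + 0.05)
```

WCAG level AA asks for at least 4.5:1 for normal-size text, so a check built on this would compare `contrast_ratio(text_color, background_color)` against that threshold.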