Manos

A CLI MCP server for ad-hoc UI testing of Android & iOS apps — a tight act+observe loop, device-condition control, log/crash & network capture, accessibility audits, OCR targeting, and recording that turns an exploratory session into a replayable regression flow.

What it is

Manos drives Android emulators/devices (via adb) and iOS simulators (via xcrun simctl, with idb for native UI interaction) over the Model Context Protocol. It's purpose-built for the exploratory, test-free loop where an LLM agent pokes at an app and reacts to what it sees: atomic act+observe actions, control over the device conditions real users hit, crash/log and network capture for when something breaks, an accessibility audit, and one-call recording that promotes an ad-hoc session into a replayable regression test. See the design notes & roadmap for the rationale.

Act + observe

Every tap/type/swipe optionally returns the new screen state or a diff — one round-trip instead of act → inspect → repeat.

Stable element ids

Hierarchy nodes get content-based ids that survive counters/clocks, so screen diffs mean something.

Device conditions

Dark mode, locale, GPS, network, orientation, font scale, status bar — test the states real users hit.

Crash & log capture

logcat / unified log with automatic crash & ANR detection surfaced on every fetch.

Accessibility audit

Touch-target size, unlabeled controls, duplicate labels — graded against per-platform guidelines.

OCR fallback

When the a11y tree is thin (styled buttons, canvas/Flutter, WebViews), find_text OCRs the screenshot to find & tap on-screen text.

Network capture

Decrypted HTTP from debug builds, filtered to your endpoints — Frida/OkHttp on Android, mitmproxy on iOS.

Session → flow

Record an ad-hoc session and export it as a replayable Maestro flow + HTML report. Promote exploration into a regression test.

Installation

Manos is a stdio MCP server published to npm. Most AI clients can launch it on demand with npx, with no install step, so the snippets below use that. If you prefer a resident binary, run npm install -g manos-mcp and use a bare manos serve instead.

1 · Install (optional)

# run on demand — nothing to install:
npx -y manos-mcp serve

# …or install the CLI globally:
npm install -g manos-mcp

Requires Node 20+. Working on manos itself, or pinning a local build? Clone the repo, npm install (it builds via the prepare script), and point your client at node /ABS/PATH/to/Manos-MCP/dist/cli.js serve.

2 · Register with your AI client

Claude Code. One command registers the server (add -s user to share it across all your projects):

claude mcp add manos -- npx -y manos-mcp serve

# globally installed binary instead:
claude mcp add manos -- manos serve

Verify with claude mcp list, then ask Claude to “list mobile devices.”

Claude Desktop. Edit claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/, Windows: %APPDATA%\Claude\) and restart the app:

{
  "mcpServers": {
    "manos": {
      "command": "npx",
      "args": ["-y", "manos-mcp", "serve"]
    }
  }
}

Cursor. Project-scoped .cursor/mcp.json (or global ~/.cursor/mcp.json):

{
  "mcpServers": {
    "manos": {
      "command": "npx",
      "args": ["-y", "manos-mcp", "serve"]
    }
  }
}

Then enable manos under Settings → MCP.

VS Code (GitHub Copilot). Agent mode reads .vscode/mcp.json (note the servers key and explicit type):

{
  "servers": {
    "manos": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "manos-mcp", "serve"]
    }
  }
}

Open the Copilot Chat Agent picker and toggle the manos tools on.

Windsurf. Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "manos": {
      "command": "npx",
      "args": ["-y", "manos-mcp", "serve"]
    }
  }
}

Then hit Refresh in the Windsurf MCP panel.

Other clients. Any MCP-capable client speaks the same stdio protocol. Configure a server that runs:

command: npx
args:    ["-y", "manos-mcp", "serve"]
transport: stdio

The most common config shape is an mcpServers object keyed by server name with command + args, as shown in the examples above.
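For clients that use the mcpServers shape, the entry can also be merged into an existing config programmatically. The following is a minimal Python sketch, assuming the config has already been parsed from JSON into a dict; add_manos_server is an illustrative helper, not part of Manos.

```python
import json


def add_manos_server(config: dict) -> dict:
    """Return a copy of an MCP client config with a 'manos' stdio entry added.

    Existing servers are preserved; an existing 'manos' entry is overwritten.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["manos"] = {"command": "npx", "args": ["-y", "manos-mcp", "serve"]}
    merged["mcpServers"] = servers
    return merged


# Example: merge into a config that already lists another (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(json.dumps(add_manos_server(existing), indent=2))
```

Returning a copy rather than mutating the input keeps the original config available if the write to disk fails.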

3 · Verify the toolchain

npx -y manos-mcp doctor        # or: manos doctor

doctor reports which backends it found (adb, xcrun, idb, maestro, OCR engine), lists connected devices, and prints each device's live capabilities — so you know what works before you rely on it. Android needs adb (auto-detected from $ANDROID_HOME / the default SDK path); iOS needs Xcode's xcrun. Everything else is optional and only gates the advanced features below.
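To make the adb auto-detection concrete, here is a hedged Python sketch of one way such a lookup could work. The search order and the two default SDK locations (Android Studio's usual defaults on macOS and Linux) are assumptions; the document does not specify how Manos itself searches.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional


def candidate_adb_paths(env: Mapping[str, str], home: Path) -> list[Path]:
    """List places adb might live, most specific first (assumed order)."""
    roots = []
    if env.get("ANDROID_HOME"):
        roots.append(Path(env["ANDROID_HOME"]))
    roots.append(home / "Library" / "Android" / "sdk")  # typical macOS default
    roots.append(home / "Android" / "Sdk")              # typical Linux default
    return [root / "platform-tools" / "adb" for root in roots]


def find_adb(
    env: Mapping[str, str],
    home: Path,
    exists: Callable[[Path], bool] = Path.exists,
) -> Optional[Path]:
    """Return the first candidate that exists, or None if adb was not found."""
    for path in candidate_adb_paths(env, home):
        if exists(path):
            return path
    return None
```

In practice this would be called as find_adb(os.environ, Path.home()); the exists parameter is injectable only so the lookup can be exercised without a real SDK installed.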

Basic operations

The everyday unit of work is act → observe: most sessions cycle through a handful of tools, and each action can return the resulting screen so you rarely call inspect_screen twice in a row.

Tool | What it does
list_devices | List local Android & iOS devices with id, platform, OS, state. Start here for a device_id.
inspect_screen | Compact JSON hierarchy with stable element ids; target by id / text / resource-id / accessibility.
tap / long_press | Tap by selector or coordinates; returns the resulting screen (act + observe).
input_text | Type into the focused field (optionally focus a field first).
press_key | Press a hardware/system key (back, enter, home, …).
swipe | Swipe by direction or between two points (scroll lists, dismiss sheets).
wait_for | Poll until an element is visible / not visible, or timeout; replaces fixed sleeps.
assert | Clean pass/fail that an element is visible / not visible.
find_elements / find_text | Search the current screen by query; find_text OCRs for text the a11y tree misses.
take_screenshot | PNG of the current screen when you need pixels.
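The idea behind wait_for (poll a condition until it holds or a deadline passes, instead of sleeping for a fixed time) can be sketched as follows. This is an illustration of the polling pattern only: poll_until, the 250 ms interval, and the injectable clock are assumptions, not Manos internals.

```python
import time
from typing import Callable


def poll_until(
    condition: Callable[[], bool],
    timeout_ms: int = 8000,
    interval_ms: int = 250,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as condition() holds, False once the timeout passes."""
    deadline = clock() + timeout_ms / 1000
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_ms / 1000)
```

Because the loop returns on the first successful check, a screen that appears after 300 ms costs roughly 300 ms rather than the full timeout a fixed sleep would spend.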

A typical loop

Sign in, then wait for the landing screen. That takes six calls, and each tap reports what changed:

list_devices                                              # → pick a device_id
inspect_screen   { device_id }                            # compact tree, stable ids
tap              { device_id, text: "Sign In", observe: "diff" }
input_text       { device_id, text: "user@example.com" }
tap              { device_id, text: "Continue", observe: "screen" }
wait_for         { device_id, text: "Welcome", timeout_ms: 8000 }

Targeting, in order of resilience. Prefer a selector over coordinates: Manos records selectors for resilient replay. Pass id (from inspect_screen), text, resource_id, or accessibility; fall back to x/y only when nothing else matches. If a text selector finds nothing in the tree, Manos automatically retries with OCR (or force it with tap { text, ocr: true }).
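The targeting order described above can be sketched as a small resolver. This is a simplified illustration under stated assumptions: the node shape (a dict with a bounds tuple of x, y, width, height), the resolve_target name, and the exact precedence between selector keys are hypothetical, and the OCR step is represented by a caller-supplied lookup function.

```python
from typing import Callable, Optional, Tuple

Bounds = Tuple[int, int, int, int]  # x, y, width, height (assumed layout)


def center(bounds: Bounds) -> Tuple[int, int]:
    x, y, w, h = bounds
    return (x + w // 2, y + h // 2)


def resolve_target(
    selector: dict,
    nodes: list,
    ocr_lookup: Optional[Callable[[str], Optional[Bounds]]] = None,
):
    """Return (strategy, tap_point), or None when nothing matches."""
    # 1. Try hierarchy selectors first.
    for key in ("id", "text", "resource_id", "accessibility"):
        wanted = selector.get(key)
        if wanted is None:
            continue
        for node in nodes:
            if node.get(key) == wanted:
                return key, center(node["bounds"])
    # 2. A text selector that missed the tree is retried with OCR.
    text = selector.get("text")
    if text is not None and ocr_lookup is not None:
        box = ocr_lookup(text)
        if box is not None:
            return "ocr", center(box)
    # 3. Raw coordinates are the last resort.
    if "x" in selector and "y" in selector:
        return "coordinates", (selector["x"], selector["y"])
    return None
```

Returning the strategy alongside the tap point lets a recorder keep the selector that actually matched, which is what makes later replay resilient to layout shifts.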

The observe modes. Pass "diff" (added/removed/changed nodes; cheapest for a step-by-step loop), "screen" (the full new tree), "screenshot" (a PNG), or "none" (just do it). Stable ids are what make diff meaningful.
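To show why stable ids matter for the "diff" mode, here is a hedged Python sketch. It derives an id from attributes that rarely change (tree path, class, resource id, accessibility label) and deliberately leaves out the text, so a ticking clock keeps its id and shows up as changed rather than as one node removed and another added. The attribute choice and node shape are assumptions; the actual Manos id scheme is not described in this document.

```python
import hashlib


def stable_id(node: dict, path: str) -> str:
    """Hash only the attributes assumed to be stable between frames."""
    basis = "|".join([
        path,
        node.get("class", ""),
        node.get("resource_id", ""),
        node.get("accessibility", ""),
    ])
    return hashlib.sha1(basis.encode("utf-8")).hexdigest()[:8]


def index_tree(node: dict, path: str = "0") -> dict:
    """Flatten a hierarchy into {stable_id: attributes without children}."""
    out = {stable_id(node, path): {k: v for k, v in node.items() if k != "children"}}
    for i, child in enumerate(node.get("children", [])):
        out.update(index_tree(child, f"{path}/{i}"))
    return out


def diff_screens(before: dict, after: dict) -> dict:
    """Compare two hierarchies by stable id."""
    old, new = index_tree(before), index_tree(after)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

One limitation of this particular sketch: because the path uses child indexes, inserting a sibling earlier in a list shifts the ids of the nodes after it. A production scheme would need something less position-sensitive.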

Android vs iOS capability matrix

Legend: FULL = native, reliable; PARTIAL = works with caveats / extra setup; N/A = not feasible on this platform.

Tool | Android | iOS

Inspection & interaction
inspect_screen | FULL: uiautomator → warm maestro session → cold CLI | FULL: idb, or warm maestro session
take_screenshot | FULL: screencap | FULL: simctl io
tap / long_press | FULL: adb input | PARTIAL: idb (full); maestro needs launch_app first
input_text | FULL: adb input text | PARTIAL: idb (full); maestro fallback
swipe | FULL: adb input swipe | PARTIAL: idb (full); maestro fallback
press_key | FULL: rich keyevents | PARTIAL: limited keys (enter/delete/tab/home/lock/siri)
find_text (OCR) | FULL: Apple Vision / Tesseract | FULL: Apple Vision / Tesseract
assert / wait_for / find_elements | FULL: built on inspect | FULL: built on inspect

App lifecycle & state
launch_app / stop_app | FULL: am / monkey | FULL: simctl launch/terminate
clear_app_state | FULL: pm clear (true data wipe) | PARTIAL: resets permissions; full wipe needs reinstall
open_deeplink | FULL: am start VIEW | FULL: simctl openurl
set_permission | FULL: pm grant/revoke | FULL: simctl privacy

Device conditions
set_appearance | FULL: cmd uimode night | FULL: simctl ui appearance
set_orientation | FULL: settings user_rotation | N/A: no CLI; rotate via Simulator menu (Cmd+←/→)
set_locale | PARTIAL: per-app, Android 13+ (API 33) | PARTIAL: applied on relaunch via launch args
set_network | FULL: svc wifi/data, airplane-mode | N/A: sim shares host net; use Network Link Conditioner
set_location | FULL: emu geo fix (emulator only) | FULL: simctl location set
set_font_scale | FULL: system font_scale | FULL: Dynamic Type content size
set_status_bar | PARTIAL: SystemUI demo mode (clock/battery/signal) | FULL: rich override (time/battery/carrier/signal)
push_notification | N/A: needs FCM server key + token | FULL: simctl push (inject APNs payload)
set_conditions (presets) | FULL: applies the above, gated per condition | FULL: applies the above, gated per condition

Diagnostics
get_logs (+ crash/ANR) | FULL: logcat | FULL: simctl spawn log
a11y_audit | FULL: heuristic over hierarchy | FULL: heuristic over hierarchy
network capture (start/requests/stop) | PARTIAL: Frida OkHttp hook (debuggable + frida-server) | PARTIAL: mitmproxy + simctl-trusted CA + macOS proxy (pinning blocks; NSURLSession hook planned)

Authored flows
run_flow | FULL: local flow execution | FULL: local flow execution
cheat_sheet | FULL: flow-syntax guidance | FULL: flow-syntax guidance

Session recording
start/stop_recording, export_flow, export_report | FULL: platform-agnostic | FULL: platform-agnostic

Capabilities are reported live per device by the device_capabilities tool and the doctor command, including the active backend — the agent never has to guess what a platform supports.

Advanced operations & setup

These go beyond tap/inspect. Each is capability-gated and degrades gracefully — Manos tells you what's missing instead of failing opaquely. Where a feature needs something special on the developer's machine or build, the setup is called out below.

🔌 Network capture (decrypted HTTP)

Capture the real HTTP an app makes — above TLS, post-decryption — filtered to the endpoints you care about so the agent isn't flooded. Bodies are decompressed and readable.

network_start · network_requests · network_clear · network_stop

Android setup (Frida / OkHttp hook). Works through HTTP/2, certificate pinning, and proxy-bypass because it hooks OkHttp in-process, not the socket.
• A debuggable app build (or any app on an emulator / rooted device).
• frida-server running on the device (matching the host Frida version).
• Host tooling: pip install "frida==16.7.19" frida-tools (Frida 16.7.x keeps the global Java bridge the OkHttp hook relies on).
Then: network_start { device_id, app_id, filter: "yourapi\\.com" } → interact → network_requests.

iOS Simulator setup (mitmproxy). brew install mitmproxy. Manos installs a simctl-trusted CA and manages the macOS proxy automatically (saved and restored on stop, no sudo needed). Apps using NSURLSession certificate pinning are not yet captured (NSURLSession Frida hook is planned).
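The filter argument in the Android example is written as a JSON-escaped regular expression (yourapi\\.com in JSON is the pattern yourapi\.com). The Python sketch below shows why the escaped dot matters. It assumes the pattern is searched against each request URL, which the document implies but does not state, and the request records are made up for illustration.

```python
import re


def filter_requests(requests: list, pattern: str) -> list:
    """Keep only captured requests whose URL matches the regex pattern."""
    rx = re.compile(pattern)
    return [r for r in requests if rx.search(r["url"])]


captured = [
    {"url": "https://yourapi.com/v1/login"},
    {"url": "https://cdn.example.net/a.png"},
    {"url": "https://yourapixcom.example/"},
]

# Escaped dot: only the real host matches.
strict = filter_requests(captured, r"yourapi\.com")
# Unescaped dot matches any character, so the look-alike host slips through.
loose = filter_requests(captured, r"yourapi.com")
```

Here strict contains one request and loose contains two, since an unescaped dot also matches the x in yourapixcom.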

Full walkthrough and troubleshooting: NETWORK.md.

🔎 OCR fallback for off-tree elements

Styled <div> buttons, canvas/Flutter/game UIs, and poor-a11y WebViews expose little hierarchy. find_text OCRs the screenshot and returns pixel-accurate boxes; targeting falls back to OCR automatically, or force it with tap { text, ocr: true }.

find_text · tap / input_text { ocr: true }

Setup. On macOS, OCR uses Apple Vision via a tiny Swift helper compiled & cached on first use (needs Xcode Command Line Tools — no extra install). Elsewhere it falls back to Tesseract (brew install tesseract / apt install tesseract-ocr). doctor prints the active engine.
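Once an OCR engine has returned text boxes, turning a query into a tap point is a small matching step. The sketch below is illustrative only: the result shape (text, confidence, box as x, y, width, height) and the find_text_box helper are assumptions, not the output format of find_text.

```python
from typing import Optional


def find_text_box(ocr_results: list, query: str) -> Optional[dict]:
    """Pick the most confident OCR result containing the query (case-insensitive)."""
    needle = query.casefold()
    hits = [r for r in ocr_results if needle in r["text"].casefold()]
    if not hits:
        return None
    best = max(hits, key=lambda r: r["confidence"])
    x, y, w, h = best["box"]
    # Tap the center of the recognized box.
    return {"text": best["text"], "tap": (x + w / 2, y + h / 2)}
```

Substring matching keeps the sketch short; a stricter matcher might prefer exact matches over longer strings that merely contain the query.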

⚡ Warm-session speed

When adb's uiautomator dump can't reach UI-idle (apps with constant animation/watermarks), Manos keeps one long-lived warm hierarchy engine resident instead of cold-starting the JVM per call: steady-state inspect drops from ~9–14s to ~175ms. The choice is cached per device; the child is process-tree-killed on exit.

Setup. Automatic once maestro is on PATH: the warm engine and the cross-platform interaction fallback are both provided by maestro under the hood.

📼 Session recording → regression test

Record an exploratory session and promote it to CI in one call. export_flow emits a valid Maestro flow (verified with maestro check-syntax) using resilient selectors; export_report writes a self-contained HTML timeline — a screenshot after every action plus an appendix bundling the flow YAML, captured network, and recent logs/crashes.

start_recording { report: true } · export_flow · export_report

Setup. None. Pass report:true to start_recording to capture per-step screenshots for the HTML report.
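As a rough picture of what promoting a session to a flow involves, the Python sketch below turns a list of recorded steps into Maestro flow YAML. The recorded-step format and the to_maestro_flow helper are hypothetical and the real export_flow output will differ; launchApp, tapOn, inputText, and assertVisible are standard Maestro commands.

```python
import json


def to_maestro_flow(app_id: str, steps: list) -> str:
    """Render recorded steps (hypothetical format) as a Maestro flow."""
    lines = [f"appId: {app_id}", "---"]
    for step in steps:
        action = step["action"]
        if action == "launch_app":
            lines.append("- launchApp")
        elif action == "tap":
            # json.dumps yields a double-quoted string, which is valid YAML.
            lines.append(f"- tapOn: {json.dumps(step['text'])}")
        elif action == "input_text":
            lines.append(f"- inputText: {json.dumps(step['text'])}")
        elif action == "assert_visible":
            lines.append(f"- assertVisible: {json.dumps(step['text'])}")
        else:
            raise ValueError(f"unsupported action: {action}")
    return "\n".join(lines) + "\n"


session = [
    {"action": "launch_app"},
    {"action": "tap", "text": "Sign In"},
    {"action": "input_text", "text": "user@example.com"},
    {"action": "assert_visible", "text": "Welcome"},
]
flow = to_maestro_flow("com.example.app", session)
```

Emitting text selectors rather than coordinates is what lets the exported flow survive layout changes between runs.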

🎚️ Condition presets

Apply many device conditions in one call, or a named preset, with per-condition applied / skipped (with reason) / failed reporting.

set_conditions { preset: "accessibility" | "offline" | "screenshot" | "dark" | "international" | "reset", ...overrides }

Setup. None. Individual conditions are capability-gated (see the matrix): for example, set_location is emulator-only on Android and push_notification is iOS-only.
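The preset-plus-overrides behaviour with per-condition reporting can be sketched like this. The preset contents, condition names, and the plan_conditions helper are all hypothetical (the document does not list what each preset sets), and the sketch only plans what would be applied or skipped, so it has no "failed" bucket.

```python
# Hypothetical preset contents; the real presets may set different conditions.
PRESETS = {
    "dark": {"appearance": "dark"},
    "offline": {"network": "airplane"},
}


def plan_conditions(preset: str, overrides: dict, supported: set) -> dict:
    """Merge a preset with overrides and split the result by device support."""
    wanted = {**PRESETS.get(preset, {}), **overrides}  # overrides win
    report = {"applied": [], "skipped": []}
    for name, value in wanted.items():
        if name in supported:
            report["applied"].append((name, value))
        else:
            report["skipped"].append((name, "not supported on this device"))
    return report
```

Reporting skipped conditions with a reason, instead of raising on the first unsupported one, mirrors the graceful degradation described for the rest of the tool set.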

▶️ Run authored flows

Execute a declarative flow locally — inline YAML, a file, or a directory of flows — against a connected device. Sessions captured with export_flow run here unchanged, so an exploratory session becomes a repeatable check.

run_flow · cheat_sheet

Setup. Needs maestro on PATH — the same engine used for the warm session — which runs the flow against your local emulator/simulator.

📱 Fast native iOS control (idb)

iOS UI interaction works out of the box via the Maestro fallback (call launch_app first), but idb makes tap/type/inspect noticeably faster and unlocks the full key set.

Setup. brew install idb-companion && pipx install fb-idb. Manos auto-detects it and reports the active backend per device.

Tool reference

Tool | Description
list_devices | List local Android & iOS devices with id, platform, OS, state.
device_capabilities | Report which actions a device supports (full/partial/unavailable) + backend.
inspect_screen | Compact JSON hierarchy with stable element ids; target by id/text/resource-id.
take_screenshot | PNG of the current screen.
tap / long_press | Tap by element selector or coordinates; returns the resulting screen (act+observe).
input_text | Type into the focused field; optionally focus a field first.
press_key | Press a hardware/system key (back, enter, home, …).
swipe | Swipe by direction or between two points.
assert | Assert an element is visible / not visible (no polling).
wait_for | Poll until an element is visible / not visible, or timeout; replaces fixed sleeps.
find_elements | Search the current screen for elements matching a query.
find_text | OCR the screenshot to locate on-screen text (pixel boxes) the a11y tree misses: styled buttons, canvas/Flutter/games, WebViews. Tap via tap{text, ocr:true}.
launch_app / stop_app | Launch (optionally clearing state) or terminate an app.
clear_app_state | Reset app state.
open_deeplink | Open a deep / universal link URL.
set_permission | Grant or revoke a runtime permission.
set_appearance | Light / dark mode.
set_orientation | Portrait / landscape / upside-down.
set_locale | App locale (BCP-47).
set_network | Toggle wifi / cellular / airplane mode.
set_location | Simulated GPS coordinates.
set_font_scale | Text size for accessibility testing.
set_status_bar | Override status bar (time/battery/signal) for clean screenshots.
set_conditions | Apply many device conditions at once / a named preset (offline, accessibility, screenshot, dark, international, reset).
push_notification | Deliver a push notification (iOS APNs payload).
get_logs | Recent device logs with automatic crash & ANR detection.
a11y_audit | Accessibility audit of the current screen.
network_start / network_requests / network_clear / network_stop | Capture decrypted HTTP from a debug app (Android: Frida + OkHttp hook; iOS: mitmproxy + trusted CA), filtered to specific endpoints. See NETWORK.md.
run_flow | Run an authored flow locally (inline YAML, files, or directory).
cheat_sheet | Flow-syntax guidance for writing / exporting flows.
start_recording / stop_recording | Record an ad-hoc session (set report:true for per-step screenshots).
export_flow | Export the recorded session as a replayable Maestro flow + report.
export_report | Self-contained HTML report: screenshot timeline + flow + logs + captured network.

Upcoming features

The roadmap, drawn from IMPROVEMENTS.md. Status reflects priority, not a commitment date.

Legend: NEXT = highest priority; PLANNED = on the roadmap; EXPLORING = ideas / scale.

Feature | Status | What it adds

Loop & inspection
Idle / animation settling | PLANNED | Wait for the UI to stop animating before returning state, so you never act on a mid-transition screen.
Token-budgeted inspect | EXPLORING | max_depth / interactive-only filter for very dense screens.
Semantic targeting | EXPLORING | Rank elements by an embedding/LLM match to a natural-language description ("the checkout button").

Assertions & state
Richer assertions + network-idle waits | PLANNED | text-equals, element-count, enabled/checked/selected, "wait for no spinners".
Process-death / restoration testing | PLANNED | Background, kill the process, relaunch, assert state restored; a classic Android bug class.
Time / clipboard / data seeding | PLANNED | Clock control for time-dependent UI; clipboard get/set; prefill defaults / content providers.

Diagnostics & quality
Color-contrast & text-size a11y | NEXT | Sample screenshot pixels at element bounds to compute WCAG contrast; this is the one a11y check that needs pixels.
Visual regression | NEXT | Capture/approve baseline screenshots and perceptual-diff against them (status-bar overrides already help stabilize).
Performance signals | PLANNED | Cold/warm start time, frame jank (gfxinfo), memory (meminfo) captured alongside a session.
Video capture | PLANNED | Record the session (screenrecord / simctl recordVideo) and attach to the report for bug tickets.

Network & coverage
Response stubbing | PLANNED | Rewrite/stub responses in flight to drive error and edge-case states.
iOS NSURLSession hook + Cronet | PLANNED | Frida hook for pinned iOS apps and Android Cronet stacks, beyond OkHttp/mitmproxy.
Icon / template matching | PLANNED | Locate non-text controls (icons) by template, extending the OCR fallback.
Multi-device / matrix runs | EXPLORING | Drive the same session across OS versions / form factors in parallel and diff outcomes.
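For reference, the WCAG contrast computation that the planned color-contrast check would rest on is short. The Python sketch below implements the published WCAG 2.x relative-luminance and contrast-ratio formulas for 8-bit sRGB colors; it is not Manos code, and how pixels would be sampled from element bounds is left out.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple, bg: tuple) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)
```

WCAG level AA asks for at least 4.5:1 for normal-size text, so an audit could flag any text element whose sampled foreground and background fall below that ratio.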