watchr — capabilities & industries

Drive real apps, everywhere

One tool surface, four backends — watchr auto-detects what's connected and routes the command.

iOS Simulator

Launch, tap, type, swipe and observe simulators via simctl + idb.

Physical iPhone (USB)

Drive a real iPhone over USB through the on-device WatchrRunner.

Android emulators & devices

adb-backed control; target a specific device by serial.

Web via Playwright

A real headed Chromium — navigate, click, type, assert.

One unified surface

Same commands on every backend; device="" auto-detects.
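
How auto-detection might route a command can be sketched in a few lines. This is an illustrative sketch only, not watchr's actual implementation: the backend names, the `connected` mapping and the rule for ambiguous cases are all assumptions made for the example.

```python
# Hypothetical routing sketch; watchr's real backend names and rules may differ.

def resolve_backend(device: str, connected: dict[str, str]) -> str:
    """Pick a backend for a command.

    `connected` maps a device identifier (e.g. an adb serial) to a backend name.
    An empty `device` means "auto-detect"; here that only succeeds when
    exactly one kind of backend is connected (an assumption of this sketch).
    """
    if device:
        if device not in connected:
            raise LookupError(f"device {device!r} is not connected")
        return connected[device]

    backends = set(connected.values())
    if not backends:
        raise LookupError("no backend connected")
    if len(backends) > 1:
        raise LookupError("several backends connected; pass an explicit device")
    return backends.pop()


# One Android emulator attached: device="" resolves to it.
print(resolve_backend("", {"emulator-5554": "android"}))
```

With more than one backend attached, the sketch asks for an explicit device instead of guessing; the real tool may resolve that case differently.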

Device management

Boot, list, set, verify and set up simulators and devices.

Hardware buttons

Home, back, app-switch and more via press_button.

Driven by your agent, in plain English

watchr is an MCP server — your coding agent does the tapping, waiting and observing.

Natural-language control

"Test the checkout on Android." The agent drives; you describe.

70+ agent-native tools

perform, observe, run_steps, tap, type, swipe, wait — a full surface.

Accessibility-aware taps

Taps by element label, not brittle x/y coordinates.

Batch & combo tools

A whole flow in one round trip; observe = screenshot + UI tree.
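
To make "one round trip" concrete, here is roughly what a batched call could look like on the wire. The outer envelope is the standard MCP `tools/call` JSON-RPC request; the argument layout inside it (`steps`, `tool`, `label`, `text`, `seconds`) is a guess for illustration and not watchr's documented schema.

```python
import json

def build_run_steps_request(request_id: int, steps: list[dict]) -> dict:
    """Wrap a list of steps in a single MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "run_steps",
            # Argument names below are hypothetical.
            "arguments": {"device": "", "steps": steps},
        },
    }

steps = [
    {"tool": "tap", "label": "Sign in"},
    {"tool": "type", "text": "jane@example.com"},
    {"tool": "wait", "seconds": 1},
    {"tool": "observe"},  # screenshot + UI tree in one result
]

request = build_run_steps_request(1, steps)
print(json.dumps(request, indent=2))
```

In practice the agent's MCP client builds this request; the point is that four actions travel as one message rather than four.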

Deep inspection

Read the UI tree, page HTML, or run arbitrary JS in the page.

Any MCP client

Claude Code, Codex, Cursor, Pi, OpenCode, Cline — anything MCP.

A full QA team, built in

From a one-line prompt to a structured, repeatable run — with audits on every screen.

AI test suites

Describe a flow → watchr builds a suite, runs every case, reports.

Visual regression

Save baselines, compare screens, catch UI drift.
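
The idea behind a baseline comparison can be shown with a minimal pixel diff. This is a generic technique sketch, not watchr's comparison algorithm; the `tolerance` value and the nested-list image representation are assumptions chosen to keep the example dependency-free.

```python
def diff_ratio(baseline, current, tolerance=8):
    """Fraction of pixels whose largest channel difference exceeds `tolerance`.

    Both screens are lists of rows; each pixel is an (r, g, b) tuple.
    """
    same_shape = len(baseline) == len(current) and all(
        len(a) == len(b) for a, b in zip(baseline, current)
    )
    if not same_shape:
        raise ValueError("screens must have the same dimensions")

    total = changed = 0
    for row_a, row_b in zip(baseline, current):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if max(abs(ca - cb) for ca, cb in zip(pixel_a, pixel_b)) > tolerance:
                changed += 1
    return changed / total if total else 0.0


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
baseline = [[WHITE, WHITE], [WHITE, WHITE]]
current = [[WHITE, BLACK], [WHITE, WHITE]]
print(diff_ratio(baseline, current))
```

A run would flag drift when the ratio crosses a chosen threshold; the small tolerance keeps anti-aliasing noise from counting as a change.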

Accessibility audit

Missing labels, small touch targets, contrast issues.
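
The three checks named here are simple to express. The sketch below uses the WCAG 2.x contrast-ratio formula, which is standard; the thresholds (4.5:1 for normal text under WCAG AA, a 44-point minimum target per Apple's guidelines) and the element dictionary shape are assumptions, since the page does not state which limits watchr applies.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg) -> float:
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

MIN_CONTRAST = 4.5  # WCAG AA, normal-size text (assumed threshold)
MIN_TARGET = 44     # points, Apple HIG minimum (assumed threshold)

def audit_element(el: dict) -> list[str]:
    """Return the issues found on one UI element (hypothetical element shape)."""
    issues = []
    if not el.get("label", "").strip():
        issues.append("missing-label")
    if min(el["width"], el["height"]) < MIN_TARGET:
        issues.append("small-touch-target")
    if contrast_ratio(el["fg"], el["bg"]) < MIN_CONTRAST:
        issues.append("low-contrast")
    return issues


bad = {"label": "", "width": 30, "height": 30,
       "fg": (170, 170, 170), "bg": (255, 255, 255)}
print(audit_element(bad))
```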

Performance audit

Slow screens flagged; timing on every action.

Security scan

Surface common web security issues on each screen.

SEO audit

On-page SEO checks for web flows, built in.

Usability audit

Frustration tracking — rage taps, dead taps, dead ends.
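
A rage tap is usually defined as several taps on roughly the same spot in quick succession. The sketch below shows one way to detect that; the thresholds (three taps, one second, 30 pixels) and the tap tuple format are assumptions for illustration, not watchr's actual rules.

```python
from math import hypot

def find_rage_taps(taps, min_taps=3, window_s=1.0, radius_px=30.0):
    """Group taps into bursts that look like rage tapping.

    `taps` is a time-sorted list of (t_seconds, x, y). A burst is a run of at
    least `min_taps` taps within `window_s` seconds and `radius_px` pixels of
    the first tap in the run.
    """
    bursts = []
    i = 0
    while i < len(taps):
        t0, x0, y0 = taps[i]
        j = i + 1
        while (
            j < len(taps)
            and taps[j][0] - t0 <= window_s
            and hypot(taps[j][1] - x0, taps[j][2] - y0) <= radius_px
        ):
            j += 1
        if j - i >= min_taps:
            bursts.append(taps[i:j])
            i = j  # skip past the burst
        else:
            i += 1
    return bursts


taps = [(0.0, 100, 100), (0.2, 102, 101), (0.4, 99, 100), (5.0, 300, 300)]
print(len(find_rage_taps(taps)))
```

A dead tap (a tap that changes nothing on screen) would need the before and after UI state as well, so it is left out of this sketch.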

Crash & error detection

App state, crash logs, console errors, ANRs after any action.

Chaos / monkey testing

Random taps & swipes to surface crashes under stress.

Evidence on every run

Proof you can share — auto-captured, no setup.

Screen recording

Per-run .mp4 / .webm video, including whole run_steps runs.

Screenshots to disk

Full-resolution captures saved for bug reports and diffs.

Console & network logs

Captured per run and queryable, with response bodies.

Structured reports

Markdown + JSON with pass/fail, screenshots and evidence.

Traces & HAR

Playwright trace zips and HAR captures for deep web debugging.

File issues

report_issue writes a structured bug report from a run.

Personas, scale & parallel

Many users, many devices, at once — locally.

Personas

Define users with their own timeouts and behavior; switch with use_persona.

Locales & conditions

Run the same app as different users, regions and slow-network profiles.

Parallel runs

N isolated web sessions × N flows in a single round trip (parallel_run).
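
The fan-out pattern behind a parallel run can be sketched with `asyncio`. Here `run_flow` is a stand-in stub, not a watchr call, and giving each flow its own session is one reading of "N sessions × N flows"; the real `parallel_run` arguments are not shown on this page.

```python
import asyncio

async def run_flow(session_id: int, flow: str) -> dict:
    """Stub for driving one flow in its own isolated session."""
    await asyncio.sleep(0)  # a real run would drive a browser here
    return {"session": session_id, "flow": flow, "passed": True}

async def fan_out(flows: list[str]) -> list[dict]:
    """Start every flow concurrently and collect results in input order."""
    tasks = [run_flow(i, flow) for i, flow in enumerate(flows)]
    return await asyncio.gather(*tasks)


results = asyncio.run(fan_out(["login", "search", "checkout"]))
for result in results:
    print(result)
```

Because `asyncio.gather` returns results in the order the tasks were passed, the report lines up with the flow list even when flows finish out of order.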

Multi-device

Target devices by serial; run across several phones at once.

Persistent profiles

Headed-Chromium profile so web logins survive across runs.

Sessions & actions

Track named actions and sessions across a run.

Autonomous & safe by design

It explores on its own, within the guardrails an enterprise review asks for.

Autonomous explore

Turn it loose on a web app and let it map and probe flows.

Human-in-the-loop

You review actions; nothing irreversible runs blind.

Never touches production

Points at staging/QA; not a path to mutate production data.

Resilient

Recovers from modals, session timeouts and unexpected states.

Evidence, then verdict

The suite engine collects evidence; you judge pass/fail.

First-run doctor

Checks readiness across web/iOS/Android and prints exact fixes.

At scale

Built to do the work, not just assist.

One agent does what a manual QA pod does — across every platform, on every release. That's the lever: automate the repetitive test cycle and a leaner team ships more.

Parallel by default

N isolated sessions × N flows in one call — one agent fans out across platforms at once, instead of one tester at a time.

Hundreds of cases from a prompt

Generate, run and report a whole suite without writing or maintaining test code.

Every PR, every release

Wire it into CI so the full regression pass runs itself — overnight, on demand, not on your team's calendar.

The whole device matrix

iOS sim + physical iPhone + Android + web in one run — the coverage you'd otherwise staff a pod for.

By industry

Where watchr earns its place.

Local-first, multi-platform QA maps directly onto regulated, high-stakes teams — and the manual headcount they'd otherwise need.

Financial services & fintech

Customer financial data never leaves your machines. Drive account opening, login, 2FA, payments, transfers and statements on real iOS, Android and web — then mock declined cards, insufficient funds, rate-limit and outage responses to prove the unhappy paths without touching production. Every run leaves a timestamped audit trail of screenshots, network logs and a pass/fail report.

At scale: Runs the full cross-platform regression pass on every build (the manual sweep that used to tie up a QA pod for days) in parallel, with the evidence auditors ask for already attached.

iGaming & gambling

Verify exactly what the regulator checks: KYC and age-gates, geo/IP restrictions, responsible-gambling banners, deposit and loss limits, cool-off and self-exclusion — across iOS, Android and web. Seed accounts and mock provider responses so you can test limit breaches and exclusion states deterministically. Player data stays in your boundary.

At scale: Re-certifies every regulated flow on each build across all three platforms. With manual testers that work grows headcount linearly; here one agent runs it all in parallel.

Healthcare & digital health

PHI stays on-device — nothing uploads to a cloud. Test patient onboarding, appointment booking, secure messaging, prescriptions and clinician dashboards on real devices, and run built-in accessibility audits to meet the access standards healthcare apps are held to. Mock EHR/FHIR endpoints to test error and consent states safely.

At scale: The accessibility and journey sweeps a specialist bills weeks for, run automatically on every screen, every release.

Government & public sector

Accessibility isn't optional — the a11y audit catches missing labels, small touch targets, low contrast and keyboard traps on every screen, with evidence for your conformance report. Self-hosted and local by default; add security and SEO scans across the whole service. Test forms, eligibility flows and document uploads end-to-end.

At scale: The full audit cycle a dedicated compliance team runs by hand, run continuously across the service for a fraction of the effort.

E-commerce & retail

Catch a broken checkout before your customers do — search, product page, cart, promo codes, guest and account checkout, and payment across iOS, Android and web. Visual regression guards product and cart pages against silent breakage, performance audits protect conversion, and route mocking simulates out-of-stock, price changes and payment failures.

At scale: Peak-season QA coverage without hiring a peak-season QA team: full regression on every deploy, all platforms, in parallel.

Mobile-first startups & SaaS

Your agent ships features faster than QA can keep up. watchr gives that agent hands — generate suites from a sentence, run them across simulators, devices and browsers, and get video plus reports back — right inside Claude Code, Cursor or any MCP client. No test code to write or maintain.

At scale: Your existing engineers get a QA team's output, letting you defer (or skip) dedicated QA hires as you scale.