diff --git a/.skillsrc b/.skillsrc
index 3e16873..4e26c3d 100644
--- a/.skillsrc
+++ b/.skillsrc
@@ -4,6 +4,7 @@
 droid-control/skills/agent-browser
 droid-control/skills/capture
 droid-control/skills/compose
+droid-control/skills/desktop-control
 droid-control/skills/droid-cli
 droid-control/skills/droid-control
 droid-control/skills/pty-capture
diff --git a/plugins/droid-control/.factory-plugin/plugin.json b/plugins/droid-control/.factory-plugin/plugin.json
index 16e1250..8f6c249 100644
--- a/plugins/droid-control/.factory-plugin/plugin.json
+++ b/plugins/droid-control/.factory-plugin/plugin.json
@@ -1,5 +1,5 @@
 {
   "name": "droid-control",
-  "description": "Terminal and browser automation for testing, demos, QA, and computer-use tasks",
+  "description": "Terminal, browser, and native desktop automation for testing, demos, QA, and computer-use tasks",
   "version": "1.0.0"
 }
diff --git a/plugins/droid-control/ARCHITECTURE.md b/plugins/droid-control/ARCHITECTURE.md
index ba9409d..6dc160b 100644
--- a/plugins/droid-control/ARCHITECTURE.md
+++ b/plugins/droid-control/ARCHITECTURE.md
@@ -36,7 +36,7 @@ This is the first guardrail against agent drift. The droid does not start with "
 
 | Route | Question | Examples |
 |---|---|---|
-| **Target** | What are we driving? | Droid CLI, other terminal TUI, web/Electron app, raw PTY bytes |
+| **Target** | What are we driving? | Droid CLI, other terminal TUI, web/Electron app, native desktop app, raw PTY bytes |
 | **Stage** | What does the workflow need? | capture, compose, verify |
 | **Artifact** | Does compose need polish tools? | showcase presets, effects, keystroke overlays |
 
@@ -48,7 +48,7 @@ Each atom skill is a self-contained surface the droid reads at a specific point
 
 | Atom type | Skills | Responsibility |
 |---|---|---|
-| Driver atoms | `tuistory`, `true-input`, `agent-browser` | How to drive a class of environment. |
+| Driver atoms | `tuistory`, `true-input`, `agent-browser`, `desktop-control` | How to drive a class of environment. |
 | Target atoms | `droid-cli`, `pty-capture` | Target-specific shortcuts, launch rules, and byte-capture patterns. |
 | Stage atoms | `capture`, `compose`, `verify` | Lifecycle phases with explicit inputs and outputs. |
 | Polish atom | `showcase` | Visual presets and cinematic layer guidance. |
@@ -119,7 +119,7 @@ Terminal workflows use `bin/tctl` as the only launch/control boundary. It hides
 
 `tctl` also enforces Droid CLI launch invariants. `droid-dev` sessions must provide `--repo-root`, which lets `tctl` set `DROID_DEV_REPO_ROOT` and record provenance for the captured branch and commit.
 
-Browser and Electron workflows intentionally do **not** go through `tctl`; they use `agent-browser`, whose persistent Playwright-backed daemon is the right control boundary for DOM snapshots, screenshots, and CDP-connected apps.
+Browser/Electron and native-desktop workflows intentionally do **not** go through `tctl`. They have their own control boundaries: `agent-browser`'s persistent Playwright daemon for DOM snapshots, screenshots, and CDP-connected apps; `cua-driver`'s daemon for accessibility trees and per-`(pid, window_id)` element caches on desktop GUIs.
 
 ## Video composition
 
@@ -161,6 +161,9 @@ skills/true-input/platforms/macos.md
 skills/pty-capture/platforms/linux.md
 skills/pty-capture/platforms/windows.md
 skills/pty-capture/platforms/macos.md
+skills/desktop-control/platforms/linux.md
+skills/desktop-control/platforms/windows.md
+skills/desktop-control/platforms/macos.md
 ```
 
 A Linux droid reads Linux Wayland instructions. A Windows VM byte-capture task reads Windows KVM instructions. The system does not rely on the droid to skim irrelevant sections correctly.
diff --git a/plugins/droid-control/README.md b/plugins/droid-control/README.md
index c33cbb9..a2d97fa 100644
--- a/plugins/droid-control/README.md
+++ b/plugins/droid-control/README.md
@@ -89,6 +89,7 @@ The `render-showcase.sh` helper owns the full pipeline: `.cast` conversion via `
 | true-input | Windows (KVM) | `libvirt`, `qemu`, KVM VM with SSH |
 | true-input | macOS (QEMU) | `qemu`, `socat`, macOS VM with SSH |
 | agent-browser | All | `agent-browser` |
+| desktop-control | All | `cua-driver` |
 | compose | All | `ffmpeg`, `ffprobe`, `agg` |
 | showcase | All | Node.js (>= 18), Chrome/Chromium |
 
@@ -98,7 +99,8 @@ pip install asciinema                                 # terminal recording
 cargo install --git https://github.com/asciinema/agg  # .cast -> .gif converter
 sudo apt-get install -y ffmpeg                        # video processing
 agent-browser install                                 # browser automation (downloads Chromium)
+curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh | bash  # native desktop GUI automation
 cd plugins/droid-control/remotion && npm install      # Remotion video rendering
 ```
 
-Only install what you need for your use case. Terminal demos need tuistory, asciinema, agg, and ffmpeg. Web/Electron automation just needs agent-browser.
+Only install what you need for your use case. Terminal demos need tuistory, asciinema, agg, and ffmpeg. Web/Electron automation just needs agent-browser. Native desktop GUI automation just needs cua-driver.
diff --git a/plugins/droid-control/skills/capture/SKILL.md b/plugins/droid-control/skills/capture/SKILL.md
index 037e099..fd1e523 100644
--- a/plugins/droid-control/skills/capture/SKILL.md
+++ b/plugins/droid-control/skills/capture/SKILL.md
@@ -8,7 +8,7 @@ user-invocable: false
 
 The orchestrator routed you here. This atom owns the full recording lifecycle: launch a target, execute an interaction script, collect raw outputs.
 
-You should already have a **driver atom** loaded (tuistory, true-input, or agent-browser) and optionally a **target atom** (droid-cli). This atom layers the recording discipline on top.
+You should already have a **driver atom** loaded (tuistory, true-input, agent-browser, or desktop-control) and optionally a **target atom** (droid-cli). This atom layers the recording discipline on top.
 
 ## Inputs
 
@@ -28,7 +28,7 @@ Before recording anything:
 - Terminal size is consistent across all sessions (`--cols 120 --rows 36`)
 - **Browser viewport size matches the composition layout** (see "Browser viewport sizing" below) — mismatched aspects letterbox in the final video
 - Branch/worktree paths and env vars are correct
-- Recording format matches the driver: `.cast` for tuistory, `.mp4` for true-input, screenshots for agent-browser
+- Recording format matches the driver: `.cast` for tuistory, `.mp4` for true-input, screenshots for agent-browser, window PNGs / `recording.mp4` for desktop-control
 - If comparing branches, both sessions use identical terminal / viewport dimensions and launch parameters
 - For `droid-dev` captures, `--repo-root` is **mandatory** — `tctl` will refuse to launch without it
 - **Color env vars are set** (see below)
@@ -137,6 +137,7 @@ Before handing off, confirm every expected output file exists and is non-empty:
 | Visual rendering | Screenshots: `$TCTL -s <name> screenshot -o /tmp/proof-N.png` |
 | Keyboard encoding | PTY bytes: `${DROID_PLUGIN_ROOT}/scripts/capture-terminal-bytes.py --backend <terminal> --combo <keys>` |
 | Web/Electron | Screenshots: `agent-browser screenshot --annotate /tmp/proof-N.png` |
+| Native desktop GUI | Window screenshots + AX trees: `cua-driver get_window_state '{...}' --screenshot-out-file ${RUN_DIR}/proof-N.png`; video via `cua-driver recording start/stop` |
 | Before/after | Run the same sequence on both branches at the same capture points |
 
 ## Outputs
@@ -148,7 +149,7 @@ Hand these to the **compose** stage:
 - clips: [/tmp/before.cast, /tmp/after.cast]
 - screenshots: [/tmp/proof-1.png, /tmp/proof-2.png]
 - keys: /tmp/keys.tsv (if keystroke logging was requested)
-- driver: tuistory | true-input | agent-browser
+- driver: tuistory | true-input | agent-browser | desktop-control
 - terminal_size: 120x36          # for tuistory / true-input
 - viewport: 960x1000             # for agent-browser; report so compose knows the clip aspect
 ```
diff --git a/plugins/droid-control/skills/desktop-control/SKILL.md b/plugins/droid-control/skills/desktop-control/SKILL.md
new file mode 100644
index 0000000..ce971df
--- /dev/null
+++ b/plugins/droid-control/skills/desktop-control/SKILL.md
@@ -0,0 +1,127 @@
+---
+name: desktop-control
+description: Background knowledge for droid-control workflows -- not invoked directly. Desktop-control driver mechanics for native GUI app automation via trycua cua-driver.
+user-invocable: false
+---
+
+# Desktop-Control Driver
+
+The orchestrator routed you here. Use these mechanics to execute your plan.
+
+Drive native desktop GUI apps through upstream [trycua/cua](https://github.com/trycua/cua) `cua-driver`: enumerate apps and windows, snapshot accessibility trees, click/type/scroll by `element_index` or pixel coordinates, and verify by re-snapshot -- all without bringing the target to the foreground.
+
+## When to use
+
+- Automating a native desktop app (Finder, Notepad, System Settings, native editors)
+- Driving native dialogs and security/permission sheets that no DOM or PTY can reach
+- Visual QA of native UI: per-window screenshots, accessibility-tree assertions
+
+If the target is a terminal TUI, use **tuistory** or **true-input**. If it is a web page or an Electron app, use **agent-browser** -- CDP beats accessibility trees for anything Chromium-based.
+
+## Platform support
+
+| Platform | Upstream tier | Read |
+|---|---|---|
+| macOS | Production | [platforms/macos.md](platforms/macos.md) |
+| Windows | Production | [platforms/windows.md](platforms/windows.md) |
+| Linux | Pre-release (real caveats) | [platforms/linux.md](platforms/linux.md) |
+
+**Read the platform file for your target OS.** Each contains permissions, daemon launch, and platform-specific patterns and failure modes.
+
+## Prerequisites
+
+```bash
+# one-time install: per-user, no sudo/admin
+curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh | bash
+# Windows (PowerShell):
+#   irm https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.ps1 | iex
+
+cua-driver doctor           # platform probes: permissions, daemon, accessibility plumbing
+cua-driver skills install   # fetch the upstream skill pack to ~/.cua-driver/skills/cua-driver
+```
+
+The upstream pack (`~/.cua-driver/skills/cua-driver/SKILL.md` + your platform's doc) is the deep reference -- full tool surface, window-state behavior matrix, forbidden-command lists -- and it updates with the binary. **Read it before any nontrivial workflow.** This atom owns the droid-control integration: routing, run isolation, delegation, evidence handoff.
+
+## Daemon lifecycle
+
+`element_index` workflows **require the daemon**. Without it, each CLI invocation is a fresh process and the per-`(pid, window_id)` element cache dies between calls.
+
+```bash
+cua-driver serve            # start the daemon (macOS needs the LaunchServices form -- see platforms/macos.md)
+cua-driver status           # daemon + socket health
+cua-driver stop
+```
+
+Permissions are checked and granted through the driver, not by hand-editing system settings (macOS-only gate; a no-op surface on Windows/Linux):
+
+```bash
+cua-driver permissions status   # read-only; answers via the running daemon
+cua-driver permissions grant    # attributed prompt flow -- the correct way to grant
+```
+
+## Core loop
+
+Tool names are `snake_case` and invoked directly: `cua-driver <tool> '<json>'`. (`cua-driver call <tool>` is legacy; do not use it.) Run `cua-driver list-tools` for the inventory and `cua-driver describe <tool>` for any tool's schema.
+
+Every workflow is Discover -> Observe -> Act -> Verify against an explicit `(pid, window_id)`:
+
+```bash
+cua-driver launch_app '{"name":"TextEdit"}'
+#  -> {pid: 844, windows: [{window_id: 10725, ...}]}   # list_windows only needed for long-lived pids
+cua-driver get_window_state '{"pid":844,"window_id":10725}' --screenshot-out-file "${RUN_DIR}/before.png"
+cua-driver click '{"pid":844,"window_id":10725,"element_index":14,"session":"'"${RUN_ID}"'-desktop"}'
+cua-driver get_window_state '{"pid":844,"window_id":10725}' --screenshot-out-file "${RUN_DIR}/after.png"
+```
+
+**Snapshot before AND after every action.** The pre-action `get_window_state` resolves the `element_index` you are about to use -- indices are per-snapshot, per `(pid, window_id)`, and stale ones fail with `No cached AX state`. The post-action snapshot is the evidence the action landed; without it a silent no-op looks like success.
+
+Addressing-mode preference:
+
+1. **`element_index`** (default) -- semantic, works on hidden and backgrounded windows, no foreground change.
+2. **Pixel** `click '{"pid":N,"window_id":W,"x":X,"y":Y}'` -- for surfaces the tree does not reach (canvases, custom-drawn controls). Coordinates are window-local screenshot pixels, top-left origin.
+3. **Keyboard** (`press_key`, `hotkey`) and platform fallbacks -- last resort; see the platform files.
+
+## Run isolation (ground rule 5 -> cua sessions)
+
+cua sessions are the desktop equivalent of `tctl` session prefixes: a session owns its agent cursor, config overrides, and recording scope. Declare one per run, derived from the workflow's `RUN_ID`, and pass it on every action:
+
+```bash
+cua-driver start_session '{"session":"'"${RUN_ID}"'-desktop"}'
+# ... every action carries "session":"${RUN_ID}-desktop" ...
+cua-driver end_session '{"session":"'"${RUN_ID}"'-desktop"}'
+```
+
+Parallel workers each declare their **own session** and pass `creates_new_application_instance: true` to `launch_app` so each gets its own window. The element cache is keyed on `(pid, window_id)` and the cursor on `session`, so isolated workers cannot collide.
+
+## Delegation
+
+`cua-driver` is on PATH -- workers need no `${DROID_PLUGIN_ROOT}` resolution. As with the other drivers, give capture workers **exact commands** with the parent's run scope baked in:
+
+```
+Task prompt for a desktop capture worker:
+  "Run these commands in order. Report screenshot paths and any errors.
+   1. cua-driver start_session '{"session":"1712345678-42-notepad"}'
+   2. cua-driver launch_app '{"name":"Notepad","creates_new_application_instance":true}'
+      -> note the returned pid and window_id
+   3. cua-driver get_window_state '{"pid":<pid>,"window_id":<wid>}' --screenshot-out-file /tmp/droid-run-1712345678-42-xxxx/before.png
+   4. cua-driver type_text '{"pid":<pid>,"window_id":<wid>,"element_index":<text-area>,"text":"hello","session":"1712345678-42-notepad"}'
+   5. cua-driver get_window_state '{"pid":<pid>,"window_id":<wid>}' --screenshot-out-file /tmp/droid-run-1712345678-42-xxxx/after.png
+   6. cua-driver end_session '{"session":"1712345678-42-notepad"}'"
+```
+
+## Evidence handoff
+
+| Proof type | How to capture |
+|---|---|
+| Window state | `get_window_state ... --screenshot-out-file ${RUN_DIR}/proof-N.png` (also keeps the PNG out of the tool response) |
+| Full display | `cua-driver screenshot '{"out_file":"'"${RUN_DIR}"'/screen.png"}'` |
+| Semantic assertions | `tree_markdown` from `get_window_state` (filter with `"query":"..."`) |
+| Video | `cua-driver recording start` / `recording stop` -> session-scoped `recording.mp4` |
+
+Hand PNG/mp4 paths to **compose** / **verify** like any other driver output. Keep raw tool output alongside screenshots whenever GUI behavior is the thing under test.
+
+## Critical rules
+
+1. **Never change the user's frontmost app.** If a command says activate, foreground, raise, or make key -- stop; the per-pid event paths exist precisely so you do not need it. Platform forbidden-lists live in the upstream pack.
+2. **Re-snapshot after every action and report what you observed**, not what you intended. An unchanged tree after an action is a finding, not a formality.
+3. **Destructive actions need explicit user intent.** Do not delete files, send messages, or submit forms unless the workflow asked for exactly that.
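
The Core loop and Run isolation sections of `desktop-control/SKILL.md` above build their JSON payloads with nested shell quoting (`'{"session":"'"${RUN_ID}"'-desktop"}'`), which is easy to get wrong when an orchestrator bakes exact commands into a worker prompt. A minimal sketch of generating those command lines instead, assuming only the `cua-driver <tool> '<json>'` form and the `--screenshot-out-file` flag shown in that file. `build_call` and `session_name` are hypothetical helpers, not part of cua-driver, and the pid, window id, and run directory are the illustrative values from the skill's own examples.

```python
import json
import shlex


def session_name(run_id, suffix="desktop"):
    """Derive the per-run session name, e.g. '<RUN_ID>-desktop'."""
    return f"{run_id}-{suffix}"


def build_call(tool, payload, session=None, screenshot_out=None):
    """Return the argv list for `cua-driver <tool> '<json>'`.

    Hypothetical helper: it only assembles arguments and never runs anything.
    """
    fields = dict(payload)
    if session is not None:
        # Actions carry the run-scoped session name.
        fields["session"] = session
    argv = ["cua-driver", tool, json.dumps(fields, separators=(",", ":"))]
    if screenshot_out is not None:
        argv += ["--screenshot-out-file", screenshot_out]
    return argv


if __name__ == "__main__":
    run_id = "1712345678-42"                    # illustrative RUN_ID
    run_dir = f"/tmp/droid-run-{run_id}-xxxx"   # illustrative RUN_DIR
    target = {"pid": 844, "window_id": 10725}   # values returned by launch_app
    session = session_name(run_id)

    # Observe -> Act -> Verify: snapshot, act, snapshot again.
    plan = [
        build_call("get_window_state", target, screenshot_out=f"{run_dir}/before.png"),
        build_call("click", {**target, "element_index": 14}, session=session),
        build_call("get_window_state", target, screenshot_out=f"{run_dir}/after.png"),
    ]
    for argv in plan:
        # shlex.join quotes each argument for a POSIX shell.
        print(shlex.join(argv))
```

Because the payload goes through `json.dumps` and the quoting through `shlex.join`, the printed lines can be pasted into a worker prompt without hand-assembling quote boundaries. On Windows shells the quoting rules differ, so this sketch covers the POSIX case only.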
diff --git a/plugins/droid-control/skills/desktop-control/platforms/linux.md b/plugins/droid-control/skills/desktop-control/platforms/linux.md
new file mode 100644
index 0000000..13ac37d
--- /dev/null
+++ b/plugins/droid-control/skills/desktop-control/platforms/linux.md
@@ -0,0 +1,57 @@
+# Desktop-Control: Linux
+
+cua-driver on Linux enumerates windows via **X11**, walks semantic trees via **AT-SPI**, and injects input via **XSendEvent** (synthetic events targeted at a window XID -- no focus change, nothing leaks to the user's focused app). Upstream calls this tier pre-release, and it shows: the lifecycle (install, daemon, doctor, sessions, one-shot CLI), window discovery, and per-window screenshots are solid; Wayland-native enumeration, AT-SPI tree quality, and input delivery are not. Plan workflows around the reliable half.
+
+## Install and daemon
+
+Same installer and lifecycle as everywhere else (no sudo, `~/.cua-driver`):
+
+```bash
+cua-driver doctor    # trustworthy probes: catches missing DISPLAY, verifies X11 + AT-SPI before you waste a run
+cua-driver serve     # required for element_index workflows
+cua-driver status
+```
+
+`cua-driver permissions` is a no-op surface on Linux.
+
+## The Wayland boundary
+
+Window enumeration is **X11-only**. On a modern Plasma/GNOME Wayland desktop, most windows are native Wayland and are therefore invisible to `list_windows`.
+
+- Targets running under **Xwayland** (or a plain X11 session) enumerate and screenshot fine.
+- To drive an app that defaults to native Wayland, force its X11 backend at launch where the toolkit allows it: `QT_QPA_PLATFORM=xcb` (Qt), `GDK_BACKEND=x11` (GTK), `--ozone-platform=x11` (Chromium/Electron).
+- If the target cannot be put on X11, desktop-control cannot see it -- fall back to **agent-browser** (web/Electron) or **true-input** (terminal emulators).
+
+## Semantic layer (AT-SPI) reliability
+
+AT-SPI trees can collapse: the registry's `GetChildren` may time out, and Qt apps can render as a single root node even with `QT_LINUX_ACCESSIBILITY_ALWAYS_ON=1`. When `get_window_state` returns a near-empty tree:
+
+```bash
+cua-driver config set capture_mode vision   # screenshot-only snapshots
+```
+
+and work the pixel path (`click '{"pid":N,"window_id":W,"x":X,"y":Y}'`) against the returned PNG. Don't burn turns re-snapshotting hoping the tree fills in -- on this tier, pixel-first is a legitimate default.
+
+## The toolkit boundary: synthetic input is silently dropped by Qt and GTK4
+
+XSendEvent marks events with the `send_event` flag, and major toolkits **ignore flagged input entirely**. Verified on v0.5.1: Qt apps (kcalc) and GTK4 apps (zenity) no-op on *every* action -- pixel clicks, `press_key`, `type_text` -- while the driver reports success. There is no error to catch; only the post-action snapshot reveals it.
+
+Practical consequence: the Act stage only works against apps that honor synthetic events (verified: winit-based apps like alacritty; generally simpler/older X11 toolkits). **Probe before committing to a workflow**: send one cheap keystroke, re-snapshot, and check it rendered. If the target ignores synthetic input, desktop-control cannot act on it on this tier -- Observe (screenshots, window enumeration) still works, but route the interaction through **agent-browser** (web/Electron) or **true-input** (terminal) instead.
+
+## Text input is lossy even where it lands
+
+In apps that do accept synthetic input, typing drops and mangles characters: shifted symbols can inject as their unshifted key (`*` arriving as `8`), trailing characters get dropped (verified: `type_text "echo ok42"` rendered `echo ok4`), and `type_text_chars` with generous per-char delays still loses keystrokes. `hotkey` chords (including paste shortcuts) and middle-click paste do **not** land reliably, so the clipboard is not a workaround here.
+
+What works: short bursts plus verification. After every `type_text`, re-snapshot, compare the rendered text against what you sent, and repair the diff (`press_key` backspace, retype the missing tail). On Linux the post-action screenshot is not a formality -- it is the only way to know what actually arrived.
+
+## Failure modes
+
+| Symptom | Fix |
+|---|---|
+| Expected window missing from `list_windows` | Native-Wayland target -- relaunch it on the X11 backend (`QT_QPA_PLATFORM=xcb` / `GDK_BACKEND=x11` / `--ozone-platform=x11`) |
+| Tree is a single root node / AT-SPI timeouts | `capture_mode vision` + pixel actions |
+| Every action "succeeds" but nothing changes | Toolkit drops `send_event` input (Qt, GTK4) -- target is unreachable on this tier; use agent-browser or true-input for the interaction |
+| Typed text arrives mangled or truncated | Verify-and-repair loop: re-snapshot, diff rendered text, backspace + retype the tail |
+| `doctor` reports no DISPLAY | Run from the graphical session (or export the session's `DISPLAY`/`XAUTHORITY`), not a bare TTY/SSH context |
+
+Deep mechanics live in the upstream pack: `~/.cua-driver/skills/cua-driver/LINUX.md`.
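
The verify-and-repair loop described in `platforms/linux.md` above (re-snapshot, diff the rendered text, backspace, retype the tail) reduces to a small string computation once the rendered text has been read back from the snapshot. A minimal sketch, assuming the cursor sits at the end of the rendered text and each backspace removes exactly one character. `plan_repair` and `bursts` are hypothetical helpers, and the burst size of 8 is an arbitrary illustration, not a measured value.

```python
import os


def bursts(text, size=8):
    """Split text into short chunks so each type_text call can be verified."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def plan_repair(sent, rendered):
    """Return (backspaces, retype) that turns `rendered` into `sent`.

    Keeps the longest common prefix, deletes everything after it,
    and retypes the remainder of what was meant to be sent.
    """
    keep = len(os.path.commonprefix([sent, rendered]))
    return len(rendered) - keep, sent[keep:]


if __name__ == "__main__":
    # Truncation case from the platform notes: the trailing character was dropped.
    print(plan_repair("echo ok42", "echo ok4"))
    # Shifted symbol injected as its unshifted key.
    print(plan_repair("2*3", "283"))
```

The repair itself would still go through `press_key` and `type_text`, followed by another snapshot, since a retyped tail can be mangled just like the original.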
diff --git a/plugins/droid-control/skills/desktop-control/platforms/macos.md b/plugins/droid-control/skills/desktop-control/platforms/macos.md
new file mode 100644
index 0000000..e7c1645
--- /dev/null
+++ b/plugins/droid-control/skills/desktop-control/platforms/macos.md
@@ -0,0 +1,64 @@
+# Desktop-Control: macOS
+
+cua-driver on macOS posts events per-pid through Accessibility (AX) and captures via ScreenCaptureKit. Both are gated by TCC, and TCC attributes grants to the **app bundle that asks** -- which is why every flow below routes through `CuaDriver.app` instead of your terminal.
+
+## Permissions (TCC)
+
+```bash
+cua-driver permissions grant    # LaunchServices-routed: the Accessibility + Screen Recording dialogs
+                                # attribute to com.trycua.driver, then it confirms the driver's own status
+cua-driver permissions status   # read-only via the daemon; reports `unknown` when no daemon is up
+```
+
+Do not grant by clicking through System Settings for your terminal app -- the daemon runs under the bundle identity, and terminal-attributed grants do nothing for it. The first real screen capture may trigger one extra consent sheet; accept it.
+
+## Daemon launch
+
+Launch from the logged-in GUI session so the daemon attaches to it with the bundle's TCC identity:
+
+```bash
+open -n -g -a CuaDriver --args serve
+cua-driver status
+cua-driver stop
+```
+
+SSH-launched bare binaries often miss the GUI session, so their AX/capture probes hang. (`cua-driver mcp` and CLI tool calls auto-proxy to a properly attributed daemon when one is reachable.)
+
+## Patterns
+
+**Reliable terminal command entry** -- when `type_text` or raw key posting drops characters in Terminal-class apps, route through the pasteboard:
+
+```bash
+printf '%s' 'your command' | pbcopy
+cua-driver hotkey    '{"pid":<term-pid>,"window_id":<wid>,"keys":["cmd","v"]}'
+cua-driver press_key '{"pid":<term-pid>,"window_id":<wid>,"key":"return"}'
+```
+
+**Native security / modal sheets** (SecurityAgent, Keychain prompts, auth dialogs) -- these often report `is_on_screen: false` even while visible. Locate by process, then enumerate everything:
+
+```bash
+pgrep -fl SecurityAgent
+cua-driver list_windows '{"pid":<sa-pid>,"on_screen_only":false}'
+cua-driver get_window_state '{"pid":<sa-pid>,"window_id":<wid>}'
+```
+
+Only enter credentials in environments you own and were explicitly authorized to drive.
+
+**Menu commands / app shortcuts** -- pass `window_id` so AppKit routes the key equivalent to the target app instead of the frontmost one:
+
+```bash
+cua-driver hotkey '{"pid":835,"window_id":79,"keys":["cmd","q"]}'
+```
+
+**Backgrounded / off-space windows** -- the driver acts on `(pid, window_id)` without raising. Enumerate with `on_screen_only: false` and target directly.
+
+## Failure modes
+
+| Symptom | Fix |
+|---|---|
+| AX write fails (`AXPress` returns `-25204`) on a system sheet | Fall back to `press_key` / `hotkey` / pixel `click` |
+| ScreenCaptureKit error (e.g. SCK `-3801`) in `som`/`vision` capture | `cua-driver config set capture_mode ax` (tree-only, skips Screen Recording), or retry |
+| Known dialog missing from `list_windows` results | Re-query with `"on_screen_only": false` |
+| Probes hang / permissions report `unknown` | Daemon was launched without GUI attribution -- `cua-driver stop`, relaunch via `open -n -g -a CuaDriver --args serve` |
+
+Deep mechanics (no-foreground forbidden-list, AXMenuBar navigation, SkyLight click dispatch, Apple-Events browser bridge) live in the upstream pack: `~/.cua-driver/skills/cua-driver/MACOS.md`.
diff --git a/plugins/droid-control/skills/desktop-control/platforms/windows.md b/plugins/droid-control/skills/desktop-control/platforms/windows.md
new file mode 100644
index 0000000..b55ae11
--- /dev/null
+++ b/plugins/droid-control/skills/desktop-control/platforms/windows.md
@@ -0,0 +1,45 @@
+# Desktop-Control: Windows
+
+cua-driver on Windows walks UI Automation (UIA) trees and dispatches actions through a layered UIA + `PostMessage` chain -- per-window message posting, not HID synthesis, so the user's foreground app is untouched.
+
+## Install and daemon
+
+The upstream installer is per-user (no admin elevation): binary under `%LOCALAPPDATA%\Programs\Cua\cua-driver\bin`, data and skill pack under `%USERPROFILE%\.cua-driver`, and an autostart task (`cua-driver autostart status|kick|disable`) registered for the daemon.
+
+```powershell
+cua-driver doctor
+cua-driver serve     # required for element_index workflows
+cua-driver status
+cua-driver stop
+```
+
+`cua-driver permissions` is a no-op surface on Windows (TCC is a macOS concept) -- there is no grant dance. The real constraint is **Session 0 isolation**: anything launched by a service (including some SSH daemons) lives in a session with no interactive desktop, where window enumeration returns nothing. Tool calls auto-proxy to an interactive-session daemon when one is reachable; if results come back empty, confirm the daemon was started from the logged-in interactive session, not a service context.
+
+## JSON quoting (the PowerShell 5.1 footgun)
+
+Windows PowerShell 5.1 strips quotes around JSON field names in multi-field arguments, so positional JSON fails to parse. Pipe via stdin, or use PowerShell 7+ (`pwsh`):
+
+```powershell
+'{"pid":1234,"window_id":5678}' | cua-driver get_window_state
+```
+
+From `cmd.exe`, escape inner quotes instead: `cua-driver get_window_state "{\"pid\":1234,\"window_id\":5678}"`.
+
+## Patterns
+
+**UWP / packaged apps** -- Store apps (Calculator, Settings) are hosted by `ApplicationFrameHost.exe`, so the visible window's pid is the host's, not the app process's. If `list_windows` against the app's own pid comes up empty, enumerate `ApplicationFrameHost.exe`'s windows and match by title. Classic Win32 apps (Notepad, Explorer) own their windows directly.
+
+**Minimized windows** -- `get_window_state` and element-index actions work in place, but committing with `press_key` silently no-ops (the minimized window has no message-pump focus). Use `set_value`, or click the commit-equivalent button by `element_index`, instead.
+
+**Browsers / Electron** -- prefer **agent-browser**. If you must stay in desktop-control, launch the browser with `--remote-debugging-port=<port>` and export `CUA_DRIVER_CDP_PORT=<port>` so `execute_javascript` / `query_dom` can attach; UIA covers `get_text` either way.
+
+## Failure modes
+
+| Symptom | Fix |
+|---|---|
+| `UIA invoke failed` on an element | Try `click` with an explicit `action` (`show_menu`, `confirm`, ...) or fall through to a pixel click on the element's center |
+| Empty window lists, blank screenshots | Session 0 daemon -- restart `cua-driver serve` from the interactive desktop session |
+| Positional JSON "did not parse" errors | PowerShell 5.1 quote-stripping -- pipe JSON via stdin or use `pwsh` |
+| Target window not under the app's pid | UWP hosting -- enumerate `ApplicationFrameHost.exe` windows |
+
+Deep mechanics (UIA tree semantics, click-dispatch layering, focus-steal vectors, UAC boundaries) live in the upstream pack: `~/.cua-driver/skills/cua-driver/WINDOWS.md`.
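
The stdin form recommended in `platforms/windows.md` above can also be driven from a script, which keeps the JSON out of the command line entirely. A minimal sketch, assuming `cua-driver <tool>` accepts its JSON payload on stdin for any tool in the same way the piped `get_window_state` example does. `call_via_stdin` is a hypothetical wrapper, and the `runner` parameter exists only so the call can be exercised without the binary installed.

```python
import json
import subprocess


def call_via_stdin(tool, payload, runner=subprocess.run):
    """Invoke `cua-driver <tool>` with the JSON payload piped on stdin.

    No shell is involved, so no shell can strip or re-quote the JSON.
    """
    return runner(
        ["cua-driver", tool],
        input=json.dumps(payload),
        text=True,
        capture_output=True,
        check=False,  # the caller inspects returncode and stderr itself
    )


if __name__ == "__main__":
    # Dry run with a stand-in runner that just echoes what would be executed.
    def show(argv, **kwargs):
        print(argv, kwargs["input"])

    call_via_stdin("get_window_state", {"pid": 1234, "window_id": 5678}, runner=show)
```

Passing the argument list directly to `subprocess.run` bypasses PowerShell's argument handling, so the same wrapper behaves identically whether the parent process was started from PowerShell 5.1, `pwsh`, or `cmd.exe`.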
diff --git a/plugins/droid-control/skills/droid-control/SKILL.md b/plugins/droid-control/skills/droid-control/SKILL.md
index dabc9e1..0c9b006 100644
--- a/plugins/droid-control/skills/droid-control/SKILL.md
+++ b/plugins/droid-control/skills/droid-control/SKILL.md
@@ -35,9 +35,10 @@ Three independent lookups. Do all three, then load the union of skills they prod
 | Other terminal TUI | tuistory backend via `${DROID_PLUGIN_ROOT}/bin/tctl` |
 | Other terminal TUI (real terminal proof) | **true-input** |
 | Web page or Electron app | **agent-browser** |
+| Native desktop GUI app | **desktop-control** |
 | Raw terminal byte sequences | **true-input** + **pty-capture** |
 
-**tuistory** is the default for terminal work. Use **true-input** only when you need real terminal rendering evidence.
+**tuistory** is the default for terminal work. Use **true-input** only when you need real terminal rendering evidence. On Linux, desktop-control rides upstream's pre-release tier -- its platform file documents the Wayland/AT-SPI/input caveats and when to fall back to **agent-browser** or **true-input**.
 
 ### 2. Stage route — what does the workflow need?
 
@@ -157,7 +158,7 @@ For before/after comparison demos, launch both capture workers simultaneously:
 
 ## Shared tooling
 
-Terminal drivers use the unified `tctl` wrapper. agent-browser has its own CLI and does not use `tctl`.
+Terminal drivers use the unified `tctl` wrapper. agent-browser and desktop-control have their own CLIs (`agent-browser`, `cua-driver`) and do not use `tctl`.
 
 Drivers can be combined in one workflow — e.g., `tctl` for a CLI and `agent-browser` for a web UI it interacts with.
 
@@ -170,6 +171,7 @@ Drivers can be combined in one workflow — e.g., `tctl` for a CLI and `agent-br
 | true-input | Windows (KVM) | `libvirt`, `qemu`, KVM VM with SPICE + SSH, `DROID_VM_*` env vars | `virt-manager` |
 | true-input | macOS (QEMU) | `qemu`, `socat`, macOS VM with SSH, `DROID_MAC_*` env vars | — |
 | agent-browser | All | `agent-browser` (+ `agent-browser install`) | — |
+| desktop-control | All | `cua-driver` (+ daemon via `cua-driver serve`; macOS also `cua-driver permissions grant`) | upstream skill pack (`cua-driver skills install`) |
 | compose | All | `ffmpeg`, `ffprobe`, `agg` | — |
 | showcase | All | Node.js (>= 18), Chrome/Chromium | — |
 
@@ -188,6 +190,10 @@ sudo apt-get install -y grim wf-recorder             # optional: screenshots + v
 # agent-browser driver
 agent-browser install                                # one-time: downloads bundled Chromium
 
+# desktop-control driver (Windows hosts: irm .../scripts/install.ps1 | iex)
+curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh | bash
+cua-driver skills install                            # upstream skill pack (deep tool reference)
+
 # compose + showcase (video rendering)
 sudo apt-get install -y ffmpeg                       # video processing (includes ffprobe)
 cd ${DROID_PLUGIN_ROOT}/remotion && npm install       # Remotion dependencies