diff --git a/.claude/skills/functional-test/SKILL.md b/.claude/skills/functional-test/SKILL.md new file mode 100644 index 00000000..ca236008 --- /dev/null +++ b/.claude/skills/functional-test/SKILL.md @@ -0,0 +1,148 @@ +--- +name: functional-test +description: > + Use this skill when running functional tests to validate PerfSpect code changes, + when the user says "run functional tests", "test my changes", "check for regressions", + or when verifying a code change did not break existing functionality. +--- + +> **Skill Loaded:** "Using functional-test skill." + +# Functional Test Runner + +Run targeted PerfSpect functional tests on a remote target to validate code changes. Identify the specific tests affected by a change, run them, and verify output aligns with the change. + +## Test script + +`../tools/perfspect/functional_test.sh` (relative to the perfspect repo root). Verify the file exists before proceeding. + +## Prerequisites + +1. **Built binary.** Run `make` (x86_64) or `make perfspect-aarch64` (ARM64). Binary must be at `./perfspect` (or set `PERFSPECT_DIR`). +2. **Remote target.** User must provide: hostname/IP (`TARGET`), SSH user (`USER_NAME`), private key path (`PRIVATE_KEY_PATH`). Password-less sudo must be configured on the target. +3. **Target dependencies.** `stress-ng` on the target. For flame tests: `java` and `/tmp/primes.java` (copy from `../tools/perfspect/primes.java`). + +## Workflow + +### Step 1 — Analyze the code change + +Run `git diff main...HEAD` (or the appropriate base). Read the diff. Identify: + +- **What changed**: flag names, validation logic, error messages, output formats, collection behavior, report generation, table definitions, script content. +- **Behavioral impact**: Does the change alter a CLI flag? A validation rule? An error message string? An output file format? A collection path? A report table? + +### Step 2 — Identify affected test categories + +Use the code-to-category mapping below to determine which `TEST_*` categories are affected. + +| Changed path | Categories | +|---|---| +| `cmd/config/` | `TEST_CONFIG` | +| `cmd/flamegraph/` | `TEST_FLAME` | +| `cmd/lock/` | `TEST_LOCK` | +| `cmd/metrics/` | `TEST_METRICS` | +| `cmd/report/` | `TEST_REPORT` | +| `cmd/benchmark/` | `TEST_BENCHMARK` | +| `cmd/telemetry/` | `TEST_TELEMETRY` | +| `cmd/root.go` | All — trace the specific change to narrow | +| `internal/app/` | All — trace the specific change to narrow | +| `internal/workflow/` | All reporting commands — trace to narrow | +| `internal/extract/` | `TEST_REPORT`, `TEST_TELEMETRY`, `TEST_METRICS` | +| `internal/target/` | All — affects SSH/local execution | +| `internal/script/` | All — affects script execution | +| `internal/report/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY`, `TEST_METRICS`, `TEST_FLAME` | +| `internal/table/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY` | +| `internal/cpus/` | All — CPU detection used everywhere | +| `internal/progress/` | All — progress UI used everywhere | +| `internal/util/` | All — trace the specific change to narrow | +| `main.go`, `go.mod`, `go.sum` | All | +| `scripts/`, `tools/` | All — embedded resources | + +### Step 3 — Identify specific affected tests + +Read the test catalog for each affected category. 
Load **only** the doc files for affected categories:
+
+| Category | Test catalog |
+|---|---|
+| `TEST_CONFIG` | [docs/config-tests.md](docs/config-tests.md) |
+| `TEST_FLAME` | [docs/flame-tests.md](docs/flame-tests.md) |
+| `TEST_LOCK` | [docs/lock-tests.md](docs/lock-tests.md) |
+| `TEST_METRICS` | [docs/metrics-tests.md](docs/metrics-tests.md) |
+| `TEST_REPORT` | [docs/report-tests.md](docs/report-tests.md) |
+| `TEST_BENCHMARK` | [docs/benchmark-tests.md](docs/benchmark-tests.md) |
+| `TEST_TELEMETRY` | [docs/telemetry-tests.md](docs/telemetry-tests.md) |
+
+Within the loaded catalog, find every test whose behavior intersects with the change, using these criteria:
+
+1. **Flag changes** — Tests that pass the changed flag in `t_args`.
+2. **Error message changes** — Tests whose `t_expect_stderr` matches the changed error string.
+3. **Output format changes** — Tests that exercise the changed format via `--format` in `t_args`.
+4. **Collection behavior changes** — Tests that exercise the changed collection path (scope, granularity, duration, live mode, workload-driven, etc.).
+5. **Shared infrastructure changes** — If the change is in shared code (`internal/target/`, `internal/script/`, `internal/workflow/`, `internal/app/`, `cmd/root.go`, `main.go`), trace the change to the specific behavior and find tests that trigger it across categories. Do not blindly run all tests.
+6. **stdout/stderr pattern changes** — Tests whose `t_expect_stdout` or `t_expect_stderr` contains text the change modifies.
+7. **Custom validation function changes** — Tests with `t_expect_func` that validate output artifacts affected by the change.
+
+Build a list of specific test names (`t_name` values) and their categories.
+
+### Step 4 — Predict expected test outcomes
+
+For each identified test, determine whether the code change should:
+
+- **Not alter the test result** (regression check) — The test must still PASS with the same output patterns.
+- **Change the test's expected behavior** — The test's expectations (`t_expect_exit`, `t_expect_stdout`, `t_expect_stderr`, `t_expect_func`) no longer match the new code. Flag this to the user: the test script itself must be updated. Explain what the new expected values must be.
+- **Make a previously skipped test runnable** — The change adds support for something that was previously guarded.
+
+### Step 5 — Run the affected test categories
+
+Disable all categories except those containing affected tests:
+
+```bash
+TARGET=<target> USER_NAME=<user> PRIVATE_KEY_PATH=<key> \
+  PERFSPECT_DIR=. \
+  TEST_CONFIG=false TEST_FLAME=false TEST_LOCK=false TEST_METRICS=false \
+  TEST_REPORT=false TEST_BENCHMARK=false TEST_TELEMETRY=false \
+  TEST_<AFFECTED_CATEGORY>=true \
+  ../tools/perfspect/functional_test.sh -q -v
+```
+
+Add `NO_ROOT=true` if the remote user does not have password-less sudo.
+
+### Step 6 — Verify output aligns with the change
+
+Do not stop at PASS/FAIL. For each affected test:
+
+1. **Read the test output.** Examine `test/output/<test_num>-<test_name>/stdout.txt`, `stderr.txt`, and `perfspect.log`.
+2. **Verify the change is reflected.** Follow the output verification guidance in the category's doc file. Examples:
+   - Error message changed → confirm `stderr.txt` contains the new text.
+   - New output field added → confirm it appears in `stdout.txt` or generated report files.
+   - Chart/report generation changed → confirm output HTML/JSON/CSV contains expected new content.
+   - Bug fix that eliminated ERROR log entries → confirm `perfspect.log` no longer contains `level=ERROR` for the affected path.
+   - Collection behavior changed → confirm `stderr.txt` shows expected collection messages and `stdout.txt` shows expected output files.
+3. **Check for unintended side effects.** Scan the output of non-target tests in the same category for unexpected ERRORs or changed output patterns.
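+
+For example, a minimal verification pass over one test's artifacts might look like this (a sketch; the directory name and grep patterns are illustrative and must be substituted per test):
+
+```bash
+# Illustrative only: substitute the real test directory and expected text.
+OUT=test/output/<test_num>-<test_name>
+
+# Changed error message: confirm the new text landed in stderr.
+grep -F 'invalid flag value' "$OUT/stderr.txt"
+
+# Bug fix: confirm no ERROR entries remain in the log.
+if grep -q 'level=ERROR' "$OUT/perfspect.log"; then
+  echo "unexpected ERROR entries remain" >&2
+fi
+```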
+
+### Step 7 — Report to user
+
+Provide:
+- The list of tests identified as affected and why.
+- PASS/FAIL status of each.
+- For each affected test: what was verified in the output and whether the change is reflected correctly.
+- Any tests whose expectations must be updated in the test script (with the specific `t_expect_*` values that must change).
+- Any tests that passed but whose output reveals a concern.
+
+## Environment variable reference
+
+| Variable | Default | Purpose |
+|---|---|---|
+| `PERFSPECT_DIR` | `.` | Path to directory containing the `perfspect` binary |
+| `ROOT_OUTPUT_DIR` | `test/output` | Output directory for test artifacts |
+| `TARGET` | _(empty)_ | Remote target hostname/IP (empty = local) |
+| `USER_NAME` | _(empty)_ | SSH username for remote target |
+| `PRIVATE_KEY_PATH` | _(empty)_ | SSH private key path for remote target |
+| `NO_ROOT` | `false` | Set to `true` to run without root |
+| `TEST_CONFIG` | `true` | Run config tests |
+| `TEST_FLAME` | `true` | Run flame tests |
+| `TEST_LOCK` | `true` | Run lock tests |
+| `TEST_METRICS` | `true` | Run metrics tests |
+| `TEST_REPORT` | `true` | Run report tests |
+| `TEST_BENCHMARK` | `true` | Run benchmark tests |
+| `TEST_TELEMETRY` | `true` | Run telemetry tests |
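+
+For instance, a local run (empty `TARGET`) of only the report tests, without root, might look like this (a sketch using the defaults above):
+
+```bash
+# Local, non-root run of the report tests only.
+TARGET= USER_NAME= PRIVATE_KEY_PATH= \
+  PERFSPECT_DIR=. NO_ROOT=true \
+  TEST_CONFIG=false TEST_FLAME=false TEST_LOCK=false TEST_METRICS=false \
+  TEST_REPORT=true TEST_BENCHMARK=false TEST_TELEMETRY=false \
+  ../tools/perfspect/functional_test.sh -q -v
+```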
diff --git a/.claude/skills/functional-test/docs/benchmark-tests.md b/.claude/skills/functional-test/docs/benchmark-tests.md
new file mode 100644
index 00000000..10ea97fe
--- /dev/null
+++ b/.claude/skills/functional-test/docs/benchmark-tests.md
@@ -0,0 +1,30 @@
+# Benchmark Tests (TEST_BENCHMARK)
+
+## Test catalog
+
+| Test name | Args exercised | Validates |
+|---|---|---|
+| `benchmark default` | `benchmark` | Default benchmark run (all benchmarks, default format) |
+| `benchmark input` | `benchmark --input <dir>` | Reprocessing from `benchmark default` output |
+| `benchmark invalid benchmark` | `benchmark --foo` | Exit 1, unknown flag rejected by cobra |
+| `benchmark invalid format` | `benchmark --format invalid` | Exit 1 |
+
+## Flags exercised
+
+`--input`, `--format`, unknown flags (cobra validation)
+
+Note: The test script does not exercise the individual benchmark selection flags (`--speed`, `--power`, `--temperature`, `--frequency`, `--memory`, `--cache`, `--storage`) or `--storage-dir` and `--no-summary`. Changes to these flags are covered only by `benchmark default` (which runs with `--all` implicitly).
+
+## Test dependencies
+
+- `benchmark input` depends on the output of `benchmark default` (uses its output directory as `--input`).
+
+## Output verification guidance
+
+- **`benchmark default`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains benchmark report files. If the change affects benchmark collection, summary table generation, or reference data comparisons, inspect the output report content.
+- **`benchmark input`**: Verify reprocessing produces output without re-running the benchmarks.
+- **`benchmark invalid benchmark`**: Verifies cobra rejects unknown flags. This test is stable unless the flag name `--foo` is added as a real flag (unlikely).
+- **`benchmark invalid format`**: Verify exit code is 1.
+- **If benchmark selection flags change**: Only `benchmark default` (all benchmarks) is tested; individual benchmark flags are not exercised. If a benchmark is added, removed, or renamed, verify `benchmark default` still passes and its output reflects the change.
+- **If `--format` options change**: Same pattern as other commands — `benchmark invalid format` still passes, but `benchmark default` output should be checked for the new format.
+- **If `--storage-dir` validation changes**: No test exercises this flag directly. Manual verification required.
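+
+A quick artifact scan for `benchmark default` might look like this (a sketch; the directory name is illustrative):
+
+```bash
+OUT=test/output/<test_num>-<test_name>   # the 'benchmark default' test directory
+ls "$OUT"                                # expect benchmark report files
+grep -q 'level=ERROR' "$OUT/perfspect.log" && echo "unexpected ERROR entries" >&2
+```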
diff --git a/.claude/skills/functional-test/docs/config-tests.md b/.claude/skills/functional-test/docs/config-tests.md
new file mode 100644
index 00000000..305d6c0b
--- /dev/null
+++ b/.claude/skills/functional-test/docs/config-tests.md
@@ -0,0 +1,40 @@
+# Config Tests (TEST_CONFIG)
+
+All config tests require root (`t_requires_root=true`).
+
+## Test catalog
+
+| Test name | Args exercised | Validates |
+|---|---|---|
+| `config help` | `config --help` | Help text prints `Usage:` |
+| `config default` | `config` | No-op prints `No changes requested` and `Configuration` |
+| `config gov epb epp` | `config --gov performance --epb 0 --epp 0` | Applies governor/epb/epp, stderr confirms each setting |
+| `config disable l2hw prefetcher` | `config --pref-l2hw disable` | Prefetcher disable, stderr confirms |
+| `config enable l2hw prefetcher no-summary` | `config --pref-l2hw enable --no-summary` | Prefetcher enable with `--no-summary` suppresses the stdout table |
+| `config invalid core count` | `config --cores 0` | Exit 1, stderr: `invalid flag value, --cores 0, valid values are` |
+| `config invalid llc size` | `config --llc 0` | Exit 1, stderr: `invalid flag value, --llc 0, valid values are` |
+| `config invalid core frequency` | `config --core-max .05` | Exit 1, stderr: `invalid flag value, --core-max 0.05, valid values are` |
+| `config invalid tdp` | `config --tdp 0` | Exit 1, stderr: `invalid flag value, --tdp 0, valid values are` |
+| `config invalid epb` | `config --epb 16` | Exit 1, stderr: `invalid flag value, --epb 16, valid values are` |
+| `config invalid epp` | `config --epp 256` | Exit 1, stderr: `invalid flag value, --epp 256, valid values are` |
+| `config invalid governor` | `config --gov invalid` | Exit 1, stderr: `invalid flag value, --gov invalid, valid values are` |
+| `config invalid elc` | `config --elc invalid` | Exit 1, stderr: `invalid flag value, --elc invalid, valid values are` |
+| `config invalid uncore max frequency` | `config --uncore-max .05` | Exit 1, stderr: `invalid flag value, --uncore-max 0.05, valid values are` |
+| `config invalid uncore min frequency` | `config --uncore-min .05` | Exit 1, stderr: `invalid flag value, --uncore-min 0.05, valid values are` |
+| `config invalid uncore max compute frequency` | `config --uncore-max-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-max-compute 0.05, valid values are` |
+| `config invalid uncore min compute frequency` | `config --uncore-min-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-min-compute 0.05, valid values are` |
+| `config invalid uncore max io frequency` | `config --uncore-max-io .05` | Exit 1, stderr: `invalid flag value, --uncore-max-io 0.05, valid values are` |
+| `config invalid uncore min io frequency` | `config --uncore-min-io .05` | Exit 1, stderr: `invalid flag value, --uncore-min-io 0.05, valid values are` |
+| `config invalid l2hw prefetcher` | `config --pref-l2hw invalid` | Exit 1, stderr: `invalid flag value, --pref-l2hw invalid, valid values are` |
+| `config invalid c6` | `config --c6 invalid` | Exit 1, stderr: `invalid flag value, --c6 invalid, valid values are` |
+| `config invalid c1 demotion` | `config --c1-demotion invalid` | Exit 1, stderr: `invalid flag value, --c1-demotion invalid, valid values are` |
+
+## Flags exercised
+
+`--gov`, `--epb`, `--epp`, `--pref-l2hw`, `--no-summary`, `--cores`, `--llc`, `--core-max`, `--tdp`, `--elc`, `--uncore-max`, `--uncore-min`, `--uncore-max-compute`, `--uncore-min-compute`, `--uncore-max-io`, `--uncore-min-io`, `--c6`, `--c1-demotion`, `--help`
+
+## Output verification guidance
+
+- **Positive tests** (`config gov epb epp`, `config disable l2hw prefetcher`, etc.): Verify `stderr.txt` contains the `<setting> set to <value>` confirmation messages. Verify `stdout.txt` contains the `Configuration` table when `--no-summary` is not set, and does not contain it when `--no-summary` is set.
+- **Negative tests** (all `config invalid *`): Verify `stderr.txt` contains the exact `Error: invalid flag value, --<flag> <value>, valid values are` message. Verify exit code is 1.
+- **If a validation range changes** (e.g., `--epb` now accepts 0-20 instead of 0-15): The `config invalid epb` test passes `--epb 16` and expects exit 1. If 16 is now valid, this test must be updated — flag to user with the new boundary value.
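+
+When a validation range changes, a quick boundary probe on the target might look like this (a sketch with hypothetical values for a widened `--epb` range):
+
+```bash
+# Hypothetical: --epb now accepts 0-20 instead of 0-15.
+sudo ./perfspect config --epb 20; echo "exit: $?"   # expect 0 at the new maximum
+sudo ./perfspect config --epb 21; echo "exit: $?"   # expect 1 just past it
+```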
diff --git a/.claude/skills/functional-test/docs/flame-tests.md b/.claude/skills/functional-test/docs/flame-tests.md
new file mode 100644
index 00000000..6629361a
--- /dev/null
+++ b/.claude/skills/functional-test/docs/flame-tests.md
@@ -0,0 +1,43 @@
+# Flame Tests (TEST_FLAME)
+
+All flame tests require root (`t_requires_root=true`).
+
+## Test catalog
+
+| Test name | Runner | Args exercised | Validates |
+|---|---|---|---|
+| `flame duration java` | `run_test` | `flame --duration 10 --format all` + java workload | JSON output contains `primes.java` in `Flamegraph[0]["Java Stacks"]` |
+| `flame duration native` | `run_test` | `flame --duration 10 --format all` + stress-ng | JSON output contains `stress-ng` in `Flamegraph[0]["Native Stacks"]` |
+| `flame dual native stacks` | `run_test` | `flame --duration 10 --format all --dual-native-stacks` + stress-ng | Dual stack mode, JSON validates `stress-ng` in Native Stacks |
+| `flame all options` | `run_test` | `flame --duration 10 --frequency 10 --format html,json --no-summary --max-depth 20 --perf-event instructions` + java + `--pids` | All flags combined, JSON validates `primes.java` in Java Stacks |
+| `flame with input` | `run_test` | `flame --input <dir>` | Reprocessing from raw data produced by `flame all options` |
+| `flame invalid format` | `run_test` | `flame --format html,invalid` | Exit 1, stderr: `format options are: all, html, txt, json` |
+| `flame invalid duration` | `run_test` | `flame --duration -1` | Exit 1, stderr: `duration must be 0 or greater` |
+| `flame invalid frequency` | `run_test` | `flame --frequency 0` | Exit 1, stderr: `frequency must be 1 or greater` |
+| `flame sigint native` | `run_sigint_test` | `flame --format all --no-summary` + stress-ng, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, `perf` and `processwatch` no longer running, JSON validates `stress-ng` |
+| `flame sigint java` | `run_sigint_test` | `flame --format all --no-summary` + java, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, JSON validates `primes.java` |
+
+## Flags exercised
+
+`--duration`, `--format`, `--frequency`, `--no-summary`, `--max-depth`, `--perf-event`, `--dual-native-stacks`, `--pids`, `--input`
+
+## Custom validation functions
+
+Tests `flame duration java`, `flame all options`, and `flame sigint java` use:
+
+```bash
+jq -r '.["Flamegraph"][0]["Java Stacks"]' "$1"/*_flame.json | grep -q "primes.java"
+```
+
+Tests `flame duration native`, `flame dual native stacks`, and `flame sigint native` use:
+
+```bash
+jq -r '.["Flamegraph"][0]["Native Stacks"]' "$1"/*_flame.json | grep -q "stress-ng"
+```
+
+## Output verification guidance
+
+- **Collection tests** (`flame duration java`, `flame duration native`, `flame dual native stacks`, `flame all options`): Verify `*_flame.json` exists in the output directory. Parse it with `jq` to confirm the expected stack type contains the workload name.
+- **Input reprocessing** (`flame with input`): Verify it regenerates output from previously collected raw data without re-collecting.
+- **Negative tests**: Verify `stderr.txt` contains the exact error message string. Verify exit code is 1.
+- **SIGINT tests**: Verify the last line of `perfspect.log` contains `Shutting down`. Verify no `perf` or `processwatch` processes remain on the target. Verify the `t_expect_func` JSON validation still passes (data was collected before shutdown).
+- **If `--format` options change**: The `flame invalid format` test expects the error `format options are: all, html, txt, json`. Update the expected string if format options are added or removed.
+- **If JSON output structure changes**: The custom validation functions parse `*_flame.json` with specific jq paths. If the JSON schema changes, these tests will fail — flag to user that both the code and the tests' `t_expect_func` must be updated.
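+
+A sketch of the post-SIGINT orphan check (assumes SSH key access to the target; the log path is illustrative):
+
+```bash
+# Expect no output: a graceful shutdown leaves no perf/processwatch behind.
+ssh -i "$PRIVATE_KEY_PATH" "$USER_NAME@$TARGET" "pgrep -x 'perf|processwatch'" \
+  && echo "orphan processes found" >&2
+tail -1 test/output/<test_num>-<test_name>/perfspect.log   # expect 'Shutting down'
+```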
diff --git a/.claude/skills/functional-test/docs/lock-tests.md b/.claude/skills/functional-test/docs/lock-tests.md
new file mode 100644
index 00000000..195de064
--- /dev/null
+++ b/.claude/skills/functional-test/docs/lock-tests.md
@@ -0,0 +1,23 @@
+# Lock Tests (TEST_LOCK)
+
+All lock tests require root (`t_requires_root=true`).
+
+## Test catalog
+
+| Test name | Args exercised | Validates |
+|---|---|---|
+| `lock all options` | `lock --duration 10 --frequency 22 --package --no-summary --format html` + stress-ng | All lock flags combined, successful collection |
+| `lock invalid duration` | `lock --duration 0` | Exit 1 (duration must be > 0) |
+| `lock invalid frequency` | `lock --frequency -1` | Exit 1 (frequency must be > 0) |
+| `lock invalid format` | `lock --format invalid` | Exit 1 (format must be one of: all, html, txt) |
+
+## Flags exercised
+
+`--duration`, `--frequency`, `--package`, `--no-summary`, `--format`
+
+## Output verification guidance
+
+- **`lock all options`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains the HTML report file. With `--package`, verify the raw data package was downloaded.
+- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr` patterns, so validation is exit-code-only. If a code change adds specific error messages for lock validation, the tests may need `t_expect_stderr` added.
+- **If `--format` options change**: The `lock invalid format` test passes `--format invalid` and expects exit 1. If new format options are added, this test still passes (since `invalid` remains invalid), but if format validation error messages change, verify they still align.
+- **If duration/frequency validation changes** (e.g., allowing 0 duration for indefinite collection): `lock invalid duration` passes `--duration 0` and expects exit 1. If 0 becomes valid, this test must be updated — flag to user.
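+
+A sketch of the artifact check for `lock all options` (the directory name is illustrative; the package file name depends on the target):
+
+```bash
+OUT=test/output/<test_num>-<test_name>
+ls "$OUT"/*.html   # HTML report from --format html
+ls "$OUT"          # look for the downloaded raw data package as well
+```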
diff --git a/.claude/skills/functional-test/docs/metrics-tests.md b/.claude/skills/functional-test/docs/metrics-tests.md
new file mode 100644
index 00000000..42853b08
--- /dev/null
+++ b/.claude/skills/functional-test/docs/metrics-tests.md
@@ -0,0 +1,51 @@
+# Metrics Tests (TEST_METRICS)
+
+## Test catalog
+
+| Test name | Runner | Args exercised | Validates | Constraints |
+|---|---|---|---|---|
+| `metrics scope cgroup count` | `run_test` | `metrics --duration 10 --scope cgroup --count 3` + docker stress-ng | stdout: `Metric files`, `metrics.csv`, `summary.csv`; stderr: `collection complete` | Local only (`t_requires_local=true`) |
+| `metrics sigint` | `run_sigint_test` | `metrics` + stress-ng, SIGINT after 15s | Graceful shutdown, log: `Shutting down`, no orphan `perf`/`processwatch` | x86_64 only (`t_requires_arch="x86_64"`) |
+| `metrics duration` | `run_test` | `metrics --duration 10` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics granularity cpu` | `run_test` | `metrics --duration 10 --granularity cpu` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics granularity socket` | `run_test` | `metrics --duration 10 --granularity socket` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics scope process` | `run_test` | `metrics --duration 10 --scope process` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics scope process pids` | `run_test` | `metrics --duration 10 --scope process` + stress-ng + `--pids` | Pass with explicit PID targeting | |
+| `metrics txnrate` | `run_test` | `metrics --duration 10 --txnrate 1000` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics all raw debug` | `run_test` | `metrics --format csv,json,txt,wide --duration 10 --raw --debug` + stress-ng | stdout: `Metric files`; stderr: `collection complete`; all 4 formats generated; raw events written | |
+| `metrics input txnrate` | `run_test` | `metrics --input <dir> --txnrate 33` | stdout: `Metric files`; reprocesses raw data with new txnrate | Depends on `metrics all raw debug` output |
+| `metrics trim source` | `run_test` | `metrics --duration 30` + stress-ng | stdout: `Metric files`; stderr: `collection complete`; 30s collection for trim input | |
+| `metrics trim` | `run_test` | `metrics trim --input <dir> --start-offset 10 --end-offset 5` | stdout: `Trimmed metrics successfully created:` | `t_skip_target_args=true` (local-only subcommand) |
+| `metrics live` | `run_test` | `metrics --live --duration 10` + stress-ng | stdout: `TS,SKT` (CSV header); stderr: `collecting metrics` | |
+| `metrics workload` | `run_test` | `metrics` + `-- stress-ng --cpu 0 --cpu-load 60 --timeout 10` | stdout: `Metric files`; stderr: `collection complete`; workload-driven duration | |
+| `metrics cpu range` | `run_test` | `metrics --cpus 0-1 --duration 10` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics cpu range not zero` | `run_test` | `metrics --cpus 4-7 --duration 10` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics list` | `run_test` | `metrics --list` | stdout: `Metrics available` | |
+| `metrics metrics filter` | `run_test` | `metrics --duration 10 --metrics IPC` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+
+## Flags exercised
+
+`--duration`, `--scope`, `--count`, `--granularity`, `--pids`, `--txnrate`, `--format`, `--raw`, `--debug`, `--input`, `--live`, `--cpus`, `--list`, `--metrics`, `--noroot`
+
+Note: `--noroot` is appended automatically when `NO_ROOT=true`.
+
+## Test dependencies
+
+- `metrics input txnrate` depends on the output of `metrics all raw debug` (uses its output directory as `--input`).
+- `metrics trim` depends on the output of `metrics trim source` (uses its output directory as `--input`).
+- These tests must run in order within the category. The test script handles this via sequential `test_num` numbering.
+
+## Output verification guidance
+
+- **Collection tests** (`metrics duration`, `metrics granularity *`, `metrics scope *`, etc.): Verify `stdout.txt` contains `Metric files`. Verify `stderr.txt` contains `collection complete`. Check the output directory for the expected format files (`.csv`, `.json`, `.txt`, depending on `--format`).
+- **Granularity tests**: If granularity logic changes, verify the output CSV/JSON contains data at the expected level (per-CPU rows for `cpu`, per-socket for `socket`, system-wide for `system`).
+- **Scope tests**: If scope logic changes, verify process-level or cgroup-level data appears in the output. For `metrics scope process pids`, verify the specific PID's data is present.
+- **Live mode** (`metrics live`): Verify `stdout.txt` starts with the `TS,SKT` CSV header. This test validates that live output goes to stdout rather than files.
+- **Workload-driven** (`metrics workload`): Verify the collection duration matches the workload's `--timeout 10` rather than an explicit `--duration`.
+- **CPU range** (`metrics cpu range`, `metrics cpu range not zero`): If CPU range parsing changes, verify the output contains data only for the specified CPUs.
+- **Trim** (`metrics trim`): Verify `stdout.txt` contains `Trimmed metrics successfully created:`. Verify the trimmed output has fewer data points than the source.
+- **Input reprocessing** (`metrics input txnrate`): Verify the reprocessed output reflects the txnrate passed at reprocessing time (`--txnrate 33`).
+- **SIGINT** (`metrics sigint`): Verify `perfspect.log` ends with `Shutting down`. Verify no orphan `perf`/`processwatch` on the target.
+- **If `--format` options change**: Check `metrics all raw debug`, which exercises `csv,json,txt,wide`. If a format is renamed or removed, this test must be updated.
+- **If `--scope` options change**: Check the `metrics scope cgroup count` and `metrics scope process` tests. Update expected behavior accordingly.
+- **If `--list` output format changes**: Check `metrics list`, which expects `Metrics available` in stdout. Update the pattern if the header text changes.
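+
+Two of these checks sketched in shell (directory and file names are illustrative):
+
+```bash
+# Live mode: stdout should begin with the CSV header.
+head -1 test/output/<live_test_dir>/stdout.txt | grep '^TS,SKT'
+
+# Trim: the trimmed CSV should have fewer rows than its source.
+cat test/output/<trim_source_dir>/*metrics.csv | wc -l
+cat test/output/<trim_test_dir>/*metrics.csv | wc -l   # expect a smaller count
+```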
diff --git a/.claude/skills/functional-test/docs/report-tests.md b/.claude/skills/functional-test/docs/report-tests.md
new file mode 100644
index 00000000..2492bda7
--- /dev/null
+++ b/.claude/skills/functional-test/docs/report-tests.md
@@ -0,0 +1,33 @@
+# Report Tests (TEST_REPORT)
+
+## Test catalog
+
+| Test name | Args exercised | Validates |
+|---|---|---|
+| `report default` | `report` | Default report generation (all categories, default format) |
+| `report cpu isa` | `report --cpu --isa` | Category-specific report with CPU and ISA sections |
+| `report input` | `report --input <dir>` | Reprocessing from `report cpu isa` output |
+| `report format` | `report --format html,xlsx,json,txt` | All 4 output formats generated |
+| `report invalid format` | `report --format invalid` | Exit 1 |
+| `report invalid input` | `report --input invalid` | Exit 1 |
+
+## Flags exercised
+
+`--cpu`, `--isa`, `--input`, `--format`
+
+Note: The test script does not exercise all 29+ category flags (`--system-summary`, `--host`, `--pcie`, `--bios`, `--os`, etc.). Only `--cpu` and `--isa` are explicitly tested. Changes to other category flags are covered only by the `report default` test (which runs with `--all` implicitly).
+
+## Test dependencies
+
+- `report input` depends on the output of `report cpu isa` (uses its output directory as `--input`).
+
+## Output verification guidance
+
+- **`report default`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains report files in the default format. If the change affects any report category or the summary/insights table, check the generated report content.
+- **`report cpu isa`**: Verify the output contains only the CPU and ISA sections (not the full report). If `--cpu` or `--isa` flag behavior changes, verify the output reflects only those categories.
+- **`report input`**: Verify reprocessing produces output without re-collecting data from the target.
+- **`report format`**: Verify the output directory contains files in all 4 formats: `.html`, `.xlsx`, `.json`, `.txt`. If a format is added or removed, this test's `t_args` must be updated.
+- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr`, so validation is exit-code-only. If error messages change, the tests still pass but you cannot verify the message — consider whether `t_expect_stderr` should be added.
+- **If report table structure changes** (new columns, renamed fields, new tables): The tests do not validate report content beyond exit code. Manually inspect the output HTML/JSON/TXT for the expected changes.
+- **If `--format` options change** (e.g., adding `csv`): `report invalid format` passes `--format invalid` and expects exit 1 — this still passes. But `report format` must be updated to include the new format option.
+- **If category flags change** (new flag or renamed flag): Only `--cpu` and `--isa` are tested directly. Other flags are covered only by `report default`. If a new category is added, it will be included in the default report but not tested in isolation.
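+
+A sketch of the format check for `report format` (the directory name is illustrative):
+
+```bash
+OUT=test/output/<test_num>-<test_name>
+for ext in html xlsx json txt; do
+  ls "$OUT"/*."$ext" >/dev/null 2>&1 || echo "missing .$ext output" >&2
+done
+```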
diff --git a/.claude/skills/functional-test/docs/telemetry-tests.md b/.claude/skills/functional-test/docs/telemetry-tests.md
new file mode 100644
index 00000000..90db384e
--- /dev/null
+++ b/.claude/skills/functional-test/docs/telemetry-tests.md
@@ -0,0 +1,43 @@
+# Telemetry Tests (TEST_TELEMETRY)
+
+## Test catalog
+
+| Test name | Runner | Args exercised | Validates |
+|---|---|---|---|
+| `telemetry duration` | `run_test` | `telemetry --duration 10` + stress-ng | Basic collection succeeds |
+| `telemetry duration input` | `run_test` | `telemetry --input <dir>` | Reprocessing from `telemetry duration` output |
+| `telemetry all options` | `run_test` | `telemetry --duration 10 --interval 1 --format txt,html --no-summary --instrmix-frequency 2000000` + stress-ng + `--instrmix-pid` | All flags combined, instruction mix with explicit PID |
+| `telemetry cpu` | `run_test` | `telemetry --cpu --duration 10` + stress-ng | CPU category only |
+| `telemetry with cpu input` | `run_test` | `telemetry --input <dir>` | Reprocessing CPU-only data from `telemetry cpu` output |
+| `telemetry invalid duration` | `run_test` | `telemetry --duration -1` | Exit 1 (duration must be 0 or greater) |
+| `telemetry invalid interval` | `run_test` | `telemetry --interval 0` | Exit 1 (interval must be 1 or greater) |
+| `telemetry invalid format` | `run_test` | `telemetry --format invalid` | Exit 1 |
+| `telemetry invalid input` | `run_test` | `telemetry --input invalid` | Exit 1 |
+| `telemetry no output format` | `run_test` | `telemetry --format ""` | Exit 1 (empty format rejected) |
+| `telemetry sigint` | `run_sigint_test` | `telemetry` + stress-ng, SIGINT after 15s | Graceful shutdown, log: `Shutting down`, no orphan `perf`/`processwatch` |
+
+## Flags exercised
+
+`--duration`, `--input`, `--interval`, `--format`, `--no-summary`, `--instrmix-frequency`, `--instrmix-pid`, `--cpu`
+
+Note: The test script does not exercise all category flags (`--ipc`, `--cstate`, `--frequency`, `--power`, `--temperature`, `--memory`, `--network`, `--storage`, `--irqrate`, `--kernel`, `--instrmix`). Only `--cpu` is explicitly tested. Other categories are covered by `telemetry duration` (which runs with `--all` implicitly).
+
+## Test dependencies
+
+- `telemetry duration input` depends on `telemetry duration` output.
+- `telemetry with cpu input` depends on `telemetry cpu` output.
+
+## Output verification guidance
+
+- **`telemetry duration`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains telemetry report files in the default format (html).
+- **`telemetry all options`**: Verify the output contains both `.txt` and `.html` files (matching `--format txt,html`). Verify `--no-summary` suppresses the system summary table. Verify instruction mix data is collected (check for instrmix-related content in the output). With `--interval 1`, data points should be at ~1s intervals.
+- **`telemetry cpu`**: Verify the output contains only CPU telemetry data (not memory, network, etc.). If `--cpu` flag behavior changes, check the output scope.
+- **Input reprocessing** (`telemetry duration input`, `telemetry with cpu input`): Verify output is regenerated without re-collecting data.
+- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr`, so validation is exit-code-only.
+- **`telemetry no output format`**: Tests that an empty format string is rejected. If format validation changes, verify this edge case is still handled.
+- **SIGINT** (`telemetry sigint`): Verify `perfspect.log` ends with `Shutting down`. Verify no orphan processes on the target.
+- **If `--interval` validation changes** (e.g., allowing sub-second intervals): `telemetry invalid interval` passes `--interval 0` and expects exit 1. If 0 becomes valid, update the test.
+- **If `--duration` validation changes**: `telemetry invalid duration` passes `--duration -1` and expects exit 1. Negative durations should always be invalid.
+- **If category flags change** (new telemetry category added): Only `--cpu` is tested in isolation. New categories are covered by `telemetry duration` via the implicit `--all`. If a category is removed, `telemetry duration` may still pass, but its output changes — verify.
+- **If `--instrmix-frequency` validation changes**: `telemetry all options` uses `--instrmix-frequency 2000000`. If the minimum changes, verify this value is still valid. The code enforces a minimum of 100000.
+- **If `--format` options change**: `telemetry invalid format` still passes, but `telemetry all options` uses `--format txt,html` — update it if formats are renamed.
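+
+A sketch of the `telemetry all options` artifact checks (the directory name and the instrmix search term are illustrative):
+
+```bash
+OUT=test/output/<test_num>-<test_name>
+ls "$OUT"/*.txt "$OUT"/*.html   # both formats from --format txt,html
+grep -il 'instr' "$OUT"/*.txt "$OUT"/*.html \
+  || echo "instrmix content not found; inspect manually" >&2
+```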