From 892993074458feadae98093077a96f9ac3b53dec Mon Sep 17 00:00:00 2001 From: "Harper, Jason M" Date: Fri, 10 Apr 2026 13:14:39 -0700 Subject: [PATCH] feature: add functional-test skill for Claude Code Adds a Claude Code skill that runs targeted PerfSpect functional tests on remote targets. The skill analyzes code changes, maps them to affected test categories, runs only the relevant tests, and verifies output correctness. Includes test catalogs for all command categories: benchmark, config, flamegraph, lock, metrics, report, and telemetry. Co-Authored-By: Claude Opus 4.6 --- .claude/skills/functional-test/SKILL.md | 148 ++++++++++++++++++ .../functional-test/docs/benchmark-tests.md | 30 ++++ .../functional-test/docs/config-tests.md | 40 +++++ .../functional-test/docs/flame-tests.md | 43 +++++ .../skills/functional-test/docs/lock-tests.md | 23 +++ .../functional-test/docs/metrics-tests.md | 51 ++++++ .../functional-test/docs/report-tests.md | 33 ++++ .../functional-test/docs/telemetry-tests.md | 43 +++++ 8 files changed, 411 insertions(+) create mode 100644 .claude/skills/functional-test/SKILL.md create mode 100644 .claude/skills/functional-test/docs/benchmark-tests.md create mode 100644 .claude/skills/functional-test/docs/config-tests.md create mode 100644 .claude/skills/functional-test/docs/flame-tests.md create mode 100644 .claude/skills/functional-test/docs/lock-tests.md create mode 100644 .claude/skills/functional-test/docs/metrics-tests.md create mode 100644 .claude/skills/functional-test/docs/report-tests.md create mode 100644 .claude/skills/functional-test/docs/telemetry-tests.md diff --git a/.claude/skills/functional-test/SKILL.md b/.claude/skills/functional-test/SKILL.md new file mode 100644 index 00000000..ca236008 --- /dev/null +++ b/.claude/skills/functional-test/SKILL.md @@ -0,0 +1,148 @@ +--- +name: functional-test +description: > + Use this skill when running functional tests to validate PerfSpect code changes, + when the user says "run functional tests", "test my changes", "check for regressions", + or when verifying a code change did not break existing functionality. +--- + +> **Skill Loaded:** "Using functional-test skill." + +# Functional Test Runner + +Run targeted PerfSpect functional tests on a remote target to validate code changes. Identify the specific tests affected by a change, run them, and verify output aligns with the change. + +## Test script + +`../tools/perfspect/functional_test.sh` (relative to the perfspect repo root). Verify the file exists before proceeding. + +## Prerequisites + +1. **Built binary.** Run `make` (x86_64) or `make perfspect-aarch64` (ARM64). Binary must be at `./perfspect` (or set `PERFSPECT_DIR`). +2. **Remote target.** User must provide: hostname/IP (`TARGET`), SSH user (`USER_NAME`), private key path (`PRIVATE_KEY_PATH`). Password-less sudo must be configured on the target. +3. **Target dependencies.** `stress-ng` on the target. For flame tests: `java` and `/tmp/primes.java` (copy from `../tools/perfspect/primes.java`). + +## Workflow + +### Step 1 — Analyze the code change + +Run `git diff main...HEAD` (or the appropriate base). Read the diff. Identify: + +- **What changed**: flag names, validation logic, error messages, output formats, collection behavior, report generation, table definitions, script content. +- **Behavioral impact**: Does the change alter a CLI flag? A validation rule? An error message string? An output file format? A collection path? A report table? 
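A minimal triage sketch for this step, assuming the branch is compared against `main` (the path-to-category buckets follow the mapping table in Step 2):

```bash
# List changed paths and bucket them for the Step 2 mapping table.
# Assumes `main` is the correct base; substitute another ref if not.
git diff --name-only main...HEAD | sort -u | while read -r path; do
  case "$path" in
    cmd/*)      echo "$path -> single-command category" ;;
    internal/*) echo "$path -> shared code, trace to narrow" ;;
    *)          echo "$path -> likely affects all categories" ;;
  esac
done
```

Then read the full diff (`git diff main...HEAD`) for the behavioral details listed above.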
+ +### Step 2 — Identify affected test categories + +Use the code-to-category mapping below to determine which `TEST_*` categories are affected. + +| Changed path | Categories | +|---|---| +| `cmd/config/` | `TEST_CONFIG` | +| `cmd/flamegraph/` | `TEST_FLAME` | +| `cmd/lock/` | `TEST_LOCK` | +| `cmd/metrics/` | `TEST_METRICS` | +| `cmd/report/` | `TEST_REPORT` | +| `cmd/benchmark/` | `TEST_BENCHMARK` | +| `cmd/telemetry/` | `TEST_TELEMETRY` | +| `cmd/root.go` | All — trace the specific change to narrow | +| `internal/app/` | All — trace the specific change to narrow | +| `internal/workflow/` | All reporting commands — trace to narrow | +| `internal/extract/` | `TEST_REPORT`, `TEST_TELEMETRY`, `TEST_METRICS` | +| `internal/target/` | All — affects SSH/local execution | +| `internal/script/` | All — affects script execution | +| `internal/report/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY`, `TEST_METRICS`, `TEST_FLAME` | +| `internal/table/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY` | +| `internal/cpus/` | All — CPU detection used everywhere | +| `internal/progress/` | All — progress UI used everywhere | +| `internal/util/` | All — trace the specific change to narrow | +| `main.go`, `go.mod`, `go.sum` | All | +| `scripts/`, `tools/` | All — embedded resources | + +### Step 3 — Identify specific affected tests + +Read the test catalog for each affected category. Load **only** the doc files for affected categories: + +| Category | Test catalog | +|---|---| +| `TEST_CONFIG` | [docs/config-tests.md](docs/config-tests.md) | +| `TEST_FLAME` | [docs/flame-tests.md](docs/flame-tests.md) | +| `TEST_LOCK` | [docs/lock-tests.md](docs/lock-tests.md) | +| `TEST_METRICS` | [docs/metrics-tests.md](docs/metrics-tests.md) | +| `TEST_REPORT` | [docs/report-tests.md](docs/report-tests.md) | +| `TEST_BENCHMARK` | [docs/benchmark-tests.md](docs/benchmark-tests.md) | +| `TEST_TELEMETRY` | [docs/telemetry-tests.md](docs/telemetry-tests.md) | + +Within the loaded catalog, find every test whose behavior intersects with the change using these criteria: + +1. **Flag changes** — Tests that pass the changed flag in `t_args`. +2. **Error message changes** — Tests whose `t_expect_stderr` matches the changed error string. +3. **Output format changes** — Tests that exercise the changed format via `--format` in `t_args`. +4. **Collection behavior changes** — Tests that exercise the changed collection path (scope, granularity, duration, live mode, workload-driven, etc.). +5. **Shared infrastructure changes** — If the change is in shared code (`internal/target/`, `internal/script/`, `internal/workflow/`, `internal/app/`, `cmd/root.go`, `main.go`), trace the change to the specific behavior and find tests that trigger it across categories. Do not blindly run all tests. +6. **stdout/stderr pattern changes** — Tests whose `t_expect_stdout` or `t_expect_stderr` contains text the change modifies. +7. **Custom validation function changes** — Tests with `t_expect_func` that validate output artifacts affected by the change. + +Build a list of specific test names (`t_name` values) and their category. + +### Step 4 — Predict expected test outcomes + +For each identified test, determine whether the code change should: + +- **Not alter the test result** (regression check) — The test must still PASS with the same output patterns. +- **Change the test's expected behavior** — The test's expectations (`t_expect_exit`, `t_expect_stdout`, `t_expect_stderr`, `t_expect_func`) no longer match the new code. 
Flag this to the user: the test script itself must be updated. Explain what the new expected values must be.
- **Make a previously-skipped test runnable** — If the change adds support for something that was previously guarded.

### Step 5 — Run the affected test categories

Disable all categories except those containing affected tests:

```bash
TARGET=<hostname> USER_NAME=<user> PRIVATE_KEY_PATH=<key path> \
  PERFSPECT_DIR=. \
  TEST_CONFIG=false TEST_FLAME=false TEST_LOCK=false TEST_METRICS=false \
  TEST_REPORT=false TEST_BENCHMARK=false TEST_TELEMETRY=false \
  TEST_<AFFECTED CATEGORY>=true \
  ../tools/perfspect/functional_test.sh -q -v
```

Add `NO_ROOT=true` if the remote user does not have password-less sudo.

### Step 6 — Verify output aligns with the change

Do not stop at PASS/FAIL. For each affected test:

1. **Read the test output.** Examine `test/output/<test name>-<test number>/stdout.txt`, `stderr.txt`, and `perfspect.log`.
2. **Verify the change is reflected.** Follow the output verification guidance in the category's doc file. Examples:
   - Error message changed → confirm `stderr.txt` contains the new text.
   - New output field added → confirm it appears in `stdout.txt` or generated report files.
   - Chart/report generation changed → confirm output HTML/JSON/CSV contains expected new content.
   - Bug fix that eliminated ERROR log entries → confirm `perfspect.log` no longer contains `level=ERROR` for the affected path.
   - Collection behavior changed → confirm `stderr.txt` shows expected collection messages and `stdout.txt` shows expected output files.
3. **Check for unintended side effects.** Scan output of non-target tests in the same category for unexpected ERRORs or changed output patterns.

### Step 7 — Report to user

Provide:
- The list of tests identified as affected and why.
- PASS/FAIL status of each.
- For each affected test: what was verified in the output and whether the change is reflected correctly.
- Any tests whose expectations must be updated in the test script (with the specific `t_expect_*` values that must change).
- Any tests that passed but whose output reveals a concern.
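As a concrete instance of the Step 6 checks, a minimal sketch (the per-test directory name under `test/output` comes from the test script; the pattern and the grep string below are placeholders, not real PerfSpect values):

```bash
#!/usr/bin/env bash
# Spot-check one test's artifacts. OUTDIR is a placeholder — substitute the
# actual per-test directory created under test/output.
OUTDIR="test/output/<test name>-<test number>"

# Regression check: no ERROR entries in the run log.
if grep -q 'level=ERROR' "$OUTDIR/perfspect.log"; then
  echo "FAIL: perfspect.log contains ERROR entries"
fi

# Change check: confirm a modified message landed in stderr
# ('new error text' is an example pattern only).
grep -q 'new error text' "$OUTDIR/stderr.txt" || echo "WARN: expected pattern not found"
```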
## Environment variable reference

| Variable | Default | Purpose |
|---|---|---|
| `PERFSPECT_DIR` | `.` | Path to directory containing the `perfspect` binary |
| `ROOT_OUTPUT_DIR` | `test/output` | Output directory for test artifacts |
| `TARGET` | _(empty)_ | Remote target hostname/IP (empty = local) |
| `USER_NAME` | _(empty)_ | SSH username for remote target |
| `PRIVATE_KEY_PATH` | _(empty)_ | SSH private key path for remote target |
| `NO_ROOT` | `false` | Set to `true` to run without root |
| `TEST_CONFIG` | `true` | Run config tests |
| `TEST_FLAME` | `true` | Run flame tests |
| `TEST_LOCK` | `true` | Run lock tests |
| `TEST_METRICS` | `true` | Run metrics tests |
| `TEST_REPORT` | `true` | Run report tests |
| `TEST_BENCHMARK` | `true` | Run benchmark tests |
| `TEST_TELEMETRY` | `true` | Run telemetry tests |
diff --git a/.claude/skills/functional-test/docs/benchmark-tests.md b/.claude/skills/functional-test/docs/benchmark-tests.md
new file mode 100644
index 00000000..10ea97fe
--- /dev/null
+++ b/.claude/skills/functional-test/docs/benchmark-tests.md
@@ -0,0 +1,30 @@
+# Benchmark Tests (TEST_BENCHMARK)

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `benchmark default` | `benchmark` | Default benchmark (all benchmarks, default format) |
| `benchmark input` | `benchmark --input <dir>` | Reprocessing from `benchmark default` output |
| `benchmark invalid benchmark` | `benchmark --foo` | Exit 1, unknown flag rejected by cobra |
| `benchmark invalid format` | `benchmark --format invalid` | Exit 1 |

## Flags exercised

`--input`, `--format`, unknown flags (cobra validation)

Note: The test script does not exercise individual benchmark selection flags (`--speed`, `--power`, `--temperature`, `--frequency`, `--memory`, `--cache`, `--storage`) or `--storage-dir`, `--no-summary`. Changes to these flags are covered only by `benchmark default` (which runs with `--all` implicitly).

## Test dependencies

- `benchmark input` depends on the output of `benchmark default` (uses its output directory as `--input`).

## Output verification guidance

- **`benchmark default`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains benchmark report files. If the change affects benchmark collection, summary table generation, or reference data comparisons, inspect the output report content.
- **`benchmark input`**: Verify reprocessing produces output without re-running benchmarks.
- **`benchmark invalid benchmark`**: Verifies cobra rejects unknown flags. This test is stable unless the flag name `--foo` is added as a real flag (unlikely).
- **`benchmark invalid format`**: Verify exit code is 1.
- **If benchmark selection flags change**: Only `benchmark default` (all benchmarks) is tested. Individual benchmark flags are not exercised. If a benchmark is added/removed/renamed, verify `benchmark default` still passes and its output reflects the change.
- **If `--format` options change**: Same pattern as other commands — `benchmark invalid format` still passes, but `benchmark default` output should be checked for the new format.
- **If `--storage-dir` validation changes**: No test exercises this flag directly. Manual verification required.
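A hedged spot-check for the first two bullets above (the output directory name is a placeholder):

```bash
OUTDIR="test/output/<benchmark default dir>"   # placeholder path
grep -c 'level=ERROR' "$OUTDIR/perfspect.log"  # expect 0
ls "$OUTDIR"                                   # expect benchmark report files
```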
diff --git a/.claude/skills/functional-test/docs/config-tests.md b/.claude/skills/functional-test/docs/config-tests.md new file mode 100644 index 00000000..305d6c0b --- /dev/null +++ b/.claude/skills/functional-test/docs/config-tests.md @@ -0,0 +1,40 @@ +# Config Tests (TEST_CONFIG) + +All config tests require root (`t_requires_root=true`). + +## Test catalog + +| Test name | Args exercised | Validates | +|---|---|---| +| `config help` | `config --help` | Help text prints `Usage:` | +| `config default` | `config` | No-op prints `No changes requested` and `Configuration` | +| `config gov epb epp` | `config --gov performance --epb 0 --epp 0` | Applies governor/epb/epp, stderr confirms each setting | +| `config disable l2hw prefetcher` | `config --pref-l2hw disable` | Prefetcher disable, stderr confirms | +| `config enable l2hw prefetcher no-summary` | `config --pref-l2hw enable --no-summary` | Prefetcher enable with `--no-summary` suppresses stdout table | +| `config invalid core count` | `config --cores 0` | Exit 1, stderr: `invalid flag value, --cores 0, valid values are` | +| `config invalid llc size` | `config --llc 0` | Exit 1, stderr: `invalid flag value, --llc 0, valid values are` | +| `config invalid core frequency` | `config --core-max .05` | Exit 1, stderr: `invalid flag value, --core-max 0.05, valid values are` | +| `config invalid tdp` | `config --tdp 0` | Exit 1, stderr: `invalid flag value, --tdp 0, valid values are` | +| `config invalid epb` | `config --epb 16` | Exit 1, stderr: `invalid flag value, --epb 16, valid values are` | +| `config invalid epp` | `config --epp 256` | Exit 1, stderr: `invalid flag value, --epp 256, valid values are` | +| `config invalid governor` | `config --gov invalid` | Exit 1, stderr: `invalid flag value, --gov invalid, valid values are` | +| `config invalid elc` | `config --elc invalid` | Exit 1, stderr: `invalid flag value, --elc invalid, valid values are` | +| `config invalid uncore max frequency` | `config --uncore-max .05` | Exit 1, stderr: `invalid flag value, --uncore-max 0.05, valid values are` | +| `config invalid uncore min frequency` | `config --uncore-min .05` | Exit 1, stderr: `invalid flag value, --uncore-min 0.05, valid values are` | +| `config invalid uncore max compute frequency` | `config --uncore-max-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-max-compute 0.05, valid values are` | +| `config invalid uncore min compute frequency` | `config --uncore-min-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-min-compute 0.05, valid values are` | +| `config invalid uncore max io frequency` | `config --uncore-max-io .05` | Exit 1, stderr: `invalid flag value, --uncore-max-io 0.05, valid values are` | +| `config invalid uncore min io frequency` | `config --uncore-min-io .05` | Exit 1, stderr: `invalid flag value, --uncore-min-io 0.05, valid values are` | +| `config invalid l2hw prefetcher` | `config --pref-l2hw invalid` | Exit 1, stderr: `invalid flag value, --pref-l2hw invalid, valid values are` | +| `config invalid c6` | `config --c6 invalid` | Exit 1, stderr: `invalid flag value, --c6 invalid, valid values are` | +| `config invalid c1 demotion` | `config --c1-demotion invalid` | Exit 1, stderr: `invalid flag value, --c1-demotion invalid, valid values are` | + +## Flags exercised + +`--gov`, `--epb`, `--epp`, `--pref-l2hw`, `--no-summary`, `--cores`, `--llc`, `--core-max`, `--tdp`, `--elc`, `--uncore-max`, `--uncore-min`, `--uncore-max-compute`, `--uncore-min-compute`, `--uncore-max-io`, `--uncore-min-io`, 
`--c6`, `--c1-demotion`, `--help`

## Output verification guidance

- **Positive tests** (`config gov epb epp`, `config disable l2hw prefetcher`, etc.): Verify `stderr.txt` contains the `<setting> set to <value>` confirmation messages. Verify `stdout.txt` contains the `Configuration` table when `--no-summary` is not set, and does not contain it when `--no-summary` is set.
- **Negative tests** (all `config invalid *`): Verify `stderr.txt` contains the exact `Error: invalid flag value, --<flag> <value>, valid values are` message. Verify exit code is 1.
- **If a validation range changes** (e.g., `--epb` now accepts 0-20 instead of 0-15): The `config invalid epb` test passes `--epb 16` and expects exit 1. If 16 is now valid, this test must be updated — flag to user with the new boundary value.
diff --git a/.claude/skills/functional-test/docs/flame-tests.md b/.claude/skills/functional-test/docs/flame-tests.md
new file mode 100644
index 00000000..6629361a
--- /dev/null
+++ b/.claude/skills/functional-test/docs/flame-tests.md
@@ -0,0 +1,43 @@
+# Flame Tests (TEST_FLAME)

All flame tests require root (`t_requires_root=true`).

## Test catalog

| Test name | Runner | Args exercised | Validates |
|---|---|---|---|
| `flame duration java` | `run_test` | `flame --duration 10 --format all` + java workload | JSON output contains `primes.java` in `Flamegraph[0]["Java Stacks"]` |
| `flame duration native` | `run_test` | `flame --duration 10 --format all` + stress-ng | JSON output contains `stress-ng` in `Flamegraph[0]["Native Stacks"]` |
| `flame dual native stacks` | `run_test` | `flame --duration 10 --format all --dual-native-stacks` + stress-ng | Dual stack mode, JSON validates `stress-ng` in Native Stacks |
| `flame all options` | `run_test` | `flame --duration 10 --frequency 10 --format html,json --no-summary --max-depth 20 --perf-event instructions` + java + `--pids` | All flags combined, JSON validates `primes.java` in Java Stacks |
| `flame with input` | `run_test` | `flame --input <dir>` | Reprocessing from raw data produced by `flame all options` |
| `flame invalid format` | `run_test` | `flame --format html,invalid` | Exit 1, stderr: `format options are: all, html, txt, json` |
| `flame invalid duration` | `run_test` | `flame --duration -1` | Exit 1, stderr: `duration must be 0 or greater` |
| `flame invalid frequency` | `run_test` | `flame --frequency 0` | Exit 1, stderr: `frequency must be 1 or greater` |
| `flame sigint native` | `run_sigint_test` | `flame --format all --no-summary` + stress-ng, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, `perf` and `processwatch` no longer running, JSON validates `stress-ng` |
| `flame sigint java` | `run_sigint_test` | `flame --format all --no-summary` + java, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, JSON validates `primes.java` |

## Flags exercised

`--duration`, `--format`, `--frequency`, `--no-summary`, `--max-depth`, `--perf-event`, `--dual-native-stacks`, `--pids`, `--input`

## Custom validation functions

Tests `flame duration java`, `flame all options`, `flame sigint java` use:
```bash
jq -r '.["Flamegraph"][0]["Java Stacks"]' "$1"/*_flame.json | grep -q "primes.java"
```

Tests `flame duration native`, `flame dual native stacks`, `flame sigint native` use:
```bash
jq -r '.["Flamegraph"][0]["Native Stacks"]' "$1"/*_flame.json | grep -q "stress-ng"
```

## Output verification guidance

- **Collection tests** (`flame duration java`, `flame duration native`, `flame dual native stacks`,
`flame all options`): Verify `*_flame.json` exists in the output directory. Parse it with `jq` to confirm the expected stack type contains the workload name. +- **Input reprocessing** (`flame with input`): Verify it regenerates output from previously-collected raw data without re-collecting. +- **Negative tests**: Verify `stderr.txt` contains the exact error message string. Verify exit code is 1. +- **SIGINT tests**: Verify `perfspect.log` last line contains `Shutting down`. Verify no `perf` or `processwatch` processes remain on target. Verify the `t_expect_func` JSON validation still passes (data was collected before shutdown). +- **If `--format` options change**: The `flame invalid format` test expects the error `format options are: all, html, txt, json`. Update the expected string if format options are added or removed. +- **If JSON output structure changes**: The custom validation functions parse `*_flame.json` with specific jq paths. If the JSON schema changes, these tests will fail — flag to user that both code and test `t_expect_func` must be updated. diff --git a/.claude/skills/functional-test/docs/lock-tests.md b/.claude/skills/functional-test/docs/lock-tests.md new file mode 100644 index 00000000..195de064 --- /dev/null +++ b/.claude/skills/functional-test/docs/lock-tests.md @@ -0,0 +1,23 @@ +# Lock Tests (TEST_LOCK) + +All lock tests require root (`t_requires_root=true`). + +## Test catalog + +| Test name | Args exercised | Validates | +|---|---|---| +| `lock all options` | `lock --duration 10 --frequency 22 --package --no-summary --format html` + stress-ng | All lock flags combined, successful collection | +| `lock invalid duration` | `lock --duration 0` | Exit 1 (duration must be > 0) | +| `lock invalid frequency` | `lock --frequency -1` | Exit 1 (frequency must be > 0) | +| `lock invalid format` | `lock --format invalid` | Exit 1 (format must be from: all, html, txt) | + +## Flags exercised + +`--duration`, `--frequency`, `--package`, `--no-summary`, `--format` + +## Output verification guidance + +- **`lock all options`**: Verify no `level=ERROR` in `perfspect.log`. Verify output directory contains HTML report file. With `--package`, verify raw data package was downloaded. +- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr` patterns, so validation is exit-code-only. If a code change adds specific error messages for lock validation, the tests may need `t_expect_stderr` added. +- **If `--format` options change**: The `lock invalid format` test passes `--format invalid` and expects exit 1. If new format options are added, this test still passes (since `invalid` remains invalid). But if format validation error messages change, verify they still align. +- **If duration/frequency validation changes** (e.g., allowing 0 duration for indefinite collection): `lock invalid duration` passes `--duration 0` and expects exit 1. If 0 becomes valid, this test must be updated — flag to user. 
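A minimal sketch of the `lock all options` checks above (the directory name is a placeholder):

```bash
OUTDIR="test/output/<lock all options dir>"    # placeholder path
grep -q 'level=ERROR' "$OUTDIR/perfspect.log" && echo "FAIL: unexpected ERRORs"
ls "$OUTDIR"/*.html                            # --format html => HTML report expected
```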
diff --git a/.claude/skills/functional-test/docs/metrics-tests.md b/.claude/skills/functional-test/docs/metrics-tests.md
new file mode 100644
index 00000000..42853b08
--- /dev/null
+++ b/.claude/skills/functional-test/docs/metrics-tests.md
@@ -0,0 +1,51 @@
+# Metrics Tests (TEST_METRICS)

## Test catalog

| Test name | Runner | Args exercised | Validates | Constraints |
|---|---|---|---|---|
| `metrics scope cgroup count` | `run_test` | `metrics --duration 10 --scope cgroup --count 3` + docker stress-ng | stdout: `Metric files`, `metrics.csv`, `summary.csv`; stderr: `collection complete` | Local only (`t_requires_local=true`) |
| `metrics sigint` | `run_sigint_test` | `metrics` + stress-ng, SIGINT after 15s | Graceful shutdown, log: `Shutting down`, no orphan `perf`/`processwatch` | x86_64 only (`t_requires_arch="x86_64"`) |
| `metrics duration` | `run_test` | `metrics --duration 10` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
| `metrics granularity cpu` | `run_test` | `metrics --duration 10 --granularity cpu` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
| `metrics granularity socket` | `run_test` | `metrics --duration 10 --granularity socket` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
| `metrics scope process` | `run_test` | `metrics --duration 10 --scope process` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
| `metrics scope process pids` | `run_test` | `metrics --duration 10 --scope process` + stress-ng + `--pids` | Pass with explicit PID targeting | |
| `metrics txnrate` | `run_test` | `metrics --duration 10 --txnrate 1000` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
| `metrics all raw debug` | `run_test` | `metrics --format csv,json,txt,wide --duration 10 --raw --debug` + stress-ng | stdout: `Metric files`; stderr: `collection complete`; all 4 formats generated; raw events written | |
| `metrics input txnrate` | `run_test` | `metrics --input <dir> --txnrate 33` | stdout: `Metric files`; reprocesses raw data with new txnrate | Depends on `metrics all raw debug` output |
| `metrics trim source` | `run_test` | `metrics --duration 30` + stress-ng | stdout: `Metric files`; stderr: `collection complete`; 30s collection for trim input | |
| `metrics trim` | `run_test` | `metrics trim --input <dir> --start-offset 10 --end-offset 5` | stdout: `Trimmed metrics successfully created:` | `t_skip_target_args=true` (local-only subcommand) |
| `metrics live` | `run_test` | `metrics --live --duration 10` + stress-ng | stdout: `TS,SKT` (CSV header); stderr: `collecting metrics` | |
| `metrics workload` | `run_test` | `metrics` + `-- stress-ng --cpu 0 --cpu-load 60 --timeout 10` | stdout: `Metric files`; stderr: `collection complete`; workload-driven duration | |
| `metrics cpu range` | `run_test` | `metrics --cpus 0-1 --duration 10` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
| `metrics cpu range not zero` | `run_test` | `metrics --cpus 4-7 --duration 10` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
| `metrics list` | `run_test` | `metrics --list` | stdout: `Metrics available` | |
| `metrics metrics filter` | `run_test` | `metrics --duration 10 --metrics IPC` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |

## Flags exercised

`--duration`, `--scope`, `--count`, `--granularity`, `--pids`, `--txnrate`, `--format`, `--raw`, `--debug`, `--input`, `--live`, `--cpus`, `--list`,
`--metrics`, `--noroot` + +Note: `--noroot` is appended automatically when `NO_ROOT=true`. + +## Test dependencies + +- `metrics input txnrate` depends on the output of `metrics all raw debug` (uses its output directory as `--input`). +- `metrics trim` depends on the output of `metrics trim source` (uses its output directory as `--input`). +- These tests must run in order within the category. The test script handles this via sequential `test_num` numbering. + +## Output verification guidance + +- **Collection tests** (`metrics duration`, `metrics granularity *`, `metrics scope *`, etc.): Verify `stdout.txt` contains `Metric files`. Verify `stderr.txt` contains `collection complete`. Check output directory for expected format files (`.csv`, `.json`, `.txt` depending on `--format`). +- **Granularity tests**: If granularity logic changes, verify the output CSV/JSON contains data at the expected level (per-CPU rows for `cpu`, per-socket for `socket`, system-wide for `system`). +- **Scope tests**: If scope logic changes, verify process-level or cgroup-level data appears in output. For `metrics scope process pids`, verify the specific PID's data is present. +- **Live mode** (`metrics live`): Verify `stdout.txt` starts with `TS,SKT` CSV header. This test validates that live output goes to stdout rather than files. +- **Workload-driven** (`metrics workload`): Verify collection duration matches the workload's `--timeout 10` rather than an explicit `--duration`. +- **CPU range** (`metrics cpu range`, `metrics cpu range not zero`): If CPU range parsing changes, verify output contains data only for the specified CPUs. +- **Trim** (`metrics trim`): Verify `stdout.txt` contains `Trimmed metrics successfully created:`. Verify trimmed output has fewer data points than the source. +- **Input reprocessing** (`metrics input txnrate`): Verify reprocessed output reflects the new txnrate value (33 vs original 1000). +- **SIGINT** (`metrics sigint`): Verify `perfspect.log` ends with `Shutting down`. Verify no orphan `perf`/`processwatch` on target. +- **If `--format` options change**: Check `metrics all raw debug` which exercises `csv,json,txt,wide`. If a format is renamed or removed, this test must be updated. +- **If `--scope` options change**: Check `metrics scope cgroup count` and `metrics scope process` tests. Update expected behavior accordingly. +- **If `--list` output format changes**: Check `metrics list` which expects `Metrics available` in stdout. Update pattern if the header text changes. 
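Two of the checks above as a hedged sketch (directory and file locations are placeholders):

```bash
# Live mode: stdout should begin with the CSV header.
head -1 "test/output/<metrics live dir>/stdout.txt" | grep -q '^TS,SKT' \
  || echo "FAIL: live CSV header missing"

# Trim: the trimmed CSV should contain fewer rows than its 30s source.
src=$(wc -l < "test/output/<metrics trim source dir>/metrics.csv")
trm=$(wc -l < "test/output/<metrics trim dir>/metrics.csv")
[ "$trm" -lt "$src" ] || echo "FAIL: trim did not reduce data points"
```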
diff --git a/.claude/skills/functional-test/docs/report-tests.md b/.claude/skills/functional-test/docs/report-tests.md
new file mode 100644
index 00000000..2492bda7
--- /dev/null
+++ b/.claude/skills/functional-test/docs/report-tests.md
@@ -0,0 +1,33 @@
+# Report Tests (TEST_REPORT)

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `report default` | `report` | Default report generation (all categories, default format) |
| `report cpu isa` | `report --cpu --isa` | Category-specific report with CPU and ISA sections |
| `report input` | `report --input <dir>` | Reprocessing from `report cpu isa` output |
| `report format` | `report --format html,xlsx,json,txt` | All 4 output formats generated |
| `report invalid format` | `report --format invalid` | Exit 1 |
| `report invalid input` | `report --input invalid` | Exit 1 |

## Flags exercised

`--cpu`, `--isa`, `--input`, `--format`

Note: The test script does not exercise all 29+ category flags (`--system-summary`, `--host`, `--pcie`, `--bios`, `--os`, etc.). Only `--cpu` and `--isa` are explicitly tested. Changes to other category flags are covered only by the `report default` test (which runs with `--all` implicitly).

## Test dependencies

- `report input` depends on the output of `report cpu isa` (uses its output directory as `--input`).

## Output verification guidance

- **`report default`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains report files in the default format. If the change affects any report category or the summary/insights table, check the generated report content.
- **`report cpu isa`**: Verify output contains only CPU and ISA sections (not the full report). If `--cpu` or `--isa` flag behavior changes, verify the output reflects only those categories.
- **`report input`**: Verify reprocessing produces output without re-collecting data from the target.
- **`report format`**: Verify the output directory contains files in all 4 formats: `.html`, `.xlsx`, `.json`, `.txt`. If a format is added or removed, this test's `t_args` must be updated.
- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr`, so validation is exit-code-only. If error messages change, the tests still pass but you cannot verify the message — consider whether `t_expect_stderr` should be added.
- **If report table structure changes** (new columns, renamed fields, new tables): The tests do not validate report content beyond exit code. Manually inspect the output HTML/JSON/TXT for the expected changes.
- **If `--format` options change** (e.g., adding `csv`): `report invalid format` passes `--format invalid` and expects exit 1 — this still passes. But `report format` must be updated to include the new format option.
- **If category flags change** (new flag or renamed flag): Only `--cpu` and `--isa` are tested directly. Other flags are covered only by `report default`. If a new category is added, it will be included in the default report but not tested in isolation.
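A sketch of the `report format` check (the directory name is a placeholder):

```bash
OUTDIR="test/output/<report format dir>"       # placeholder path
for ext in html xlsx json txt; do
  ls "$OUTDIR"/*."$ext" >/dev/null 2>&1 || echo "FAIL: missing .$ext output"
done
```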
diff --git a/.claude/skills/functional-test/docs/telemetry-tests.md b/.claude/skills/functional-test/docs/telemetry-tests.md
new file mode 100644
index 00000000..90db384e
--- /dev/null
+++ b/.claude/skills/functional-test/docs/telemetry-tests.md
@@ -0,0 +1,43 @@
+# Telemetry Tests (TEST_TELEMETRY)

## Test catalog

| Test name | Runner | Args exercised | Validates |
|---|---|---|---|
| `telemetry duration` | `run_test` | `telemetry --duration 10` + stress-ng | Basic collection succeeds |
| `telemetry duration input` | `run_test` | `telemetry --input <dir>` | Reprocessing from `telemetry duration` output |
| `telemetry all options` | `run_test` | `telemetry --duration 10 --interval 1 --format txt,html --no-summary --instrmix-frequency 2000000` + stress-ng + `--instrmix-pid` | All flags combined, instruction mix with explicit PID |
| `telemetry cpu` | `run_test` | `telemetry --cpu --duration 10` + stress-ng | CPU category only |
| `telemetry with cpu input` | `run_test` | `telemetry --input <dir>` | Reprocessing CPU-only data from `telemetry cpu` output |
| `telemetry invalid duration` | `run_test` | `telemetry --duration -1` | Exit 1 (duration must be 0 or greater) |
| `telemetry invalid interval` | `run_test` | `telemetry --interval 0` | Exit 1 (interval must be 1 or greater) |
| `telemetry invalid format` | `run_test` | `telemetry --format invalid` | Exit 1 |
| `telemetry invalid input` | `run_test` | `telemetry --input invalid` | Exit 1 |
| `telemetry no output format` | `run_test` | `telemetry --format ""` | Exit 1 (empty format rejected) |
| `telemetry sigint` | `run_sigint_test` | `telemetry` + stress-ng, SIGINT after 15s | Graceful shutdown, log: `Shutting down`, no orphan `perf`/`processwatch` |

## Flags exercised

`--duration`, `--input`, `--interval`, `--format`, `--no-summary`, `--instrmix-frequency`, `--instrmix-pid`, `--cpu`

Note: The test script does not exercise all category flags (`--ipc`, `--cstate`, `--frequency`, `--power`, `--temperature`, `--memory`, `--network`, `--storage`, `--irqrate`, `--kernel`, `--instrmix`). Only `--cpu` is explicitly tested. Other categories are covered by `telemetry duration` (which runs with `--all` implicitly).

## Test dependencies

- `telemetry duration input` depends on `telemetry duration` output.
- `telemetry with cpu input` depends on `telemetry cpu` output.

## Output verification guidance

- **`telemetry duration`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains telemetry report files in the default format (html).
- **`telemetry all options`**: Verify output contains both `.txt` and `.html` files (matching `--format txt,html`). Verify `--no-summary` suppresses the system summary table. Verify instruction mix data is collected (check for instrmix-related content in the output). With `--interval 1`, data points should be at ~1s intervals.
- **`telemetry cpu`**: Verify output contains only CPU telemetry data (not memory, network, etc.). If `--cpu` flag behavior changes, check the output scope.
- **Input reprocessing** (`telemetry duration input`, `telemetry with cpu input`): Verify output is regenerated without re-collecting data.
- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr`, so validation is exit-code-only.
- **`telemetry no output format`**: Tests that empty string format is rejected. If format validation changes, verify this edge case is still handled.
- **SIGINT** (`telemetry sigint`): Verify `perfspect.log` ends with `Shutting down`.
Verify no orphan processes on target. +- **If `--interval` validation changes** (e.g., allowing sub-second intervals): `telemetry invalid interval` passes `--interval 0` and expects exit 1. If 0 becomes valid, update test. +- **If `--duration` validation changes**: `telemetry invalid duration` passes `--duration -1` and expects exit 1. Negative durations should always be invalid. +- **If category flags change** (new telemetry category added): Only `--cpu` is tested in isolation. New categories are covered by `telemetry duration` via implicit `--all`. If a category is removed, `telemetry duration` may still pass but output changes — verify. +- **If `--instrmix-frequency` validation changes**: `telemetry all options` uses `--instrmix-frequency 2000000`. If the minimum changes, verify this value is still valid. The code enforces minimum 100000. +- **If `--format` options change**: `telemetry invalid format` still passes. But `telemetry all options` uses `--format txt,html` — update if formats are renamed.
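A hedged sketch of the SIGINT cleanup check, assuming the same `TARGET`/`USER_NAME`/`PRIVATE_KEY_PATH` used for the run:

```bash
# After `telemetry sigint`, neither perf nor processwatch should survive on the target.
ssh -i "$PRIVATE_KEY_PATH" "$USER_NAME@$TARGET" \
  'for p in perf processwatch; do pgrep -x "$p" >/dev/null && echo "FAIL: $p still running"; done; true'
```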