diff --git a/.claude/skills/functional-test/SKILL.md b/.claude/skills/functional-test/SKILL.md new file mode 100644 index 00000000..ca236008 --- /dev/null +++ b/.claude/skills/functional-test/SKILL.md @@ -0,0 +1,148 @@ +--- +name: functional-test +description: > + Use this skill when running functional tests to validate PerfSpect code changes, + when the user says "run functional tests", "test my changes", "check for regressions", + or when verifying a code change did not break existing functionality. +--- + +> **Skill Loaded:** "Using functional-test skill." + +# Functional Test Runner + +Run targeted PerfSpect functional tests on a remote target to validate code changes. Identify the specific tests affected by a change, run them, and verify output aligns with the change. + +## Test script + +`../tools/perfspect/functional_test.sh` (relative to the perfspect repo root). Verify the file exists before proceeding. + +## Prerequisites + +1. **Built binary.** Run `make` (x86_64) or `make perfspect-aarch64` (ARM64). Binary must be at `./perfspect` (or set `PERFSPECT_DIR`). +2. **Remote target.** User must provide: hostname/IP (`TARGET`), SSH user (`USER_NAME`), private key path (`PRIVATE_KEY_PATH`). Password-less sudo must be configured on the target. +3. **Target dependencies.** `stress-ng` on the target. For flame tests: `java` and `/tmp/primes.java` (copy from `../tools/perfspect/primes.java`). + +## Workflow + +### Step 1 — Analyze the code change + +Run `git diff main...HEAD` (or the appropriate base). Read the diff. Identify: + +- **What changed**: flag names, validation logic, error messages, output formats, collection behavior, report generation, table definitions, script content. +- **Behavioral impact**: Does the change alter a CLI flag? A validation rule? An error message string? An output file format? A collection path? A report table? + +### Step 2 — Identify affected test categories + +Use the code-to-category mapping below to determine which `TEST_*` categories are affected. + +| Changed path | Categories | +|---|---| +| `cmd/config/` | `TEST_CONFIG` | +| `cmd/flamegraph/` | `TEST_FLAME` | +| `cmd/lock/` | `TEST_LOCK` | +| `cmd/metrics/` | `TEST_METRICS` | +| `cmd/report/` | `TEST_REPORT` | +| `cmd/benchmark/` | `TEST_BENCHMARK` | +| `cmd/telemetry/` | `TEST_TELEMETRY` | +| `cmd/root.go` | All — trace the specific change to narrow | +| `internal/app/` | All — trace the specific change to narrow | +| `internal/workflow/` | All reporting commands — trace to narrow | +| `internal/extract/` | `TEST_REPORT`, `TEST_TELEMETRY`, `TEST_METRICS` | +| `internal/target/` | All — affects SSH/local execution | +| `internal/script/` | All — affects script execution | +| `internal/report/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY`, `TEST_METRICS`, `TEST_FLAME` | +| `internal/table/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY` | +| `internal/cpus/` | All — CPU detection used everywhere | +| `internal/progress/` | All — progress UI used everywhere | +| `internal/util/` | All — trace the specific change to narrow | +| `main.go`, `go.mod`, `go.sum` | All | +| `scripts/`, `tools/` | All — embedded resources | + +### Step 3 — Identify specific affected tests + +Read the test catalog for each affected category. 
Load **only** the doc files for affected categories:
+
+| Category | Test catalog |
+|---|---|
+| `TEST_CONFIG` | [docs/config-tests.md](docs/config-tests.md) |
+| `TEST_FLAME` | [docs/flame-tests.md](docs/flame-tests.md) |
+| `TEST_LOCK` | [docs/lock-tests.md](docs/lock-tests.md) |
+| `TEST_METRICS` | [docs/metrics-tests.md](docs/metrics-tests.md) |
+| `TEST_REPORT` | [docs/report-tests.md](docs/report-tests.md) |
+| `TEST_BENCHMARK` | [docs/benchmark-tests.md](docs/benchmark-tests.md) |
+| `TEST_TELEMETRY` | [docs/telemetry-tests.md](docs/telemetry-tests.md) |
+
+Within the loaded catalog, find every test whose behavior intersects with the change, using these criteria:
+
+1. **Flag changes** — Tests that pass the changed flag in `t_args`.
+2. **Error message changes** — Tests whose `t_expect_stderr` matches the changed error string.
+3. **Output format changes** — Tests that exercise the changed format via `--format` in `t_args`.
+4. **Collection behavior changes** — Tests that exercise the changed collection path (scope, granularity, duration, live mode, workload-driven, etc.).
+5. **Shared infrastructure changes** — If the change is in shared code (`internal/target/`, `internal/script/`, `internal/workflow/`, `internal/app/`, `cmd/root.go`, `main.go`), trace the change to the specific behavior and find tests that trigger it across categories. Do not blindly run all tests.
+6. **stdout/stderr pattern changes** — Tests whose `t_expect_stdout` or `t_expect_stderr` contains text the change modifies.
+7. **Custom validation function changes** — Tests with `t_expect_func` that validate output artifacts affected by the change.
+
+Build a list of specific test names (`t_name` values) and their categories.
+
+### Step 4 — Predict expected test outcomes
+
+For each identified test, determine whether the code change should:
+
+- **Not alter the test result** (regression check) — The test must still PASS with the same output patterns.
+- **Change the test's expected behavior** — The test's expectations (`t_expect_exit`, `t_expect_stdout`, `t_expect_stderr`, `t_expect_func`) no longer match the new code. Flag this to the user: the test script itself must be updated. Explain what the new expected values must be.
+- **Make a previously skipped test runnable** — The change adds support for something that was previously guarded.
+
+### Step 5 — Run the affected test categories
+
+Disable all categories except those containing affected tests:
+
+```bash
+TARGET=<target> USER_NAME=<user> PRIVATE_KEY_PATH=<key> \
+  PERFSPECT_DIR=. \
+  TEST_CONFIG=false TEST_FLAME=false TEST_LOCK=false TEST_METRICS=false \
+  TEST_REPORT=false TEST_BENCHMARK=false TEST_TELEMETRY=false \
+  TEST_<AFFECTED_CATEGORY>=true \
+  ../tools/perfspect/functional_test.sh -q -v
+```
+
+Add `NO_ROOT=true` if the remote user does not have password-less sudo.
+
+### Step 6 — Verify output aligns with the change
+
+Do not stop at PASS/FAIL. For each affected test:
+
+1. **Read the test output.** Examine `test/output/<test_num>-<test_name>/stdout.txt`, `stderr.txt`, and `perfspect.log`.
+2. **Verify the change is reflected.** Follow the output verification guidance in the category's doc file. Examples:
+   - Error message changed → confirm `stderr.txt` contains the new text.
+   - New output field added → confirm it appears in `stdout.txt` or generated report files.
+   - Chart/report generation changed → confirm output HTML/JSON/CSV contains expected new content.
+   - Bug fix that eliminated ERROR log entries → confirm `perfspect.log` no longer contains `level=ERROR` for the affected path.
+   - Collection behavior changed → confirm `stderr.txt` shows expected collection messages and `stdout.txt` shows expected output files.
+3. **Check for unintended side effects.** Scan the output of non-target tests in the same category for unexpected ERRORs or changed output patterns.
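+
+For example, a minimal verification pass over one test's artifacts might look like this (a sketch; the directory name and grep patterns are illustrative and must be substituted per test):
+
+```bash
+# Illustrative only: substitute the real test directory and expected text.
+OUT=test/output/<test_num>-<test_name>
+
+# Changed error message: confirm the new text landed in stderr.
+grep -F 'invalid flag value' "$OUT/stderr.txt"
+
+# Bug fix: confirm no ERROR entries remain in the log.
+if grep -q 'level=ERROR' "$OUT/perfspect.log"; then
+  echo "unexpected ERROR entries remain" >&2
+fi
+```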
+
+### Step 7 — Report to user
+
+Provide:
+- The list of tests identified as affected and why.
+- PASS/FAIL status of each.
+- For each affected test: what was verified in the output and whether the change is reflected correctly.
+- Any tests whose expectations must be updated in the test script (with the specific `t_expect_*` values that must change).
+- Any tests that passed but whose output reveals a concern.
+
+## Environment variable reference
+
+| Variable | Default | Purpose |
+|---|---|---|
+| `PERFSPECT_DIR` | `.` | Path to directory containing the `perfspect` binary |
+| `ROOT_OUTPUT_DIR` | `test/output` | Output directory for test artifacts |
+| `TARGET` | _(empty)_ | Remote target hostname/IP (empty = local) |
+| `USER_NAME` | _(empty)_ | SSH username for remote target |
+| `PRIVATE_KEY_PATH` | _(empty)_ | SSH private key path for remote target |
+| `NO_ROOT` | `false` | Set to `true` to run without root |
+| `TEST_CONFIG` | `true` | Run config tests |
+| `TEST_FLAME` | `true` | Run flame tests |
+| `TEST_LOCK` | `true` | Run lock tests |
+| `TEST_METRICS` | `true` | Run metrics tests |
+| `TEST_REPORT` | `true` | Run report tests |
+| `TEST_BENCHMARK` | `true` | Run benchmark tests |
+| `TEST_TELEMETRY` | `true` | Run telemetry tests |
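+
+For instance, a local run (empty `TARGET`) of only the report tests, without root, might look like this (a sketch using the defaults above):
+
+```bash
+# Local, non-root run of the report tests only.
+TARGET= USER_NAME= PRIVATE_KEY_PATH= \
+  PERFSPECT_DIR=. NO_ROOT=true \
+  TEST_CONFIG=false TEST_FLAME=false TEST_LOCK=false TEST_METRICS=false \
+  TEST_REPORT=true TEST_BENCHMARK=false TEST_TELEMETRY=false \
+  ../tools/perfspect/functional_test.sh -q -v
+```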
diff --git a/.claude/skills/functional-test/docs/benchmark-tests.md b/.claude/skills/functional-test/docs/benchmark-tests.md
new file mode 100644
index 00000000..10ea97fe
--- /dev/null
+++ b/.claude/skills/functional-test/docs/benchmark-tests.md
@@ -0,0 +1,30 @@
+# Benchmark Tests (TEST_BENCHMARK)
+
+## Test catalog
+
+| Test name | Args exercised | Validates |
+|---|---|---|
+| `benchmark default` | `benchmark` | Default benchmark run (all benchmarks, default format) |
+| `benchmark input` | `benchmark --input <dir>` | Reprocessing from `benchmark default` output |
+| `benchmark invalid benchmark` | `benchmark --foo` | Exit 1, unknown flag rejected by cobra |
+| `benchmark invalid format` | `benchmark --format invalid` | Exit 1 |
+
+## Flags exercised
+
+`--input`, `--format`, unknown flags (cobra validation)
+
+Note: The test script does not exercise the individual benchmark selection flags (`--speed`, `--power`, `--temperature`, `--frequency`, `--memory`, `--cache`, `--storage`) or `--storage-dir` and `--no-summary`. Changes to these flags are covered only by `benchmark default` (which runs with `--all` implicitly).
+
+## Test dependencies
+
+- `benchmark input` depends on the output of `benchmark default` (uses its output directory as `--input`).
+
+## Output verification guidance
+
+- **`benchmark default`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains benchmark report files. If the change affects benchmark collection, summary table generation, or reference data comparisons, inspect the output report content.
+- **`benchmark input`**: Verify reprocessing produces output without re-running the benchmarks.
+- **`benchmark invalid benchmark`**: Verifies cobra rejects unknown flags. This test is stable unless the flag name `--foo` is added as a real flag (unlikely).
+- **`benchmark invalid format`**: Verify exit code is 1.
+- **If benchmark selection flags change**: Only `benchmark default` (all benchmarks) is tested; individual benchmark flags are not exercised. If a benchmark is added, removed, or renamed, verify `benchmark default` still passes and its output reflects the change.
+- **If `--format` options change**: Same pattern as other commands — `benchmark invalid format` still passes, but `benchmark default` output should be checked for the new format.
+- **If `--storage-dir` validation changes**: No test exercises this flag directly. Manual verification required.
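+
+A quick artifact scan for `benchmark default` might look like this (a sketch; the directory name is illustrative):
+
+```bash
+OUT=test/output/<test_num>-<test_name>   # the 'benchmark default' test directory
+ls "$OUT"                                # expect benchmark report files
+grep -q 'level=ERROR' "$OUT/perfspect.log" && echo "unexpected ERROR entries" >&2
+```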
diff --git a/.claude/skills/functional-test/docs/config-tests.md b/.claude/skills/functional-test/docs/config-tests.md
new file mode 100644
index 00000000..305d6c0b
--- /dev/null
+++ b/.claude/skills/functional-test/docs/config-tests.md
@@ -0,0 +1,40 @@
+# Config Tests (TEST_CONFIG)
+
+All config tests require root (`t_requires_root=true`).
+
+## Test catalog
+
+| Test name | Args exercised | Validates |
+|---|---|---|
+| `config help` | `config --help` | Help text prints `Usage:` |
+| `config default` | `config` | No-op prints `No changes requested` and `Configuration` |
+| `config gov epb epp` | `config --gov performance --epb 0 --epp 0` | Applies governor/epb/epp, stderr confirms each setting |
+| `config disable l2hw prefetcher` | `config --pref-l2hw disable` | Prefetcher disable, stderr confirms |
+| `config enable l2hw prefetcher no-summary` | `config --pref-l2hw enable --no-summary` | Prefetcher enable with `--no-summary` suppresses the stdout table |
+| `config invalid core count` | `config --cores 0` | Exit 1, stderr: `invalid flag value, --cores 0, valid values are` |
+| `config invalid llc size` | `config --llc 0` | Exit 1, stderr: `invalid flag value, --llc 0, valid values are` |
+| `config invalid core frequency` | `config --core-max .05` | Exit 1, stderr: `invalid flag value, --core-max 0.05, valid values are` |
+| `config invalid tdp` | `config --tdp 0` | Exit 1, stderr: `invalid flag value, --tdp 0, valid values are` |
+| `config invalid epb` | `config --epb 16` | Exit 1, stderr: `invalid flag value, --epb 16, valid values are` |
+| `config invalid epp` | `config --epp 256` | Exit 1, stderr: `invalid flag value, --epp 256, valid values are` |
+| `config invalid governor` | `config --gov invalid` | Exit 1, stderr: `invalid flag value, --gov invalid, valid values are` |
+| `config invalid elc` | `config --elc invalid` | Exit 1, stderr: `invalid flag value, --elc invalid, valid values are` |
+| `config invalid uncore max frequency` | `config --uncore-max .05` | Exit 1, stderr: `invalid flag value, --uncore-max 0.05, valid values are` |
+| `config invalid uncore min frequency` | `config --uncore-min .05` | Exit 1, stderr: `invalid flag value, --uncore-min 0.05, valid values are` |
+| `config invalid uncore max compute frequency` | `config --uncore-max-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-max-compute 0.05, valid values are` |
+| `config invalid uncore min compute frequency` | `config --uncore-min-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-min-compute 0.05, valid values are` |
+| `config invalid uncore max io frequency` | `config --uncore-max-io .05` | Exit 1, stderr: `invalid flag value, --uncore-max-io 0.05, valid values are` |
+| `config invalid uncore min io frequency` | `config --uncore-min-io .05` | Exit 1, stderr: `invalid flag value, --uncore-min-io 0.05, valid values are` |
+| `config invalid l2hw prefetcher` | `config --pref-l2hw invalid` | Exit 1, stderr: `invalid flag value, --pref-l2hw invalid, valid values are` |
+| `config invalid c6` | `config --c6 invalid` | Exit 1, stderr: `invalid flag value, --c6 invalid, valid values are` |
+| `config invalid c1 demotion` | `config --c1-demotion invalid` | Exit 1, stderr: `invalid flag value, --c1-demotion invalid, valid values are` |
+
+## Flags exercised
+
+`--gov`, `--epb`, `--epp`, `--pref-l2hw`, `--no-summary`, `--cores`, `--llc`, `--core-max`, `--tdp`, `--elc`, `--uncore-max`, `--uncore-min`, `--uncore-max-compute`, `--uncore-min-compute`, `--uncore-max-io`, `--uncore-min-io`, `--c6`, `--c1-demotion`, `--help`
+
+## Output verification guidance
+
+- **Positive tests** (`config gov epb epp`, `config disable l2hw prefetcher`, etc.): Verify `stderr.txt` contains the `<setting> set to <value>` confirmation messages. Verify `stdout.txt` contains the `Configuration` table when `--no-summary` is not set, and does not contain it when `--no-summary` is set.
+- **Negative tests** (all `config invalid *`): Verify `stderr.txt` contains the exact `Error: invalid flag value, --<flag> <value>, valid values are` message. Verify exit code is 1.
+- **If a validation range changes** (e.g., `--epb` now accepts 0-20 instead of 0-15): The `config invalid epb` test passes `--epb 16` and expects exit 1. If 16 is now valid, this test must be updated — flag to user with the new boundary value.
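+
+When a validation range changes, a quick boundary probe on the target might look like this (a sketch with hypothetical values for a widened `--epb` range):
+
+```bash
+# Hypothetical: --epb now accepts 0-20 instead of 0-15.
+sudo ./perfspect config --epb 20; echo "exit: $?"   # expect 0 at the new maximum
+sudo ./perfspect config --epb 21; echo "exit: $?"   # expect 1 just past it
+```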
diff --git a/.claude/skills/functional-test/docs/flame-tests.md b/.claude/skills/functional-test/docs/flame-tests.md
new file mode 100644
index 00000000..6629361a
--- /dev/null
+++ b/.claude/skills/functional-test/docs/flame-tests.md
@@ -0,0 +1,43 @@
+# Flame Tests (TEST_FLAME)
+
+All flame tests require root (`t_requires_root=true`).
+
+## Test catalog
+
+| Test name | Runner | Args exercised | Validates |
+|---|---|---|---|
+| `flame duration java` | `run_test` | `flame --duration 10 --format all` + java workload | JSON output contains `primes.java` in `Flamegraph[0]["Java Stacks"]` |
+| `flame duration native` | `run_test` | `flame --duration 10 --format all` + stress-ng | JSON output contains `stress-ng` in `Flamegraph[0]["Native Stacks"]` |
+| `flame dual native stacks` | `run_test` | `flame --duration 10 --format all --dual-native-stacks` + stress-ng | Dual stack mode, JSON validates `stress-ng` in Native Stacks |
+| `flame all options` | `run_test` | `flame --duration 10 --frequency 10 --format html,json --no-summary --max-depth 20 --perf-event instructions` + java + `--pids` | All flags combined, JSON validates `primes.java` in Java Stacks |
+| `flame with input` | `run_test` | `flame --input <dir>` | Reprocessing from raw data produced by `flame all options` |
+| `flame invalid format` | `run_test` | `flame --format html,invalid` | Exit 1, stderr: `format options are: all, html, txt, json` |
+| `flame invalid duration` | `run_test` | `flame --duration -1` | Exit 1, stderr: `duration must be 0 or greater` |
+| `flame invalid frequency` | `run_test` | `flame --frequency 0` | Exit 1, stderr: `frequency must be 1 or greater` |
+| `flame sigint native` | `run_sigint_test` | `flame --format all --no-summary` + stress-ng, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, `perf` and `processwatch` no longer running, JSON validates `stress-ng` |
+| `flame sigint java` | `run_sigint_test` | `flame --format all --no-summary` + java, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, JSON validates `primes.java` |
+
+## Flags exercised
+
+`--duration`, `--format`, `--frequency`, `--no-summary`, `--max-depth`, `--perf-event`, `--dual-native-stacks`, `--pids`, `--input`
+
+## Custom validation functions
+
+Tests `flame duration java`, `flame all options`, and `flame sigint java` use:
+
+```bash
+jq -r '.["Flamegraph"][0]["Java Stacks"]' "$1"/*_flame.json | grep -q "primes.java"
+```
+
+Tests `flame duration native`, `flame dual native stacks`, and `flame sigint native` use:
+
+```bash
+jq -r '.["Flamegraph"][0]["Native Stacks"]' "$1"/*_flame.json | grep -q "stress-ng"
+```
+
+## Output verification guidance
+
+- **Collection tests** (`flame duration java`, `flame duration native`, `flame dual native stacks`, `flame all options`): Verify `*_flame.json` exists in the output directory. Parse it with `jq` to confirm the expected stack type contains the workload name.
+- **Input reprocessing** (`flame with input`): Verify it regenerates output from previously collected raw data without re-collecting.
+- **Negative tests**: Verify `stderr.txt` contains the exact error message string. Verify exit code is 1.
+- **SIGINT tests**: Verify the last line of `perfspect.log` contains `Shutting down`. Verify no `perf` or `processwatch` processes remain on the target. Verify the `t_expect_func` JSON validation still passes (data was collected before shutdown).
+- **If `--format` options change**: The `flame invalid format` test expects the error `format options are: all, html, txt, json`. Update the expected string if format options are added or removed.
+- **If JSON output structure changes**: The custom validation functions parse `*_flame.json` with specific jq paths. If the JSON schema changes, these tests will fail — flag to user that both the code and the tests' `t_expect_func` must be updated.
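+
+A sketch of the post-SIGINT orphan check (assumes SSH key access to the target; the log path is illustrative):
+
+```bash
+# Expect no output: a graceful shutdown leaves no perf/processwatch behind.
+ssh -i "$PRIVATE_KEY_PATH" "$USER_NAME@$TARGET" "pgrep -x 'perf|processwatch'" \
+  && echo "orphan processes found" >&2
+tail -1 test/output/<test_num>-<test_name>/perfspect.log   # expect 'Shutting down'
+```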
diff --git a/.claude/skills/functional-test/docs/lock-tests.md b/.claude/skills/functional-test/docs/lock-tests.md
new file mode 100644
index 00000000..195de064
--- /dev/null
+++ b/.claude/skills/functional-test/docs/lock-tests.md
@@ -0,0 +1,23 @@
+# Lock Tests (TEST_LOCK)
+
+All lock tests require root (`t_requires_root=true`).
+
+## Test catalog
+
+| Test name | Args exercised | Validates |
+|---|---|---|
+| `lock all options` | `lock --duration 10 --frequency 22 --package --no-summary --format html` + stress-ng | All lock flags combined, successful collection |
+| `lock invalid duration` | `lock --duration 0` | Exit 1 (duration must be > 0) |
+| `lock invalid frequency` | `lock --frequency -1` | Exit 1 (frequency must be > 0) |
+| `lock invalid format` | `lock --format invalid` | Exit 1 (format must be one of: all, html, txt) |
+
+## Flags exercised
+
+`--duration`, `--frequency`, `--package`, `--no-summary`, `--format`
+
+## Output verification guidance
+
+- **`lock all options`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains the HTML report file. With `--package`, verify the raw data package was downloaded.
+- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr` patterns, so validation is exit-code-only. If a code change adds specific error messages for lock validation, the tests may need `t_expect_stderr` added.
+- **If `--format` options change**: The `lock invalid format` test passes `--format invalid` and expects exit 1. If new format options are added, this test still passes (since `invalid` remains invalid), but if format validation error messages change, verify they still align.
+- **If duration/frequency validation changes** (e.g., allowing 0 duration for indefinite collection): `lock invalid duration` passes `--duration 0` and expects exit 1. If 0 becomes valid, this test must be updated — flag to user.
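+
+A sketch of the artifact check for `lock all options` (the directory name is illustrative; the package file name depends on the target):
+
+```bash
+OUT=test/output/<test_num>-<test_name>
+ls "$OUT"/*.html   # HTML report from --format html
+ls "$OUT"          # look for the downloaded raw data package as well
+```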
diff --git a/.claude/skills/functional-test/docs/metrics-tests.md b/.claude/skills/functional-test/docs/metrics-tests.md
new file mode 100644
index 00000000..42853b08
--- /dev/null
+++ b/.claude/skills/functional-test/docs/metrics-tests.md
@@ -0,0 +1,51 @@
+# Metrics Tests (TEST_METRICS)
+
+## Test catalog
+
+| Test name | Runner | Args exercised | Validates | Constraints |
+|---|---|---|---|---|
+| `metrics scope cgroup count` | `run_test` | `metrics --duration 10 --scope cgroup --count 3` + docker stress-ng | stdout: `Metric files`, `metrics.csv`, `summary.csv`; stderr: `collection complete` | Local only (`t_requires_local=true`) |
+| `metrics sigint` | `run_sigint_test` | `metrics` + stress-ng, SIGINT after 15s | Graceful shutdown, log: `Shutting down`, no orphan `perf`/`processwatch` | x86_64 only (`t_requires_arch="x86_64"`) |
+| `metrics duration` | `run_test` | `metrics --duration 10` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics granularity cpu` | `run_test` | `metrics --duration 10 --granularity cpu` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics granularity socket` | `run_test` | `metrics --duration 10 --granularity socket` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics scope process` | `run_test` | `metrics --duration 10 --scope process` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics scope process pids` | `run_test` | `metrics --duration 10 --scope process` + stress-ng + `--pids` | Pass with explicit PID targeting | |
+| `metrics txnrate` | `run_test` | `metrics --duration 10 --txnrate 1000` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics all raw debug` | `run_test` | `metrics --format csv,json,txt,wide --duration 10 --raw --debug` + stress-ng | stdout: `Metric files`; stderr: `collection complete`; all 4 formats generated; raw events written | |
+| `metrics input txnrate` | `run_test` | `metrics --input <dir> --txnrate 33` | stdout: `Metric files`; reprocesses raw data with new txnrate | Depends on `metrics all raw debug` output |
+| `metrics trim source` | `run_test` | `metrics --duration 30` + stress-ng | stdout: `Metric files`; stderr: `collection complete`; 30s collection for trim input | |
+| `metrics trim` | `run_test` | `metrics trim --input <dir> --start-offset 10 --end-offset 5` | stdout: `Trimmed metrics successfully created:` | `t_skip_target_args=true` (local-only subcommand) |
+| `metrics live` | `run_test` | `metrics --live --duration 10` + stress-ng | stdout: `TS,SKT` (CSV header); stderr: `collecting metrics` | |
+| `metrics workload` | `run_test` | `metrics` + `-- stress-ng --cpu 0 --cpu-load 60 --timeout 10` | stdout: `Metric files`; stderr: `collection complete`; workload-driven duration | |
+| `metrics cpu range` | `run_test` | `metrics --cpus 0-1 --duration 10` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics cpu range not zero` | `run_test` | `metrics --cpus 4-7 --duration 10` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+| `metrics list` | `run_test` | `metrics --list` | stdout: `Metrics available` | |
+| `metrics metrics filter` | `run_test` | `metrics --duration 10 --metrics IPC` + stress-ng | stdout: `Metric files`; stderr: `collection complete` | |
+
+## Flags exercised
+
+`--duration`, `--scope`, `--count`, `--granularity`, `--pids`, `--txnrate`, `--format`, `--raw`, `--debug`, `--input`, `--live`, `--cpus`, `--list`, `--metrics`, `--noroot`
+
+Note: `--noroot` is appended automatically when `NO_ROOT=true`.
+
+## Test dependencies
+
+- `metrics input txnrate` depends on the output of `metrics all raw debug` (uses its output directory as `--input`).
+- `metrics trim` depends on the output of `metrics trim source` (uses its output directory as `--input`).
+- These tests must run in order within the category. The test script handles this via sequential `test_num` numbering.
+
+## Output verification guidance
+
+- **Collection tests** (`metrics duration`, `metrics granularity *`, `metrics scope *`, etc.): Verify `stdout.txt` contains `Metric files`. Verify `stderr.txt` contains `collection complete`. Check the output directory for the expected format files (`.csv`, `.json`, `.txt`, depending on `--format`).
+- **Granularity tests**: If granularity logic changes, verify the output CSV/JSON contains data at the expected level (per-CPU rows for `cpu`, per-socket for `socket`, system-wide for `system`).
+- **Scope tests**: If scope logic changes, verify process-level or cgroup-level data appears in the output. For `metrics scope process pids`, verify the specific PID's data is present.
+- **Live mode** (`metrics live`): Verify `stdout.txt` starts with the `TS,SKT` CSV header. This test validates that live output goes to stdout rather than files.
+- **Workload-driven** (`metrics workload`): Verify the collection duration matches the workload's `--timeout 10` rather than an explicit `--duration`.
+- **CPU range** (`metrics cpu range`, `metrics cpu range not zero`): If CPU range parsing changes, verify the output contains data only for the specified CPUs.
+- **Trim** (`metrics trim`): Verify `stdout.txt` contains `Trimmed metrics successfully created:`. Verify the trimmed output has fewer data points than the source.
+- **Input reprocessing** (`metrics input txnrate`): Verify the reprocessed output reflects the txnrate passed at reprocessing time (`--txnrate 33`).
+- **SIGINT** (`metrics sigint`): Verify `perfspect.log` ends with `Shutting down`. Verify no orphan `perf`/`processwatch` on the target.
+- **If `--format` options change**: Check `metrics all raw debug`, which exercises `csv,json,txt,wide`. If a format is renamed or removed, this test must be updated.
+- **If `--scope` options change**: Check the `metrics scope cgroup count` and `metrics scope process` tests. Update expected behavior accordingly.
+- **If `--list` output format changes**: Check `metrics list`, which expects `Metrics available` in stdout. Update the pattern if the header text changes.
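+
+Two of these checks sketched in shell (directory and file names are illustrative):
+
+```bash
+# Live mode: stdout should begin with the CSV header.
+head -1 test/output/<live_test_dir>/stdout.txt | grep '^TS,SKT'
+
+# Trim: the trimmed CSV should have fewer rows than its source.
+cat test/output/<trim_source_dir>/*metrics.csv | wc -l
+cat test/output/<trim_test_dir>/*metrics.csv | wc -l   # expect a smaller count
+```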
diff --git a/.claude/skills/functional-test/docs/report-tests.md b/.claude/skills/functional-test/docs/report-tests.md
new file mode 100644
index 00000000..2492bda7
--- /dev/null
+++ b/.claude/skills/functional-test/docs/report-tests.md
@@ -0,0 +1,33 @@
+# Report Tests (TEST_REPORT)
+
+## Test catalog
+
+| Test name | Args exercised | Validates |
+|---|---|---|
+| `report default` | `report` | Default report generation (all categories, default format) |
+| `report cpu isa` | `report --cpu --isa` | Category-specific report with CPU and ISA sections |
+| `report input` | `report --input <dir>` | Reprocessing from `report cpu isa` output |
+| `report format` | `report --format html,xlsx,json,txt` | All 4 output formats generated |
+| `report invalid format` | `report --format invalid` | Exit 1 |
+| `report invalid input` | `report --input invalid` | Exit 1 |
+
+## Flags exercised
+
+`--cpu`, `--isa`, `--input`, `--format`
+
+Note: The test script does not exercise all 29+ category flags (`--system-summary`, `--host`, `--pcie`, `--bios`, `--os`, etc.). Only `--cpu` and `--isa` are explicitly tested. Changes to other category flags are covered only by the `report default` test (which runs with `--all` implicitly).
+
+## Test dependencies
+
+- `report input` depends on the output of `report cpu isa` (uses its output directory as `--input`).
+
+## Output verification guidance
+
+- **`report default`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains report files in the default format. If the change affects any report category or the summary/insights table, check the generated report content.
+- **`report cpu isa`**: Verify the output contains only the CPU and ISA sections (not the full report). If `--cpu` or `--isa` flag behavior changes, verify the output reflects only those categories.
+- **`report input`**: Verify reprocessing produces output without re-collecting data from the target.
+- **`report format`**: Verify the output directory contains files in all 4 formats: `.html`, `.xlsx`, `.json`, `.txt`. If a format is added or removed, this test's `t_args` must be updated.
+- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr`, so validation is exit-code-only. If error messages change, the tests still pass but you cannot verify the message — consider whether `t_expect_stderr` should be added.
+- **If report table structure changes** (new columns, renamed fields, new tables): The tests do not validate report content beyond exit code. Manually inspect the output HTML/JSON/TXT for the expected changes.
+- **If `--format` options change** (e.g., adding `csv`): `report invalid format` passes `--format invalid` and expects exit 1 — this still passes. But `report format` must be updated to include the new format option.
+- **If category flags change** (new flag or renamed flag): Only `--cpu` and `--isa` are tested directly. Other flags are covered only by `report default`. If a new category is added, it will be included in the default report but not tested in isolation.
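+
+A sketch of the format check for `report format` (the directory name is illustrative):
+
+```bash
+OUT=test/output/<test_num>-<test_name>
+for ext in html xlsx json txt; do
+  ls "$OUT"/*."$ext" >/dev/null 2>&1 || echo "missing .$ext output" >&2
+done
+```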
diff --git a/.claude/skills/functional-test/docs/telemetry-tests.md b/.claude/skills/functional-test/docs/telemetry-tests.md
new file mode 100644
index 00000000..90db384e
--- /dev/null
+++ b/.claude/skills/functional-test/docs/telemetry-tests.md
@@ -0,0 +1,43 @@
+# Telemetry Tests (TEST_TELEMETRY)
+
+## Test catalog
+
+| Test name | Runner | Args exercised | Validates |
+|---|---|---|---|
+| `telemetry duration` | `run_test` | `telemetry --duration 10` + stress-ng | Basic collection succeeds |
+| `telemetry duration input` | `run_test` | `telemetry --input <dir>` | Reprocessing from `telemetry duration` output |
+| `telemetry all options` | `run_test` | `telemetry --duration 10 --interval 1 --format txt,html --no-summary --instrmix-frequency 2000000` + stress-ng + `--instrmix-pid` | All flags combined, instruction mix with explicit PID |
+| `telemetry cpu` | `run_test` | `telemetry --cpu --duration 10` + stress-ng | CPU category only |
+| `telemetry with cpu input` | `run_test` | `telemetry --input <dir>` | Reprocessing CPU-only data from `telemetry cpu` output |
+| `telemetry invalid duration` | `run_test` | `telemetry --duration -1` | Exit 1 (duration must be 0 or greater) |
+| `telemetry invalid interval` | `run_test` | `telemetry --interval 0` | Exit 1 (interval must be 1 or greater) |
+| `telemetry invalid format` | `run_test` | `telemetry --format invalid` | Exit 1 |
+| `telemetry invalid input` | `run_test` | `telemetry --input invalid` | Exit 1 |
+| `telemetry no output format` | `run_test` | `telemetry --format ""` | Exit 1 (empty format rejected) |
+| `telemetry sigint` | `run_sigint_test` | `telemetry` + stress-ng, SIGINT after 15s | Graceful shutdown, log: `Shutting down`, no orphan `perf`/`processwatch` |
+
+## Flags exercised
+
+`--duration`, `--input`, `--interval`, `--format`, `--no-summary`, `--instrmix-frequency`, `--instrmix-pid`, `--cpu`
+
+Note: The test script does not exercise all category flags (`--ipc`, `--cstate`, `--frequency`, `--power`, `--temperature`, `--memory`, `--network`, `--storage`, `--irqrate`, `--kernel`, `--instrmix`). Only `--cpu` is explicitly tested. Other categories are covered by `telemetry duration` (which runs with `--all` implicitly).
+
+## Test dependencies
+
+- `telemetry duration input` depends on `telemetry duration` output.
+- `telemetry with cpu input` depends on `telemetry cpu` output.
+
+## Output verification guidance
+
+- **`telemetry duration`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains telemetry report files in the default format (html).
+- **`telemetry all options`**: Verify the output contains both `.txt` and `.html` files (matching `--format txt,html`). Verify `--no-summary` suppresses the system summary table. Verify instruction mix data is collected (check for instrmix-related content in the output). With `--interval 1`, data points should be at ~1s intervals.
+- **`telemetry cpu`**: Verify the output contains only CPU telemetry data (not memory, network, etc.). If `--cpu` flag behavior changes, check the output scope.
+- **Input reprocessing** (`telemetry duration input`, `telemetry with cpu input`): Verify output is regenerated without re-collecting data.
+- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr`, so validation is exit-code-only.
+- **`telemetry no output format`**: Tests that an empty format string is rejected. If format validation changes, verify this edge case is still handled.
+- **SIGINT** (`telemetry sigint`): Verify `perfspect.log` ends with `Shutting down`. Verify no orphan processes on the target.
+- **If `--interval` validation changes** (e.g., allowing sub-second intervals): `telemetry invalid interval` passes `--interval 0` and expects exit 1. If 0 becomes valid, update the test.
+- **If `--duration` validation changes**: `telemetry invalid duration` passes `--duration -1` and expects exit 1. Negative durations should always be invalid.
+- **If category flags change** (new telemetry category added): Only `--cpu` is tested in isolation. New categories are covered by `telemetry duration` via the implicit `--all`. If a category is removed, `telemetry duration` may still pass, but its output changes — verify.
+- **If `--instrmix-frequency` validation changes**: `telemetry all options` uses `--instrmix-frequency 2000000`. If the minimum changes, verify this value is still valid. The code enforces a minimum of 100000.
+- **If `--format` options change**: `telemetry invalid format` still passes, but `telemetry all options` uses `--format txt,html` — update it if formats are renamed.
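+
+A sketch of the `telemetry all options` artifact checks (the directory name and the instrmix search term are illustrative):
+
+```bash
+OUT=test/output/<test_num>-<test_name>
+ls "$OUT"/*.txt "$OUT"/*.html   # both formats from --format txt,html
+grep -il 'instr' "$OUT"/*.txt "$OUT"/*.html \
+  || echo "instrmix content not found; inspect manually" >&2
+```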