refactor(walltime): hook-based CPU isolation for the walltime runner #427

Open
GuillaumeLagrange wants to merge 1 commit into main from cod-3012-cpu-isolation-on-macro-runner-for-sandboxed-processes

@GuillaumeLagrange GuillaumeLagrange commented Jun 29, 2026

Contributor

Replace the runner's built-in CPU-isolation mechanisms with a single, machine-driven one.

The runner previously hard-coded systemd-run --scope and, in a first iteration, a CODSPEED_ISOLATION=CGROUP:<dir> mode with per-spawn cgroup-dir plumbing — with the work split across a HookScriptsGuard (which ran the pre/post-bench hooks) and isolation.rs (which did the wrapping).

Now a single Isolation type owns the whole lifecycle: resolve() runs the pre-bench hook, wrap_bench() pins the benchmark leaf, and Drop runs post-bench. Cpuset logic lives on the machine behind three hooks (codspeed-{pre,wrap,post}-bench); the runner only invokes them and is otherwise oblivious to how cores are attributed.

Discovery is by hook presence:

  • an executable codspeed-wrap-bench selects the hook path — unprivileged, and the benchmark stays a descendant of the profiler so it records without sudo;
  • its absence falls back to the existing systemd-run --scope --slice=codspeed.slice, so hosts without the hook keep working unchanged.

requires_sudo is now true only for that fallback, so the profiler wrapping is effectively unchanged. The pre-bench hook is invoked with the runner's PID so the machine can place the runner (and the profiler it spawns) onto the system cores — the runner makes no cgroup writes of its own.
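The lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Mode` enum, the `HookLog` recorder, the exact argv layout, and the choice to run the pre/post hooks only in hook mode are all assumptions. Hook invocations are recorded in a log rather than spawned so the sketch runs anywhere.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical recorder: stores hook invocations instead of spawning them.
/// A real runner would execute the hook binaries.
type HookLog = Rc<RefCell<Vec<String>>>;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    /// An executable `codspeed-wrap-bench` was found on the machine.
    Hooks,
    /// Fallback: `systemd-run --scope --slice=codspeed.slice`.
    SystemdScope,
}

struct Isolation {
    mode: Mode,
    log: HookLog,
}

impl Isolation {
    /// Select the mode from hook presence and run the pre-bench hook.
    fn resolve(wrap_hook_present: bool, runner_pid: u32, log: HookLog) -> Self {
        let mode = if wrap_hook_present {
            Mode::Hooks
        } else {
            Mode::SystemdScope
        };
        if mode == Mode::Hooks {
            // Assumption: the runner PID is passed as the hook's only argument.
            log.borrow_mut()
                .push(format!("codspeed-pre-bench {runner_pid}"));
        }
        Isolation { mode, log }
    }

    /// Only the systemd fallback needs elevated privileges.
    fn requires_sudo(&self) -> bool {
        self.mode == Mode::SystemdScope
    }

    /// Prefix the benchmark argv with the wrapper for the selected mode.
    fn wrap_bench(&self, bench: &[&str]) -> Vec<String> {
        let prefix: &[&str] = match self.mode {
            Mode::Hooks => &["codspeed-wrap-bench"],
            Mode::SystemdScope => &["systemd-run", "--scope", "--slice=codspeed.slice"],
        };
        prefix
            .iter()
            .chain(bench.iter())
            .map(|s| s.to_string())
            .collect()
    }
}

impl Drop for Isolation {
    /// Teardown runs when the value goes out of scope, including on early return.
    fn drop(&mut self) {
        if self.mode == Mode::Hooks {
            self.log.borrow_mut().push("codspeed-post-bench".to_string());
        }
    }
}
```

Tying teardown to `Drop` means the post-bench step runs on every exit path from the scope that owns the `Isolation` value, which is presumably why a single type owning the lifecycle is preferable to a separate guard plus a wrapping module.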

Refs COD-3012

greptile-apps Bot commented Jun 29, 2026


Greptile Summary

This PR moves walltime CPU isolation into a single hook-aware lifecycle. The main changes are:

  • Adds an Isolation type for setup, benchmark wrapping, and teardown.
  • Uses machine hooks when codspeed-wrap-bench is installed.
  • Keeps the systemd scope path as the fallback.
  • Threads requires_sudo through the executor and profilers.

Confidence Score: 4/5

This is close, but these isolation paths should be fixed before merging.

  • CODSPEED_ISOLATION=true can still fall back to no isolation on hosts without hooks or passwordless sudo.
  • Hook mode can still proceed after pre-bench setup fails, which can produce incorrect walltime results.
  • The profiler sudo split and wrapper ordering otherwise look consistent with the intended design.

src/executor/wall_time/isolation.rs

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/executor/wall_time/isolation.rs | Adds the isolation mode resolver and hook lifecycle, but the resolver still mishandles the positive isolation override and failed pre-hook setup. |
| src/executor/wall_time/executor.rs | Switches the run path to use the resolved isolation object and forwards the sudo requirement to profilers. |
| src/executor/wall_time/profiler/mod.rs | Renames the profiler wrapping contract around whether sudo is required. |
| src/executor/wall_time/profiler/perf/mod.rs | Uses the new sudo requirement flag when wrapping perf. |
| src/executor/wall_time/profiler/samply/mod.rs | Uses the new sudo requirement flag when wrapping samply. |

Reviews (5): Last reviewed commit: "refactor(walltime): hook-based CPU isola..."

Comment thread src/executor/wall_time/isolation.rs Outdated
Comment thread src/executor/wall_time/isolation.rs Outdated
codspeed-hq Bot commented Jun 29, 2026


Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 7 untouched benchmarks


Comparing cod-3012-cpu-isolation-on-macro-runner-for-sandboxed-processes (6151305) with main (7ce2c98)

GuillaumeLagrange force-pushed the cod-3012-cpu-isolation-on-macro-runner-for-sandboxed-processes branch from a15e7ed to 40f9325 on June 29, 2026 15:51
Comment thread src/executor/wall_time/isolation.rs Outdated
IsolationMode::Cgroup { scope_dir } => {
// Move the runner (and the profiler it later spawns) onto the
// system cores before wrapping the benchmark for the bench cores.
place_runner_in_support(scope_dir)?;

P1 Support leaf race

In cgroup mode, this moves the runner into <scope>/support/cgroup.procs before the code waits for any leaf to exist. If the privileged setup creates support and bench asynchronously, this write can fail with a missing support/cgroup.procs path even though the cgroup layout would be ready shortly after. The run then exits before the benchmark starts, so the support leaf needs the same readiness handling as the bench leaf.
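One way to give the support leaf the readiness handling this comment asks for is a bounded poll before the write. The sketch below is only illustrative: `wait_for_path` is a hypothetical helper, and the timeout and interval are values the caller would have to choose. It does not reflect how the bench leaf is actually awaited in this PR.

```rust
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll until `path` exists or `timeout` elapses.
/// Returns `true` if the path appeared in time, `false` otherwise.
fn wait_for_path(path: &Path, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if path.exists() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

A caller would wait on the `support/cgroup.procs` path under the scope directory and only attempt the write once this returns `true`, reporting a clear error on timeout instead of a bare missing-file failure.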

GuillaumeLagrange force-pushed the cod-3012-cpu-isolation-on-macro-runner-for-sandboxed-processes branch from 40f9325 to d2f2793 on June 29, 2026 16:24
Comment on lines +129 to +136
Ok(o) if o.status.success() => {}
Ok(o) => debug!(
"{hook} exited {}: {}",
o.status,
String::from_utf8_lossy(&o.stderr)
),
Err(e) => debug!("failed to run {hook}: {e}"),
}

P1 Hook setup can fail silently

This branch logs a failing codspeed-pre-bench hook and still selects hook isolation. When the pre hook fails because the support cgroup is not ready, permissions are wrong, or the hook install is partial, the runner continues with codspeed-wrap-bench and runs the profiler without sudo. The walltime run can then complete with the runner or profiler outside the intended support leaf, which produces incorrectly isolated benchmark results instead of a clear failure or fallback.
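A possible shape for surfacing hook failures to the caller, rather than only logging them at debug level, is sketched below. The `HookError` type and this `run_hook` signature are assumptions made for illustration; the function in the PR may take different arguments and return nothing.

```rust
use std::io;
use std::process::Command;

/// Hypothetical error type distinguishing "could not start" from "ran and failed".
#[derive(Debug)]
enum HookError {
    /// The hook could not be spawned (missing, not executable, ...).
    Spawn(io::Error),
    /// The hook ran but exited unsuccessfully.
    Failed { code: Option<i32>, stderr: String },
}

/// Run a hook and report failure to the caller instead of swallowing it.
fn run_hook(hook: &str, args: &[&str]) -> Result<(), HookError> {
    match Command::new(hook).args(args).output() {
        Ok(o) if o.status.success() => Ok(()),
        Ok(o) => Err(HookError::Failed {
            code: o.status.code(),
            stderr: String::from_utf8_lossy(&o.stderr).into_owned(),
        }),
        Err(e) => Err(HookError::Spawn(e)),
    }
}
```

With a `Result` in hand, the resolver can decide explicitly whether a failed pre-bench hook should abort the run or trigger the fallback, instead of silently continuing in hook mode.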

GuillaumeLagrange changed the title from "feat: add cgroup CPU isolation for sandboxed walltime runs" to "refactor(walltime): hook-based CPU isolation for the walltime runner" on Jul 1, 2026
Replace the runner's built-in CPU-isolation mechanisms with a single,
machine-driven one. The runner previously hard-coded `systemd-run --scope` and a
`CODSPEED_ISOLATION=CGROUP:<dir>` mode with per-spawn cgroup-dir plumbing, split
across a `HookScriptsGuard` (which ran the pre/post-bench hooks) and
`isolation.rs` (which did the wrapping).

Now a single `Isolation` type owns the whole lifecycle: `resolve()` runs the
pre-bench hook, `wrap_bench()` pins the benchmark leaf, and `Drop` runs
post-bench. Cpuset logic lives on the machine behind three hooks
(`codspeed-{pre,wrap,post}-bench`); the runner only invokes them and is otherwise
oblivious to how cores are attributed. Discovery is by hook presence:

- an executable `codspeed-wrap-bench` selects the hook path — unprivileged, and
  the benchmark stays a descendant of the profiler so it records without sudo;
- its absence falls back to `systemd-run --scope --slice=codspeed.slice`, so
  hosts without the hook keep working unchanged.

The pre-bench hook is invoked with the runner's PID so the machine places the
runner (and the profiler it spawns) onto the system cores; the runner makes no
cgroup writes of its own. The profiler's `wrap_command` flag is renamed
`isolate` -> `requires_sudo`, now true only for the systemd fallback.

Refs COD-3012
Co-Authored-By: Claude <noreply@anthropic.com>
GuillaumeLagrange force-pushed the cod-3012-cpu-isolation-on-macro-runner-for-sandboxed-processes branch from 757ec17 to 6151305 on July 1, 2026 08:01
Comment on lines +59 to +61
if std::env::var(ISOLATION_ENV).is_ok_and(|v| v == "false") {
return Isolation::None;
}

P1 True Override Still Ignored

CODSPEED_ISOLATION=false disables isolation again, but CODSPEED_ISOLATION=true still no longer forces it. On a Linux host without codspeed-wrap-bench and without passwordless sudo, this path falls through to Isolation::None, so a job that explicitly requested isolation runs unpinned and produces noisy walltime results. The positive override needs to be handled separately from the auto-detected fallback.
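Handling the positive override separately could look like the tri-state sketch below. The `Override` and `Decision` enums, the `decide` function, and the policy of returning an error when isolation is forced but unavailable are all assumptions; the intended semantics of `CODSPEED_ISOLATION=true` are for the author to confirm.

```rust
/// Hypothetical tri-state view of the CODSPEED_ISOLATION variable.
#[derive(Debug, PartialEq)]
enum Override {
    ForceOn,
    ForceOff,
    Auto,
}

/// Map the raw environment value to an override; anything else means auto-detect.
fn parse_override(value: Option<&str>) -> Override {
    match value {
        Some("true") => Override::ForceOn,
        Some("false") => Override::ForceOff,
        _ => Override::Auto,
    }
}

#[derive(Debug, PartialEq)]
enum Decision {
    Hooks,
    SystemdScope,
    Unisolated,
}

/// Pick an isolation mechanism. A forced-on request with no mechanism
/// available is an error rather than a silent unisolated run.
fn decide(ov: Override, wrap_hook: bool, sudo_ok: bool) -> Result<Decision, String> {
    if ov == Override::ForceOff {
        return Ok(Decision::Unisolated);
    }
    if wrap_hook {
        return Ok(Decision::Hooks);
    }
    if sudo_ok {
        return Ok(Decision::SystemdScope);
    }
    match ov {
        Override::ForceOn => {
            Err("CODSPEED_ISOLATION=true but no isolation mechanism is available".to_string())
        }
        _ => Ok(Decision::Unisolated),
    }
}
```

Keeping the decision a pure function of three inputs makes the forced-on case easy to unit test without touching the environment or the host's sudo configuration.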

Comment on lines +64 to +66
run_pre_bench();
return Isolation::Hooks;
}

P1 Pre Hook Failure Continues

This branch returns hook isolation even if codspeed-pre-bench fails. run_pre_bench() has no result, and run_hook() only logs failures at debug level, so a machine with an executable wrap hook but a missing or failing pre hook still runs the benchmark through hook mode. In that case the runner and profiler may never be moved to the support cores, and the run can finish with contaminated walltime results instead of failing or falling back.
