Add per-variant CPU feature gating to bit_transpose benchmarks #8227
Draft
joseph-isaacs wants to merge 2 commits into
Each bit_transpose benchmark now declares, inline, which CPU feature sets / architectures it should be measured under in CI, via a small set of macros driven by the compile-time BENCH_VARIANT environment variable:

- variant! prefixes the benchmark name with the active variant so the architecture-neutral scalar benchmarks (which run on every leg) do not collide in CodSpeed.
- variant_tag! maps a known variant identifier to its string tag; an unknown identifier fails to compile, giving typo-safe tags.
- ignore_unless_variant! expands to divan's `ignore` boolean, skipping a benchmark unless we run locally (BENCH_VARIANT=local, the default) or the active variant is one of the listed feature sets.

A plain `cargo bench` leaves BENCH_VARIANT at its `local` default (set in .cargo/config.toml) and runs every benchmark once on the host. CI sets BENCH_VARIANT per leg:

- the existing bench-codspeed job builds with BENCH_VARIANT=simulation, so the simulation-tagged scalar/bmi2/vbmi variants run there in simulation mode on x86_64+avx2;
- a new bench-codspeed-bittranspose job adds walltime legs on real silicon, one per architecture (x86_64 with +avx2, aarch64 with +neon), each building only the bit_transpose bench with its own target features.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
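The three macros described above could be sketched roughly as follows. This is a minimal illustration, not the bodies in `encodings/fastlanes/benches/shared/mod.rs`: the helper `active_variant()` is hypothetical, the `local` default comes from an `option_env!` fallback rather than `.cargo/config.toml`, and the macros return a plain `String` and `bool` instead of plugging into divan attributes.

```rust
/// Hypothetical helper: the active variant, fixed at compile time.
/// Assumption: falls back to "local" when BENCH_VARIANT is unset at build time.
fn active_variant() -> &'static str {
    option_env!("BENCH_VARIANT").unwrap_or("local")
}

/// Map a known variant identifier to its string tag. Any other identifier
/// matches no arm, so a typo is a compile error.
macro_rules! variant_tag {
    (local) => { "local" };
    (simulation) => { "simulation" };
    (x86_64) => { "x86_64" };
    (aarch64) => { "aarch64" };
}

/// Prefix a benchmark name with the active variant, e.g. "local::scalar".
macro_rules! variant {
    ($name:expr) => {
        format!("{}::{}", active_variant(), $name)
    };
}

/// Evaluate to `true` (skip the benchmark) unless the build is local or the
/// active variant is one of the listed tags.
macro_rules! ignore_unless_variant {
    ($($v:ident),+ $(,)?) => {{
        let active = active_variant();
        active != variant_tag!(local) && ![$(variant_tag!($v)),+].contains(&active)
    }};
}

fn main() {
    println!("name: {}", variant!("scalar"));
    println!("skip bmi2 bench: {}", ignore_unless_variant!(simulation, x86_64));
}
```

With `BENCH_VARIANT=aarch64` at build time, the second line would report that the bmi2-style benchmark is skipped, while a build with the variable unset skips nothing.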
The bit_transpose aarch64 CodSpeed leg failed in the system-info step: `grep -m1 "model name" /proc/cpuinfo` returns no match on ARM (no such line; the model is shown by lscpu), and under GitHub's `bash -e` the failing grep aborts the otherwise-diagnostic step. ARM also exposes CPU features as "Features" rather than "flags".

Make both cpuinfo greps non-fatal and match the aarch64 "Features" line so the diagnostic step never fails the build on either architecture.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
What

Trial of explicit, per-benchmark CPU feature-set / architecture gating, applied to the `bit_transpose` benchmarks. Each benchmark declares inline which variants it should be measured under in CI, while a plain `cargo bench` ignores all gating and runs everything once on the host.

Single source of truth: the compile-time `BENCH_VARIANT` env var, which drives both the name prefix (so arch-neutral scalar benches don't collide in CodSpeed) and the gate (run vs skip).

Macros (`encodings/fastlanes/benches/shared/mod.rs`)

- `variant!("name")`: prefixes the bench name with the active variant.
- `variant_tag!(ident)`: maps a known variant identifier to its string tag; an unknown identifier fails to compile (typo safety).
- `ignore_unless_variant!(...)`: expands to divan's `ignore` boolean, skipping unless `BENCH_VARIANT=local` (default) or the active variant is one of the listed feature sets.

Per-benchmark tags (`bit_transpose.rs`)

- `simulation, x86_64, aarch64`
- `simulation, x86_64`
- `aarch64`

CI (`.github/workflows/codspeed.yml`)

- `bench-codspeed` job now builds with `BENCH_VARIANT=simulation`, so the simulation-tagged variants run there in simulation mode (x86_64 + avx2), with no `local::` rename and no duplication.
- `bench-codspeed-bittranspose` job: walltime legs on real silicon, one per architecture, each building only `--bench bit_transpose` with its own target features + `BENCH_VARIANT`:
  - `amd64-medium/ubuntu24-full-x64-pre-v2`, `-C target-feature=+avx2`
  - `arm64-medium/ubuntu24-full-arm64-pre-v2`, `-C target-feature=+neon`

Behavior

| Run | `BENCH_VARIANT` | Benchmark name |
|---|---|---|
| `cargo bench` | `local` | `local::<fn>` |
| `bench-codspeed` | `simulation` | `simulation::<fn>` |
| `bittranspose` x86_64 leg | `x86_64` | `x86_64::<fn>` |
| `bittranspose` aarch64 leg | `aarch64` | `aarch64::<fn>` |

Checks

- `cargo build` / `cargo clippy --all-features` / `cargo +nightly fmt --check` on the bench: clean.
- `yamllint --strict -c .yamllint.yaml` on the workflow: clean.
- `BENCH_VARIANT=aarch64` skips bmi2/vbmi (`(ignored)`) and runs only the scalar baselines; `BENCH_VARIANT=x86_64` runs scalar + bmi2 (vbmi shows the pre-existing "no function registered" warning because the dev host lacks AVX512-VBMI).

Notes:

- […] `x86_64` build rather than forcing global `+avx512vbmi` (which risks SIGILL in surrounding code on non-AVX512 runners); the `#[target_feature]` intrinsics + `has_vbmi()` runtime guard handle it safely.
- […] (`x86_64`/`aarch64`) rather than `avx2`/`neon`, because bit_transpose's x86 paths (BMI2/VBMI) are runtime-selected within a single x86 build.

https://claude.ai/code/session_01MkzByEJLta4WN2vLqRyvZ1
Generated by Claude Code
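The notes rely on x86 paths being selected at runtime inside a single build, with `#[target_feature]` functions sitting behind a runtime guard. A minimal sketch of that pattern follows, using BMI2 bit extraction as a stand-in. The functions `pext`, `pext_bmi2`, and `pext_scalar` are hypothetical names for illustration and are not the bit_transpose kernels or the `has_vbmi()` guard from the PR.

```rust
/// Portable fallback: gather the bits of `x` selected by `mask` into the
/// low bits of the result, lowest mask bit first.
fn pext_scalar(x: u64, mut mask: u64) -> u64 {
    let mut out = 0u64;
    let mut out_bit = 0u32;
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg(); // isolate the lowest set bit
        if x & lowest != 0 {
            out |= 1u64 << out_bit;
        }
        out_bit += 1;
        mask &= mask - 1; // clear the lowest set bit
    }
    out
}

/// BMI2 path. Only this function is compiled with BMI2 enabled, so the rest
/// of the binary stays safe to run on CPUs without it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "bmi2")]
unsafe fn pext_bmi2(x: u64, mask: u64) -> u64 {
    core::arch::x86_64::_pext_u64(x, mask)
}

/// Runtime dispatch: use the BMI2 path only when the host CPU reports it.
fn pext(x: u64, mask: u64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("bmi2") {
            // SAFETY: BMI2 support was confirmed at runtime just above.
            return unsafe { pext_bmi2(x, mask) };
        }
    }
    pext_scalar(x, mask)
}

fn main() {
    println!("{:#b}", pext(0b1011_0110, 0b1010_1010));
}
```

Because the guard runs on the host, the same binary can be built once per architecture without forcing the feature globally, which is the SIGILL risk the notes call out for `+avx512vbmi`.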