Add per-variant CPU feature gating to bit_transpose benchmarks #8227
+198
−18
CodSpeed HQ / CodSpeed Performance Analysis
succeeded
Jun 2, 2026 in 0s
Performance Gate Passed
⚠️ Unknown Walltime execution environment detected
Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.
For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.
✅ 1267 untouched benchmarks
🆕 20 new benchmarks
⏩ 8 skipped benchmarks¹
Performance Changes
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | Simulation | transpose_scalar | N/A | 1.8 µs | N/A |
| 🆕 | Simulation | untranspose_scalar | N/A | 2.4 µs | N/A |
| 🆕 | Simulation | untranspose_scalar_throughput | N/A | 481.6 µs | N/A |
| 🆕 | Simulation | untranspose_bmi2_throughput | N/A | 386.1 µs | N/A |
| 🆕 | Simulation | untranspose_bmi2 | N/A | 1.8 µs | N/A |
| 🆕 | Simulation | transpose_scalar_throughput | N/A | 266.2 µs | N/A |
| 🆕 | Simulation | transpose_bmi2 | N/A | 1.8 µs | N/A |
| 🆕 | Simulation | transpose_bmi2_throughput | N/A | 382.3 µs | N/A |
| 🆕 | WallTime | transpose_bmi2_throughput | N/A | 27.2 µs | N/A |
| 🆕 | WallTime | transpose_scalar_throughput | N/A | 41.6 µs | N/A |
| 🆕 | WallTime | transpose_bmi2 | N/A | 27 ns | N/A |
| 🆕 | WallTime | transpose_scalar | N/A | 41 ns | N/A |
| 🆕 | WallTime | untranspose_bmi2 | N/A | 26 ns | N/A |
| 🆕 | WallTime | untranspose_vbmi | N/A | 24 ns | N/A |
| 🆕 | WallTime | transpose_vbmi_throughput | N/A | 2.1 µs | N/A |
| 🆕 | WallTime | untranspose_bmi2_throughput | N/A | 26.7 µs | N/A |
| 🆕 | WallTime | untranspose_vbmi_throughput | N/A | 24.4 µs | N/A |
| 🆕 | WallTime | untranspose_scalar_throughput | N/A | 40.6 µs | N/A |
| 🆕 | WallTime | untranspose_scalar | N/A | 41 ns | N/A |
| 🆕 | WallTime | transpose_vbmi | N/A | 1 ns | N/A |
Comparing claude/bituntranspose-bench-variants-ArFmf (41dd2ba) with develop (81046d7)
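For readers unfamiliar with the "per-variant CPU feature gating" named in the PR title, the following is a minimal Rust sketch of the general idea, not the code from this PR. The `Variant` enum and the `is_supported` and `runnable_variants` helpers are hypothetical names introduced for illustration; only `std::arch::is_x86_feature_detected!` is a real standard-library macro. The mapping of the `vbmi` label to the `avx512vbmi` feature string is an assumption. The table lists the `vbmi` benchmarks only under WallTime, which is what a runtime gate of this kind would produce on a host without that feature, though the report does not confirm the cause.

```rust
/// Hypothetical list of implementation variants, named after the
/// benchmark labels in the table (scalar, bmi2, vbmi).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Variant {
    Scalar,
    Bmi2,
    Vbmi,
}

/// Returns true when the CPU running this process can execute `variant`.
/// The scalar fallback is always available; SIMD variants are checked at runtime.
pub fn is_supported(variant: Variant) -> bool {
    match variant {
        Variant::Scalar => true,
        #[cfg(target_arch = "x86_64")]
        Variant::Bmi2 => std::arch::is_x86_feature_detected!("bmi2"),
        // Assumption: the `vbmi` variant requires AVX-512 VBMI.
        #[cfg(target_arch = "x86_64")]
        Variant::Vbmi => std::arch::is_x86_feature_detected!("avx512vbmi"),
        // On other architectures no x86 variant can run.
        #[cfg(not(target_arch = "x86_64"))]
        _ => false,
    }
}

/// Collects the variants a benchmark harness could register on this host,
/// so unsupported variants are skipped instead of faulting.
pub fn runnable_variants() -> Vec<Variant> {
    [Variant::Scalar, Variant::Bmi2, Variant::Vbmi]
        .into_iter()
        .filter(|v| is_supported(*v))
        .collect()
}
```

A harness built this way would register one benchmark per entry in `runnable_variants()`, so the set of reported benchmarks can differ between execution environments.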
Footnotes
1. 8 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.