Add Mojo AOT-compiled SIMD take/filter kernels for primitive arrays by joseph-isaacs · Pull Request #7387 · vortex-data/vortex

joseph-isaacs · 2026-04-10T15:40:31Z

Summary

Adds Mojo AOT-compiled SIMD gather kernels for primitive take and filter, with zero runtime dependency and graceful fallback when Mojo isn't installed.

CodSpeed CI Results

"Merging this PR will improve performance by 48.74%" — 11 improved, 0 regressed, 1111 untouched.

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| `decode_primitives[u8]` (5 variants) | 53.5 µs | 36.0 µs | +49% |
| `bench_dict_mask` (4 variants) | 1.7 ms | 1.5 ms | +10% |
| `gather_u32_mojo[100K]` vs `gather_u32_avx2[100K]` | N/A | 699.8 vs 678.6 µs | within 3% |

What's included

`kernels/take.mojo`: 20 SIMD gather kernels (16 take + 4 filter), 4x unrolled, compiled with `--mcpu skylake --mtune skylake` for `vpgatherqd`
`build.rs`: AOT compiles `.mojo` → `.o` → `.a`, detects Mojo via `PATH` + `~/.local/bin`, passes `--target-triple` from Cargo's `TARGET` env, falls back gracefully when Mojo is missing
`mojo.rs`: Rust FFI bridge with `TakeImpl`, dispatches by value byte-width
`slice.rs`: Mojo SIMD filter for the sparse indices path (<80% selectivity)
`take_primitive_simd` bench: divan 3-way comparison of scalar vs AVX2 vs Mojo
CI: `pip install --user mojo` + `MOJO_MCPU=skylake` for codspeed shard #2
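As a rough illustration of the detection and fallback step described for `build.rs`, the sketch below probes a list of directories for a `mojo` executable and only emits the `vortex_mojo` cfg when one is found. The helper names (`find_in_dirs`, `find_mojo`, `cargo_directives`) are hypothetical and not taken from the PR; the real build script may differ.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Return the first `<dir>/mojo` that exists as a file (hypothetical helper).
fn find_in_dirs<I>(dirs: I) -> Option<PathBuf>
where
    I: IntoIterator<Item = PathBuf>,
{
    dirs.into_iter()
        .map(|dir| dir.join("mojo"))
        .find(|candidate| candidate.is_file())
}

/// Search PATH first, then ~/.local/bin (where `pip install --user` places scripts).
fn find_mojo() -> Option<PathBuf> {
    let path_dirs = env::var_os("PATH")
        .map(|p| env::split_paths(&p).collect::<Vec<_>>())
        .unwrap_or_default();
    let user_bin = env::var_os("HOME").map(|h| Path::new(&h).join(".local/bin"));
    find_in_dirs(path_dirs.into_iter().chain(user_bin))
}

/// Lines a build script could print: the cfg is enabled only when Mojo is found.
fn cargo_directives(mojo: Option<&Path>) -> Vec<String> {
    match mojo {
        Some(_) => vec!["cargo:rustc-cfg=vortex_mojo".to_string()],
        // No Mojo toolchain: emit nothing, so the existing Rust kernels are used.
        None => Vec::new(),
    }
}
```

A build script would call `cargo_directives(find_mojo().as_deref())` and print each line. Keeping the "not found" case silent is what makes the build a no-op without the toolchain.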

Key design decisions

Pointers as Int: Mojo 0.26's UnsafePointer has origin/mut params incompatible with @export. Solved with type_of anchor pattern.
Zero runtime dep: nm shows 0 undefined symbols. No Mojo runtime/GC.
--mcpu skylake: Critical for vpgatherqd hardware gather. x86-64-v3 scalarizes the gather into 8 individual loads.
4x unroll: Saturates gather pipeline with independent ops.
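To make the "pointers as Int" and byte-width dispatch decisions concrete, here is a small sketch in which buffer addresses cross the kernel boundary as plain integers and the caller picks a kernel by `size_of::<T>()`. The function `gather_raw` is a Rust stand-in for an exported Mojo kernel; its name and signature are assumptions, and `T` is assumed to be a primitive numeric type.

```rust
use std::mem::size_of;

/// Stand-in for one exported kernel: all buffers arrive as integer addresses.
///
/// # Safety
/// The addresses must point to valid buffers, `out` must have room for `n`
/// elements, and every index must be in bounds for `values`.
unsafe fn gather_raw<W: Copy>(values: usize, indices: usize, out: usize, n: usize) {
    let values = values as *const W;
    let indices = indices as *const u32;
    let out = out as *mut W;
    for i in 0..n {
        unsafe {
            let idx = indices.add(i).read() as usize;
            // Unaligned accesses keep the bitwise copy valid for any same-width type.
            out.add(i).write_unaligned(values.add(idx).read_unaligned());
        }
    }
}

/// Dispatch on the value byte-width (1, 2, 4, or 8), as the PR describes.
fn take<T: Copy>(values: &[T], indices: &[u32]) -> Vec<T> {
    assert!(
        indices.iter().all(|&i| (i as usize) < values.len()),
        "index out of bounds"
    );
    let n = indices.len();
    let mut out: Vec<T> = Vec::with_capacity(n);
    let (v, i, o) = (
        values.as_ptr() as usize,
        indices.as_ptr() as usize,
        out.as_mut_ptr() as usize,
    );
    unsafe {
        match size_of::<T>() {
            1 => gather_raw::<u8>(v, i, o, n),
            2 => gather_raw::<u16>(v, i, o, n),
            4 => gather_raw::<u32>(v, i, o, n),
            8 => gather_raw::<u64>(v, i, o, n),
            w => panic!("unsupported value width: {w}"),
        }
        // All `n` slots were written by the kernel above.
        out.set_len(n);
    }
    out
}
```

Dispatching on width rather than type means `f32` and `u32` share one kernel, which is how 4 value widths × 4 index types can cover every primitive with 16 take kernels.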

⚠️ Known limitation

Mojo compiles for a single target CPU (no runtime dispatch). If the build machine has AVX-512 but the runtime machine only has AVX2, you'd get SIGILL. Currently mitigated by pinning MOJO_MCPU=skylake in CI. For production use, this needs runtime feature detection or multiple compiled objects — same pattern as the existing multiversion crate usage.
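The runtime feature detection mentioned above could look roughly like the following sketch, which gates an AVX2-compiled path behind `is_x86_feature_detected!` and otherwise uses a scalar loop. The function names are hypothetical, and the AVX2 body is a plain loop standing in for the real gather kernel.

```rust
/// Portable fallback: one indexed load per output element.
fn take_scalar(values: &[u32], indices: &[u32]) -> Vec<u32> {
    indices.iter().map(|&i| values[i as usize]).collect()
}

/// Compiled with AVX2 enabled. The body is a stand-in for a real gather kernel.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn take_avx2(values: &[u32], indices: &[u32]) -> Vec<u32> {
    indices.iter().map(|&i| values[i as usize]).collect()
}

/// Only call the arch-specific kernel after checking the running CPU.
fn take(values: &[u32], indices: &[u32]) -> Vec<u32> {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: the CPU reports AVX2 support at runtime.
            return unsafe { take_avx2(values, indices) };
        }
    }
    take_scalar(values, indices)
}
```

With this shape, a binary built on an AVX-512 machine still runs on an AVX2-only or older CPU, because the check happens on the machine that executes the code rather than the one that compiled it.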

Test plan

203 take tests pass with Mojo kernel active
121 filter tests pass with Mojo kernel active
Codspeed shard #2 builds and runs with Mojo installed
CodSpeed: +49% u8 decode, +10% dict_mask, 0 regressions
Mojo gather within 3% of hand-written AVX2 on u32

https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

Adds a new take kernel implementation that uses Mojo's SIMD gather instructions, compiled ahead-of-time and statically linked into vortex-array. When the Mojo SDK is installed, `build.rs` compiles `kernels/take.mojo` to a native object file with zero external dependencies (no Mojo runtime needed). The kernel auto-selects optimal SIMD width (AVX-512/AVX2/NEON) via Mojo's type system. The dispatch priority is: Mojo > portable_simd > AVX2 > scalar. When Mojo is not installed, build.rs is a no-op and existing Rust kernels are used — zero impact on builds without the Mojo toolchain. Covers all 16 type combinations (4 value widths × 4 index types). All 203 existing take tests pass with the Mojo kernel active. Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

Extends the Mojo AOT kernel with filter-by-indices support. The primitive filter path converts sparse masks (<80% selectivity) into an index array, then gathers values at those positions — identical to the take operation but with usize indices. Four new exported symbols (vortex_filter_{1,2,4,8}byte) are added to the Mojo kernel and wired into filter_slice_by_indices behind cfg(vortex_mojo). Falls back to scalar when Mojo is unavailable. All 121 existing filter tests pass with the Mojo kernel active. Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST
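The filter-by-indices path described above can be sketched as follows: compute the mask's selectivity, and below the 80% threshold convert the mask to `usize` indices and gather. The helper names are hypothetical, and the dense branch is an assumption, since the commit message only describes the sparse path.

```rust
/// Masks selecting less than this fraction of rows take the sparse path.
const SPARSE_THRESHOLD: f64 = 0.8;

/// Positions of the set bits in the mask.
fn mask_to_indices(mask: &[bool]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter_map(|(i, &keep)| keep.then_some(i))
        .collect()
}

/// Same operation as take, but with usize indices.
fn gather<T: Copy>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i]).collect()
}

fn filter<T: Copy>(values: &[T], mask: &[bool]) -> Vec<T> {
    assert_eq!(values.len(), mask.len(), "mask length must match values");
    let selected = mask.iter().filter(|&&keep| keep).count();
    let selectivity = if mask.is_empty() {
        0.0
    } else {
        selected as f64 / mask.len() as f64
    };
    if selectivity < SPARSE_THRESHOLD {
        // Sparse path: build an index array, then gather at those positions.
        gather(values, &mask_to_indices(mask))
    } else {
        // Dense path (assumed here): copy kept values directly.
        values
            .iter()
            .zip(mask)
            .filter_map(|(&v, &keep)| keep.then_some(v))
            .collect()
    }
}
```

Because the sparse path reduces to a gather, it can reuse the same kernel shape as take, which is why only four extra symbols (one per value width) are needed.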

Adds `take_primitive_simd` benchmark that calls all three gather implementations through identical `fn(&[T], &[u32]) -> Buffer<T>` signatures on raw buffers. No Vortex Array overhead.

Results on AVX2 (65K values, random u32 indices, median):

u32, n=100K: scalar=66.9µs, avx2=46.0µs (1.45x), mojo=44.0µs (1.52x)
u64, n=100K: scalar=67.1µs, avx2=55.6µs (1.21x), mojo=55.4µs (1.21x)

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

Adds a pip install step for the Mojo SDK in the bench-codspeed job, gated to only run for the vortex-array shard. This enables the Mojo AOT take/filter kernels during codspeed benchmark runs so we get performance tracking for the SIMD gather path. Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

The codspeed benchmark runner crashed with exit code 132 (SIGILL) because `mojo build --emit object` defaults to the native CPU, which may emit AVX-512 or other instructions the CI runner doesn't support. Adds MOJO_MCPU env var (defaults to "native") that build.rs passes as `--mcpu` to the Mojo compiler. CI sets it to "x86-64-v3" (AVX2 baseline) to match the existing RUSTFLAGS="-C target-feature=+avx2". Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

codspeed-hq · 2026-04-10T15:53:48Z

Merging this PR will improve performance by 81.52%

⚡ 34 improved benchmarks
✅ 1088 untouched benchmarks
⏩ 1455 skipped benchmarks¹

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | `decode_primitives[u8, (10000, 8)]` | 53.5 µs | 36.3 µs | +47.41% |
| ⚡ | Simulation | `decode_primitives[u8, (10000, 32)]` | 53.6 µs | 36.5 µs | +46.99% |
| ⚡ | Simulation | `decode_primitives[u8, (10000, 2)]` | 53.5 µs | 36.3 µs | +47.3% |
| ⚡ | Simulation | `decode_primitives[u8, (10000, 512)]` | 54 µs | 36.6 µs | +47.27% |
| ⚡ | Simulation | `decode_primitives[u8, (10000, 4)]` | 53.5 µs | 36.7 µs | +45.75% |
| ⚡ | Simulation | `bench_dict_mask[(0.01, 0.9)]` | 1.7 ms | 1.5 ms | +10.3% |
| ⚡ | Simulation | `bench_dict_mask[(0.5, 0.9)]` | 1.7 ms | 1.5 ms | +10.3% |
| ⚡ | Simulation | `bench_dict_mask[(0.1, 0.9)]` | 1.7 ms | 1.5 ms | +10.3% |
| ⚡ | Simulation | `bench_dict_mask[(0.9, 0.9)]` | 1.7 ms | 1.6 ms | +10.11% |
| ⚡ | Simulation | `varbinview_zip_fragmented_mask` | 7.1 ms | 6.4 ms | +11.75% |
| ⚡ | Simulation | `varbinview_zip_block_mask` | 3.7 ms | 2.9 ms | +27.68% |
| ⚡ | Simulation | `decompress[u32, (10000, 16)]` | 57.8 µs | 44.8 µs | +29.15% |
| ⚡ | Simulation | `decompress[u32, (100000, 4)]` | 1,030 µs | 682.6 µs | +50.89% |
| ⚡ | Simulation | `decompress[u32, (100000, 16)]` | 432.3 µs | 303.6 µs | +42.39% |
| ⚡ | Simulation | `decompress[u32, (10000, 4)]` | 117.1 µs | 82.5 µs | +41.94% |
| ⚡ | Simulation | `decompress[u64, (10000, 4)]` | 139.8 µs | 93.9 µs | +48.92% |
| ⚡ | Simulation | `decompress[u64, (1000, 4)]` | 28.6 µs | 23.9 µs | +19.93% |
| ⚡ | Simulation | `decompress[u64, (10000, 16)]` | 75.4 µs | 64 µs | +17.86% |
| ⚡ | Simulation | `decompress[u64, (100000, 16)]` | 609.9 µs | 496.9 µs | +22.74% |
| ⚡ | Simulation | `decompress[u16, (10000, 16)]` | 46.3 µs | 33.8 µs | +36.96% |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Comparing `claude/plan-mojo-simd-kernels-IDywB` (5f2a781) with `develop` (8d9052e)

¹ 1455 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

a10y · 2026-04-10T15:54:30Z

Does Mojo handle runtime dispatch to choose the right kernel for the architecture? Or does it just pick one when you build the Mojo kernels?

One thing to keep in mind is that we're a library. When a downstream crate compiles Vortex in and, for example, the build machine has AVX-512 but a client machine only supports AVX2, that would result in a runtime failure that's opaque to the library user.

In any final version of this code, we should make sure that any arch-specific kernels are gated by a runtime check before we invoke them, similar to what we do for the existing AVX2 kernel.

CodSpeed results showed the Mojo generic gather is ~14% slower than the hand-tuned AVX2 intrinsics for 32-bit types (f32/u32), while being ~50% faster for u8. The AVX2 kernel uses specialized masked gather instructions that outperform Mojo's portable SIMD at x86-64-v3.

New dispatch order: portable_simd (nightly) > AVX2 (x86_64) > Mojo (fallback) > scalar

Mojo now serves as the SIMD path for:
- x86_64 without AVX2 (rare but possible)
- Non-x86 platforms (ARM NEON, etc.)

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

Cargo sets TARGET=x86_64-unknown-linux-gnu which confuses Mojo's auto-detection ("unknown target triple"). Explicitly pass it via --target-triple so AOT compilation works in the Cargo build env. Also adds MOJO_MCPU=native default with CI override to x86-64-v3. Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

Two changes that close the gap between Mojo and hand-written AVX2:

1. Target --mcpu=skylake instead of x86-64-v3. The latter causes LLVM to scalarize llvm.masked.gather into 8 individual loads (vpextrq + movl). Skylake enables hardware vpgatherqd which does the gather in a single instruction.
2. 4x loop unrolling in _take(). Issuing 4 independent gather ops per iteration keeps the gather pipeline saturated, which is critical since vpgatherqd has multi-cycle latency.

Before (x86-64-v3, no unroll): 48.1 µs (u32 100K), 6% behind AVX2
After (skylake, 4x unroll): 44.3 µs (u32 100K), matches AVX2

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST
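The shape of the 4x unroll can be illustrated in Rust with scalar loads standing in for the SIMD gathers. This is only a sketch of the loop structure, with a hypothetical function name: in the Mojo kernel each of the four operations is a whole vector gather rather than a single load.

```rust
/// Gather with a 4x-unrolled main loop and a scalar tail.
fn take_unrolled4(values: &[u32], indices: &[u32]) -> Vec<u32> {
    let mut out = Vec::with_capacity(indices.len());
    let mut chunks = indices.chunks_exact(4);
    for chunk in &mut chunks {
        // Four independent loads: none needs the result of another,
        // so the CPU can keep several in flight at once.
        let a = values[chunk[0] as usize];
        let b = values[chunk[1] as usize];
        let c = values[chunk[2] as usize];
        let d = values[chunk[3] as usize];
        out.extend_from_slice(&[a, b, c, d]);
    }
    // Tail: fewer than four indices remain.
    for &i in chunks.remainder() {
        out.push(values[i as usize]);
    }
    out
}
```

The point of the unroll is that the operations within one iteration have no data dependency on each other, so a multi-cycle gather latency is overlapped instead of paid serially.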

Passes --mtune matching --mcpu so LLVM schedules instructions optimally for the target microarchitecture. On Skylake this increases vpgather instruction count from 50 to 75 (LLVM is more willing to use hardware gather with proper scheduling hints). Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

With the optimized kernel (4x unroll + skylake vpgatherqd), Mojo matches hand-written AVX2 intrinsics on x86_64 and also works on ARM/NEON. Restore Mojo as the primary dispatch choice when available, falling back to portable_simd > AVX2 > scalar. This lets codspeed measure the full Mojo-in-production impact across all dict/take benchmarks. Also tested prefetch hints — they hurt at <100K elements (L2 cache already sufficient) and only help marginally at 1M+. Not included. Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

Adds a SIMD broadcast+store kernel for run-end decoding of primitive types. For each run, the value is broadcast to a SIMD register and written 8 elements at a time (vpbroadcastd + vmovdqu on AVX2).

Local benchmarks (100K u32 elements):

run_len=8: scalar=54µs, mojo=18µs (3.1x)
run_len=32: scalar=39µs, mojo=10µs (4.0x)
run_len=128: scalar=37µs, mojo=9µs (4.1x)

Only activates for the common fast path: u32 ends, non-nullable values, zero offset. Falls through to existing Rust decode otherwise.

Adds build.rs to vortex-runend (shares the same Mojo kernel file from vortex-array/kernels/take.mojo), primitive decode benchmark, and CI Mojo install for codspeed shard 6.

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST
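For readers unfamiliar with run-end encoding, the decode being accelerated here amounts to the following sketch for the fast path (u32 ends, non-nullable values, zero offset). The function name is hypothetical, and `slice::fill` stands in for the explicit broadcast+store; whether the compiler vectorizes it depends on the target.

```rust
/// Decode run-end encoded data: `ends[i]` is the exclusive end position of
/// run `i`, and `values[i]` is the value repeated across that run.
fn decode_run_end(ends: &[u32], values: &[u32]) -> Vec<u32> {
    assert_eq!(ends.len(), values.len(), "one value per run");
    let total = ends.last().copied().unwrap_or(0) as usize;
    let mut out = vec![0u32; total];
    let mut start = 0usize;
    for (&end, &value) in ends.iter().zip(values) {
        let end = end as usize;
        // Broadcast one value across the whole run (ends must be non-decreasing).
        out[start..end].fill(value);
        start = end;
    }
    out
}
```

For example, ends `[2, 5, 6]` with values `[7, 8, 9]` decode to `[7, 7, 8, 8, 8, 9]`.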

- Cargo.toml: keep vortex_mojo cfg, accept removal of disable_loom/vortex_nightly
- take/mod.rs: keep Mojo dispatch, accept removal of portable_simd, use develop's simplified non-Mojo dispatch structure

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

Adds decode_primitive_u32_scalar alongside decode_primitive_u32 so codspeed tracks both side by side. The scalar variant uses a raw Rust fill loop matching push_n_unchecked behavior.

Local results (100K u32):

run_len=8: scalar=62µs, mojo=17µs (3.7x)
run_len=32: scalar=27µs, mojo=14µs (1.9x)
run_len=128: scalar=20µs, mojo=9µs (2.2x)

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

The existing `decompress` benchmark in run_end_compress.rs uses u64 ends, but the Mojo fast path only handled u32 ends. Added u64 ends variants to the Mojo kernel and updated the Rust bridge to dispatch on (ends_ptype, value_byte_width). This means the existing codspeed `decompress[u8/u16/u32/u64]` benchmarks will now exercise the Mojo SIMD broadcast path and show deltas against the develop baseline. Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

- Move runend decode kernel to encodings/runend/kernels/decode.mojo (each crate owns its own kernel file)
- Remove take_primitive_simd benchmark: existing codspeed benchmarks (decode_primitives, dict_canonicalize, dict_mask, decompress) already cover all Mojo-accelerated paths
- Remove decode_primitive_u32 benchmark: existing decompress benchmark in run_end_compress.rs already exercises the Mojo runend decode path
- Remove bench_take_scalar/avx2/mojo helpers and visibility hacks from the crate public API
- Revert module visibility changes (compute, take, avx2, mojo back to private)

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

…alar The struct is conditionally unused (only when vortex_mojo is set). #[expect(unused)] fails in CI where Mojo isn't installed for lint. Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

Fix Mojo build on macOS: skip --mcpu=native and --target-triple on Apple targets

On macOS, Mojo rejects the Cargo target triple format and --mcpu=native also triggers broken host triple detection. Skip both flags on Apple targets in both build scripts.

Signed-off-by: Joe Isaacs <joe@spiraldb.com>
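Putting the flag handling from these commits together, a build script might assemble the compiler arguments roughly as in this sketch. The function name is hypothetical and the exact argument spelling is an assumption; only `mojo build --emit object`, `--target-triple`, `--mcpu`, `--mtune`, and the `MOJO_MCPU` variable come from the commit messages. Skipping `--mtune` on Apple targets is also an assumption.

```rust
/// Build the argument list for the Mojo compiler (hypothetical helper).
/// `target` is Cargo's TARGET value; `mcpu_env` is the MOJO_MCPU override.
fn mojo_flags(target: &str, mcpu_env: Option<&str>) -> Vec<String> {
    let mut args = vec![
        "build".to_string(),
        "--emit".to_string(),
        "object".to_string(),
    ];
    // Apple targets: skip the triple and CPU flags and let Mojo detect the host.
    if target.contains("apple") {
        return args;
    }
    // Default to the build machine's CPU unless MOJO_MCPU pins a baseline.
    let mcpu = mcpu_env.unwrap_or("native");
    args.extend([
        "--target-triple".to_string(),
        target.to_string(),
        "--mcpu".to_string(),
        mcpu.to_string(),
        // Tune for the same microarchitecture that is targeted.
        "--mtune".to_string(),
        mcpu.to_string(),
    ]);
    args
}
```

In a real `build.rs` the two inputs would come from `std::env::var("TARGET")` and `std::env::var("MOJO_MCPU")`. Pinning `MOJO_MCPU` in CI is what avoids the SIGILL from the earlier commit when the runner lacks the build host's instructions.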

claude added 4 commits April 10, 2026 13:44

joseph-isaacs changed the title ~~Add Mojo AOT-compiled SIMD take kernels for primitive arrays~~ do not merge: Add Mojo AOT-compiled SIMD take kernels for primitive arrays Apr 10, 2026

0ax1 added the do not merge Pull requests that are not intended to merge label Apr 10, 2026

Fix nightly rustfmt: split grouped imports, reorder super:: imports

64fdd36

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

claude added 5 commits April 10, 2026 15:57

joseph-isaacs changed the title ~~do not merge: Add Mojo AOT-compiled SIMD take kernels for primitive arrays~~ Add Mojo AOT-compiled SIMD take/filter kernels for primitive arrays Apr 10, 2026

claude and others added 7 commits April 10, 2026 17:35

joseph-isaacs wants to merge 18 commits into `develop` from `claude/plan-mojo-simd-kernels-IDywB`

4 participants
