Skip to content

Autogenerate core_arch modules#196

Open
valadaptive wants to merge 16 commits into
linebender:mainfrom
valadaptive:gen-core-arch
Open

Autogenerate core_arch modules#196
valadaptive wants to merge 16 commits into
linebender:mainfrom
valadaptive:gen-core-arch

Conversation

@valadaptive
Copy link
Copy Markdown
Contributor

I'm not sure where the core_arch modules originally came from. Raph's comment says he copied them from Pulp, which doesn't seem to document how or where they were generated.

This PR replaces them with new autogenerated versions, created by parsing the stdarch crate. I've added it as a submodule under fearless_simd_gen.

I've chosen to make core_arch generation a separate subcommand in the fearless_simd_gen binary; I expect it to be run much less often, and want to allow people to regenerate the rest of the code without having to clone the (relatively heavy) stdarch repo.

I'm not sure how much utility core_arch itself provides right now. The main reason for doing this is that I've embarked on a bit of a yak-shaving expedition:

  • In order to support AVX-512, we need a separate abstraction for mask types (Mask types for AVX-512 #179).
  • In order to add a separate abstraction for mask types, we need an operation that converts them to and from integer types.
  • In order to convert mask types to integer types, we need 64-bit integer types.
  • In order to support 64-bit integer types, we need to make sure we don't accidentally use any intrinsics that are too new (Make sure we're not using any unsupported intrinsics in the various x86 levels #124). Many 64-bit operations are only implemented in AVX-512, even though their naming scheme is identical to operations supported in far older instruction sets.
  • In order to avoid accidentally using unsupported intrinsics, we need a way to import only the supported ones...
  • And to do that, we need to automatically assemble a list of which intrinsics require which CPU features, hence this PR.

@valadaptive valadaptive requested a review from LaurenzV February 4, 2026 13:44
@valadaptive
Copy link
Copy Markdown
Contributor Author

Apologies for the churn; this should be actually ready for review now. The NEON intrinsics in stdarch are a mess to sort through.

@LaurenzV
Copy link
Copy Markdown
Collaborator

LaurenzV commented Feb 5, 2026

Perhaps we should discuss (either Zulip or next office hours) what to do with the core_arch module? It's a lot to review. 😅 So it would be good to clarify that beforehand.

Copy link
Copy Markdown
Contributor

@Shnatsel Shnatsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of testing improvements have been merged into main recently. Merging main into this branch and seeing CI pass would make me much more confident in the correctness of this PR.

Comment on lines +91 to +94
/// Generate the constructor for this architecture.
///
/// x86 and aarch64 require runtime feature detection, so the constructor is unsafe.
/// wasm32 has static feature detection, so the constructor is safe.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aarch64 is guaranteed to have NEON and currently fearless_simd doesn't have any other Aarch64 levels other than baseline NEON. So this could actually be safe for now; making it unsafe is just for future-proofing.

@LaurenzV
Copy link
Copy Markdown
Collaborator

I don’t think any of the core_arch stuff is tested in CI though, no?

@Shnatsel
Copy link
Copy Markdown
Contributor

Only through high-level operations, I believe. So only a small subset.

@LaurenzV
Copy link
Copy Markdown
Collaborator

Aren't we using the intrinsics directly? I don't think we are running anything from core_arch in the tests. 🤔 But I might be wrong.

@Shnatsel
Copy link
Copy Markdown
Contributor

Ah, nevermind then, I must have been mistaken

Comment on lines +153 to +159
if target_features.is_empty()
|| !target_features
.iter()
.any(|feature| feature == &self.module_feature)
{
return;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This accepts any intrinsic whose target_feature list merely contains the module feature. That pulls extra-feature AArch64 intrinsics into the plain Neon token: for example neon.rs:291 exposes safe vbcaxq_* wrappers even though rustc reports they require neon,sha3, and neon.rs:2537 includes p64 wrappers requiring aes. A Neon proof is not enough to call these safely, so this breaks the safety model. cargo check --target aarch64-unknown-linux-gnu --all-features also reports 137 mismatched target-feature warnings from this. The parser needs to reject intrinsics with unsupported extra features or generate separate feature tokens/modules.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also drops public intrinsics with no #[target_feature]. That removes existing public wrappers such as Sse2::_mm_pause, which stdarch intentionally defines without a target feature because pause is safe as a nop on older CPUs. This is a public API regression from the old core_arch, but may be acceptable depending on the exact list of intrinsics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants