Add PTX vector memory intrinsics by ilehtoranta · Pull Request #4 · LostBeard/SpawnDev.ILGPU

ilehtoranta · 2026-05-07T08:50:24Z

Summary

Adds PTX-only vector memory intrinsics for explicit f32 vector load/store code generation.

This introduces:

PTXMemory.LoadF32x2 / StoreF32x2
PTXMemory.LoadF32x4 / StoreF32x4
Float2 and Float4 helper structs
intrinsic registration in the PTX algorithms context
aligned/vectorized ArrayView convenience helpers

The main use case is CUDA kernels that need predictable vector memory instructions instead of relying on backend inference from ordinary scalar or struct access patterns.

Details

The new PTX intrinsics generate explicit PTX vector memory operations:

ld.v2.f32
st.v2.f32
ld.v4.f32
st.v4.f32

For f32x4, ptxas can lower these to 128-bit global memory instructions such as LD.E.128 and ST.E.128 when the address is 16-byte aligned and the addressing is otherwise suitable.

This is useful for performance-sensitive kernels that operate on adjacent float values, where one vector access can replace two or four scalar accesses.
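The alignment condition mentioned above can be illustrated with a small host-side sketch. This is not code from the PR: `Float4` here is a hypothetical C++ stand-in (the PR's `Float4` is a C# struct whose definition is not shown), and `is_aligned_16` and `vec4_float_count` are illustrative helper names. It assumes a 4 x f32 vector access needs a 16-byte aligned address, which is the natural alignment of a 128-bit access.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical C++ stand-in for a 4-float vector; NOT the PR's C# Float4.
struct alignas(16) Float4 {
    float x, y, z, w;
};
static_assert(sizeof(Float4) == 16, "four packed f32 values occupy 128 bits");

// True when `p` sits on a 16-byte boundary, the natural alignment
// of a 4 x f32 (128-bit) access.
inline bool is_aligned_16(const float* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(Float4) == 0;
}

// Number of floats (always a multiple of 4) that 4-wide accesses starting
// at `p` can cover. Returns 0 when `p` is misaligned, so a caller would
// fall back to scalar accesses for the whole range.
inline std::size_t vec4_float_count(const float* p, std::size_t n) {
    return is_aligned_16(p) ? (n / 4) * 4 : 0;
}
```

A caller could use `vec4_float_count` to decide how many elements go through the vector path and handle the remaining tail with scalar loads and stores.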

Commit 8ddce05: Add PTX vector memory intrinsics

ilehtoranta wants to merge 1 commit into LostBeard:master from ilehtoranta:codex/ptx-vector-memory-intrinsics
