Atomic memory operations

## Summary

Add atomic memory operation words for safe concurrent updates from multiple threads.

## Words to implement

### Integer atomics

| Word | Stack effect | MLIR op | Description |
|------|-------------|---------|-------------|
| `ATOMIC+` | `( n addr -- )` | `memref.atomic_rmw addi` | Atomic add (i64) |
| `ATOMIC-MAX` | `( n addr -- )` | `memref.atomic_rmw maxs` | Atomic signed max (i64) |
| `ATOMIC-MIN` | `( n addr -- )` | `memref.atomic_rmw mins` | Atomic signed min (i64) |
| `ATOMIC-AND` | `( n addr -- )` | `memref.atomic_rmw andi` | Atomic bitwise AND (i64) |
| `ATOMIC-OR` | `( n addr -- )` | `memref.atomic_rmw ori` | Atomic bitwise OR (i64) |
| `ATOMIC-XOR` | `( n addr -- )` | `memref.atomic_rmw xori` | Atomic bitwise XOR (i64) |
| `ATOMIC-XCHG` | `( n addr -- old )` | `memref.atomic_rmw assign` | Atomic exchange, returns old value |
| `ATOMIC-CAS` | `( expected new addr -- old )` | `memref.generic_atomic_rmw` | Compare-and-swap, returns old value |
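
As a reference for the intended semantics, here is a minimal host-side sketch in C++ that models a few of the words above with `std::atomic`. This is only an illustration of the stack effects, not the proposed lowering: the `Cell` alias, the function names, and the choice of relaxed memory ordering are assumptions, since the issue does not specify an ordering.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of one i64 memory cell that `addr` points at.
using Cell = std::atomic<std::int64_t>;

// ATOMIC+ ( n addr -- ): add n to the cell; the old value is discarded.
inline void atomic_add(std::int64_t n, Cell& cell) {
    cell.fetch_add(n, std::memory_order_relaxed);
}

// ATOMIC-MAX ( n addr -- ): signed max, modeled here with a CAS loop.
inline void atomic_max(std::int64_t n, Cell& cell) {
    std::int64_t cur = cell.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `cur` with the current value.
    while (cur < n &&
           !cell.compare_exchange_weak(cur, n, std::memory_order_relaxed)) {
    }
}

// ATOMIC-XCHG ( n addr -- old ): store n, return the previous value.
inline std::int64_t atomic_xchg(std::int64_t n, Cell& cell) {
    return cell.exchange(n, std::memory_order_relaxed);
}

// ATOMIC-CAS ( expected new addr -- old ): store `desired` only if the cell
// currently holds `expected`; always return the value observed in the cell.
inline std::int64_t atomic_cas(std::int64_t expected, std::int64_t desired,
                               Cell& cell) {
    // On success `expected` is unchanged (it equals the old value);
    // on failure it is overwritten with the value actually found.
    cell.compare_exchange_strong(expected, desired, std::memory_order_relaxed);
    return expected;
}
```

With this model, a caller can tell whether `ATOMIC-CAS` succeeded by comparing the returned `old` against the `expected` it passed in.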

### Float atomics

| Word | Stack effect | MLIR op | Description |
|------|-------------|---------|-------------|
| `ATOMIC-F+` | `( f addr -- )` | `memref.atomic_rmw addf` | Atomic float add |
| `ATOMIC-FMAX` | `( f addr -- )` | `memref.atomic_rmw maximumf` | Atomic float max |
| `ATOMIC-FMIN` | `( f addr -- )` | `memref.atomic_rmw minimumf` | Atomic float min |

## Motivation

- **Multi-block reductions**: When a reduction spans more than one thread block, the output must be accumulated atomically (e.g., `ATOMIC-F+` for partial sums, `ATOMIC-FMAX` for global max).
- **Histogram / scatter patterns**: Common GPU patterns where multiple threads update the same output location.
- **Lock-free data structures**: `ATOMIC-CAS` enables lock-free algorithms.
- **Flash attention**: Multi-block flash attention variants need atomic output accumulation.

## Implementation notes

- Integer atomics: straightforward mapping to `memref.atomic_rmw` with the appropriate `arith::AtomicRMWKind`.
- Float atomics: stack values are i64 bit patterns, so bitcast the operand to f64 before the atomic op. The address computation follows the same pattern as `!` / `F!`.
- `ATOMIC-CAS` is more complex: it needs either `memref.generic_atomic_rmw` with a comparison body or a direct lowering to LLVM `cmpxchg`.
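- NVVM can lower the integer operations and float add to PTX `atom.*` instructions (f64 add requires sm_60 or newer). Float max/min may have no native PTX form for f64 and would then need a CAS loop; verify this against the target architecture.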
- Consider starting with just `ATOMIC+` and `ATOMIC-F+` as the minimum viable set.
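
The bitcast note for float atomics can be sketched on the host in C++ as follows. The cell stores the raw i64 bit pattern, the operand is reinterpreted as f64, and the update is retried with a compare-and-swap loop. This is a model under stated assumptions, not the lowering `memref.atomic_rmw addf` would produce: the helper names, the CAS-loop strategy, and the relaxed memory ordering are all hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret an i64 stack value as f64 without changing its bits.
inline double bits_to_f64(std::int64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Reinterpret an f64 as the i64 bit pattern kept on the stack.
inline std::int64_t f64_to_bits(double d) {
    std::int64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// ATOMIC-F+ ( f addr -- ): `f_bits` is the i64 bit pattern of the addend,
// and `cell` holds the i64 bit pattern of the f64 stored in memory.
inline void atomic_fadd(std::int64_t f_bits, std::atomic<std::int64_t>& cell) {
    const double addend = bits_to_f64(f_bits);
    std::int64_t old_bits = cell.load(std::memory_order_relaxed);
    std::int64_t new_bits;
    do {
        // Recompute the sum from the most recently observed value.
        new_bits = f64_to_bits(bits_to_f64(old_bits) + addend);
        // On failure, old_bits is refreshed with the current cell contents.
    } while (!cell.compare_exchange_weak(old_bits, new_bits,
                                         std::memory_order_relaxed));
}
```

The same loop shape, with the addition replaced by a max or min, is one way to model `ATOMIC-FMAX` and `ATOMIC-FMIN` where no native instruction is available.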

## Files to modify

1. `include/warpforth/Dialect/Forth/ForthOps.td` — Define new ops
2. `lib/Translation/ForthToMLIR/ForthToMLIR.cpp` — Parse words
3. `lib/Conversion/ForthToMemRef/ForthToMemRef.cpp` — Add conversion patterns
4. `test/Translation/Forth/` — Parser tests
5. `test/Conversion/ForthToMemRef/` — Conversion tests

## Priority

Medium — needed for multi-block reductions and scatter patterns. Not required for single-block kernels.

## Related

- #42 — Float math intrinsics (FMAX/FMIN needed alongside ATOMIC-FMAX/ATOMIC-FMIN)
- #10 — Warp-level primitives (warp reductions reduce the need for atomics)
