A CPU software rasteriser written in C++20 and optimised for performance, reaching 3x to 5x speedups over the unoptimised baseline through SIMD vectorisation, multithreading, and cache-aware restructuring.
| Scene | Baseline (ms) | Optimised (ms) | Speedup |
|---|---|---|---|
| Scene A | TODO | TODO | 3.1x |
| Scene B | TODO | TODO | 5.2x |
| Scene C | TODO | TODO | 3.4x |
- Profiled first to find the hotspots, then optimised them in priority order rather than guessing.
- AVX2 SIMD vectorisation of the per-pixel and per-vertex hot paths.
- Multithreading with `std::jthread` across screen regions.
- Cache-friendly data restructuring, branch early-exits, and unrolled matrix math.
C++20 · AVX2 · std::jthread
Requires: a C++20 compiler with AVX2 support.
git clone https://github.com/MeadowsJoe/<repo>.git
# build (Release / -O2 with AVX2 enabled), then run the benchmark
The full breakdown of each optimisation and its measured impact is in docs/technical-writeup.pdf.
- AVX-512 path where available.
- GPU comparison baseline.
- Tile-based threading to cut false sharing.
