Matrix multiplication systolic array using Piecewise Affine Multiplication written in Verilog
PAM (Piecewise Affine Multiplication) is an approximate floating-point multiplication trick. It relies on the fact that an IEEE 754 floating-point bit pattern, when interpreted as an integer, approximately encodes log2(|x|), up to a fixed scale and offset. This lets us approximate multiplication using only integer addition and subtraction.
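The log2 property can be checked with a few lines of Python. This is only an illustrative sketch: it assumes binary32 (23 mantissa bits, bias 127), and the helper names `float_bits` and `approx_log2` are made up for this example.

```python
import math
import struct

def float_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def approx_log2(x: float) -> float:
    """Approximate log2(x) for a positive normal x from its bit pattern alone."""
    # Dividing by 2**23 moves the exponent field into the integer part;
    # subtracting 127 removes the exponent bias.
    return float_bits(x) / (1 << 23) - 127

for x in (1.0, 1.5, 3.0, 10.0, 1000.0):
    print(f"{x:8.1f}  log2 = {math.log2(x):8.4f}  approx = {approx_log2(x):8.4f}")
```

The approximation is exact at powers of two and linear in between, which is where the "piecewise affine" in the name comes from.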
PAM can allow us to perform AI inference using far lower circuit complexity and energy usage, without sacrificing too much accuracy.
This project uses that trick to build a MAC (multiply-and-accumulate) unit with significantly lower complexity than an exact floating-point MAC unit. These MAC units are then arranged in a 2D systolic array for matrix multiplication (TPU style).
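A behavioural sketch of the idea in Python, not the Verilog itself. Several things here are assumptions made for illustration: binary32 operands, an output-stationary dataflow in which operands are skewed by one cycle per row and column, and an accumulator that adds exactly. The names `pam_mul` and `systolic_matmul` are hypothetical.

```python
import struct

BIAS_BITS = 127 << 23  # binary32 exponent bias, aligned with the exponent field

def to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_bits(i: int) -> float:
    return struct.unpack("<f", struct.pack("<I", i))[0]

def pam_mul(a: float, b: float) -> float:
    """Approximate a*b with integer add/subtract (inf and NaN are not handled)."""
    ia, ib = to_bits(a), to_bits(b)
    sign = (ia ^ ib) & 0x80000000            # sign of the product
    mag_a, mag_b = ia & 0x7FFFFFFF, ib & 0x7FFFFFFF
    if mag_a == 0 or mag_b == 0:
        return 0.0                           # zero has no logarithm, special-case it
    mag = mag_a + mag_b - BIAS_BITS          # add "logs", remove the doubled bias
    mag = max(0, min(mag, 0x7F7FFFFF))       # crude clamp on underflow/overflow
    return from_bits(sign | mag)

def systolic_matmul(A, B):
    """Model an n x p grid of MAC cells; cell (i, j) accumulates C[i][j]."""
    n, k_dim, p = len(A), len(B), len(B[0])
    acc = [[0.0] * p for _ in range(n)]
    for t in range(n + k_dim + p - 2):       # cycles until the last operand arrives
        for i in range(n):
            for j in range(p):
                k = t - i - j                # operand index reaching cell (i, j) at cycle t
                if 0 <= k < k_dim:
                    acc[i][j] += pam_mul(A[i][k], B[k][j])
    return acc

print(systolic_matmul([[1.0, 2.0], [4.0, 8.0]], [[0.5, 1.0], [2.0, 4.0]]))
print(systolic_matmul([[1.5, 1.25]], [[1.5], [3.0]]))
```

The first call uses only powers of two, for which PAM is exact; the second shows the approximation on operands with non-zero mantissas.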
IEEE 754 encodes real numbers in terms of mantissa and exponent like so:
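In the usual notation for a normal binary32 value (the symbols $S$, $E$, $M$ and $B$ are chosen here for illustration; other formats change only the field widths and the bias):

$$
x = (-1)^{S} \cdot 2^{E - B} \cdot (1 + M), \qquad 0 \le M < 1, \quad B = 127
$$

Here $S$ is the sign bit, $E$ the biased exponent field, and $M$ the mantissa fraction. Read as an unsigned integer, the magnitude bits are $I = 2^{23}\,(E + M)$.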
True multiplication is written as:
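Writing each operand as $(-1)^{S} \cdot 2^{E - B} \cdot (1 + M)$, with sign bit $S$, biased exponent $E$, mantissa fraction $0 \le M < 1$ and bias $B$ (notation assumed here for illustration), the exact product expands to:

$$
a \cdot b = (-1)^{S_a \oplus S_b} \cdot 2^{E_a + E_b - 2B} \cdot (1 + M_a + M_b + M_a M_b)
$$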
For PAM approximate multiplication, add the 2 as integers and subtract bias:
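A sketch of that step in equation form, assuming binary32 and writing each operand's magnitude bits as the integer $I = 2^{23}(E + M)$, with biased exponent $E$, mantissa fraction $0 \le M < 1$, sign bit $S$ and bias $B = 127$ (notation chosen here for illustration):

$$
I_a + I_b - 2^{23} B = 2^{23}\,(E_a + E_b - B + M_a + M_b)
$$

Reinterpreting the result as a float gives:

$$
\mathrm{PAM}(a, b) = (-1)^{S_a \oplus S_b} \cdot
\begin{cases}
2^{E_a + E_b - 2B} \cdot (1 + M_a + M_b), & M_a + M_b < 1 \\
2^{E_a + E_b - 2B + 1} \cdot (M_a + M_b), & M_a + M_b \ge 1
\end{cases}
$$

In the second case the mantissa sum carries into the exponent field. The signs are combined separately with an XOR.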
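We can see that the two expressions are almost identical, and differ only in a term involving the mantissa fractions of the operands.
Because the error term depends only on the mantissa values of the operands, the error is bounded and periodic: it repeats with every power of two.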
Approximation error as we sweep one operand and hold the other constant
The worst-case error is 11.9%.
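A single-operand sweep of this kind can be sketched in Python as follows. It assumes binary32 and positive normal operands; the helper names, the fixed operand 1.5 and the step count are arbitrary choices for illustration. Exact figures depend on the number format and on how the operands are sampled, so what it prints need not match the numbers quoted here.

```python
import struct

BIAS_BITS = 127 << 23  # binary32 exponent bias, aligned with the exponent field

def to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_bits(i: int) -> float:
    return struct.unpack("<f", struct.pack("<I", i))[0]

def pam_mul(a: float, b: float) -> float:
    """Approximate a*b for positive normal inputs (no zero, inf or NaN handling)."""
    return from_bits(to_bits(a) + to_bits(b) - BIAS_BITS)

def rel_error(a: float, b: float) -> float:
    exact = a * b
    return abs(pam_mul(a, b) - exact) / exact

FIXED = 1.5   # operand held constant
STEPS = 16    # samples per octave

# Sweep the other operand over two consecutive octaves: [1, 2) and [2, 4).
octave1 = [rel_error(1.0 + i / STEPS, FIXED) for i in range(STEPS)]
octave2 = [rel_error(2.0 * (1.0 + i / STEPS), FIXED) for i in range(STEPS)]

for i in range(STEPS):
    a = 1.0 + i / STEPS
    print(f"a = {a:.4f}: {octave1[i]:.4%}   a = {2 * a:.4f}: {octave2[i]:.4%}")
```

The two columns are identical, because doubling an operand changes only its exponent and the error depends only on the mantissas.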
If we sweep both operands we can see the overall picture:
Approximation error as we sweep both operands
The mean error is 4.3%, and 62% of operand pairs have an error under 5%.
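The two-operand sweep can be sketched the same way. This version assumes binary32 and samples both mantissas uniformly on a 64 x 64 grid over one octave, which is an arbitrary choice made for illustration; the statistics shift with the number format and the sampling scheme, so it is not expected to reproduce the figures above exactly.

```python
import struct

BIAS_BITS = 127 << 23  # binary32 exponent bias, aligned with the exponent field

def to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_bits(i: int) -> float:
    return struct.unpack("<f", struct.pack("<I", i))[0]

def pam_mul(a: float, b: float) -> float:
    """Approximate a*b for positive normal inputs (no zero, inf or NaN handling)."""
    return from_bits(to_bits(a) + to_bits(b) - BIAS_BITS)

def rel_error(a: float, b: float) -> float:
    exact = a * b
    return abs(pam_mul(a, b) - exact) / exact

N = 64  # grid points per operand; one octave suffices because the error is periodic
errors = [rel_error(1.0 + i / N, 1.0 + j / N) for i in range(N) for j in range(N)]

mean_err = sum(errors) / len(errors)
worst_err = max(errors)
under_5 = sum(e < 0.05 for e in errors) / len(errors)

print(f"mean error:        {mean_err:.2%}")
print(f"worst-case error:  {worst_err:.2%}")
print(f"pairs under 5%:    {under_5:.1%}")
```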
In
The end-to-end impact of PAM multiplication on model accuracy might be much lower than the 11% worst case, and a full LLM simulation is needed to measure it.