Refactor BLAS code #2049
|
Some fun facts I didn't know:
|
We can, but we don't usually bother with grads for specialized Ops. I wouldn't expose user-facing GEMM but always start with the canonical Dot + alpha forms for which we have other rewrites/batch-rules/etc... |
|
I'm not thinking about something user-facing for sure. The pullbacks are so simple that it might result in better graphs than trying to start from general forms and rewrite both the forward and backward graph. Not sure. I want to modernize the rewrites next. |
Simple or not, it is also more code we need to test and maintain. How hard are the BLAS patterns really? Two dots and some scalar multiplications? If we can't handle those, we have bigger problems. |
Well, empirically... |
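For reference, the pullbacks discussed above can be written out explicitly. This is a sketch that assumes the usual GEMM form $Z = \alpha A B + \beta C$ with upstream gradient $G = \partial L / \partial Z$; the thread itself does not spell out the form, so the symbols here are illustrative.

$$
\begin{aligned}
\frac{\partial L}{\partial A} &= \alpha \, G B^{\top}, &
\frac{\partial L}{\partial B} &= \alpha \, A^{\top} G, &
\frac{\partial L}{\partial C} &= \beta \, G, \\
\frac{\partial L}{\partial \alpha} &= \sum_{i,j} G_{ij} \, (A B)_{ij}, &
\frac{\partial L}{\partial \beta} &= \sum_{i,j} G_{ij} \, C_{ij}.
\end{aligned}
$$

The gradients with respect to $A$ and $B$ are the "two dots and some scalar multiplications" mentioned above.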
I was inspired by the linalg refactor, so I wanted to make a pass at BLAS. My objective here is to make the BLAS code more maintainable. I am doing this by:
blas.py into a module
.h or .c files

Point 3 has two levels we could pursue. Level one is to extract all static code into headers that can be pulled into the string codegen. This is done for all three BLAS functions. The second level is to move all string codegen into a helper function, then only codegen the call to that function. I did this only for GER in the last commit. It's significantly more readable, but it also has the overhead of being a function call vs. 100% inline. We can discuss.
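The difference between the two levels can be sketched roughly as follows. This is not the code in this PR: `GER_HELPER_H`, `ger_helper`, `codegen_inline` and `codegen_call` are hypothetical names, the header contents are held in a Python string instead of a real file, and a row-major `double` array is assumed for simplicity.

```python
from string import Template

# Hypothetical stand-in for a static header shipped next to the Python module.
# In the second level, all of the loop logic lives here as plain C.
GER_HELPER_H = """
static int ger_helper(double alpha, const double *x, const double *y,
                      double *A, int m, int n) {
    /* Rank-1 update A += alpha * x * y^T (row-major A assumed). */
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            A[i * n + j] += alpha * x[i] * y[j];
    return 0;
}
"""

# Level one style: the per-node code is still generated as an inline string.
INLINE_TEMPLATE = Template("""
for (int i = 0; i < $m; ++i)
    for (int j = 0; j < $n; ++j)
        $A[i * $n + j] += $alpha * $x[i] * $y[j];
""")

# Level two style: the per-node code is only a call into the static helper.
CALL_TEMPLATE = Template("if (ger_helper($alpha, $x, $y, $A, $m, $n)) { $fail }")


def codegen_inline(names):
    """Return the fully inlined C loop for the given C variable names."""
    return INLINE_TEMPLATE.substitute(names)


def codegen_call(names, fail="return -1;"):
    """Return a single C statement that calls the helper from the header."""
    return CALL_TEMPLATE.substitute(names, fail=fail)


if __name__ == "__main__":
    names = dict(alpha="alpha0", x="x0", y="y0", A="A0", m="m0", n="n0")
    print(codegen_inline(names))
    print(codegen_call(names))
```

In the call-based variant the generated string shrinks to one line and the C logic can be read and edited as ordinary C, at the cost of the function call mentioned above.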
I think one more step I want to explore in this PR is moving all of the C code, and potentially the COps, to link/c/blas instead of tensor/blas, as part of the idea that "C should just be another backend" raised in #2006.