With NVCC, I can use the flag `--use_fast_math` to enable algebraic optimisations globally when compiling a file. The only way I found to enable such optimisations in Rust-CUDA is to use the `fadd_fast` intrinsic (and similar ones). The usual argument for this per-operation approach over a global "fast math" flag is that some math functions rely on strict adherence to IEEE float semantics, so a global flag might silently break them. However, `CudaBuilder` already has a global `ftz` flag, which may already break such functions. Given that it is already possible to globally opt in to breaking IEEE semantics via `ftz`, perhaps it makes sense to also add a global `fast_math` flag?
I'm happy to help with the implementation of this.
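To make the IEEE concern concrete, here is a minimal host-side sketch in plain Rust (no Rust-CUDA APIs involved). It shows that reassociating a float sum, one of the algebraic rewrites a fast-math mode is allowed to do, can change the result, and it identifies the kind of value that flush-to-zero would replace with zero. The function names `sum_as_written`, `sum_reassociated`, and `would_be_flushed` are made up for illustration and are not part of any existing crate.

```rust
/// Hypothetical helper: sums exactly as written, (a + b) + c.
/// Under strict IEEE semantics the compiler must keep this order.
fn sum_as_written(a: f32, b: f32, c: f32) -> f32 {
    (a + b) + c
}

/// Hypothetical helper: the reassociated form a + (b + c).
/// A fast-math optimiser is allowed to substitute this for the
/// version above, even though the two can round differently.
fn sum_reassociated(a: f32, b: f32, c: f32) -> f32 {
    a + (b + c)
}

/// Hypothetical helper: true if `x` is subnormal, i.e. nonzero but
/// smaller in magnitude than the smallest normal f32. These are the
/// values a flush-to-zero mode would treat as zero.
fn would_be_flushed(x: f32) -> bool {
    x.is_subnormal()
}
```

With `a = 1e20`, `b = -1e20`, `c = 1.0`, the first function returns `1.0` and the second returns `0.0`, because `1.0` is lost when added to `-1e20` in single precision. Likewise, `f32::MIN_POSITIVE / 2.0` is a nonzero subnormal under IEEE rules but would become zero with `ftz` enabled. Code that depends on either behaviour is what a global flag could silently break.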