Taylor series native exponential implementation#243
Conversation
|
Your PR no longer requires formatting changes. Thank you for your contribution! |
84e07c2 to
08d3a71
Compare
|
@Jutho I hope you don't mind but I took some time to rebase this PR now that @sanderdemeyer's PR is finished. Benchmark:
|
| n | LA (µs) | Taylor (µs) | Taylor no-balance (µs) | Tayl/LA | nobal/LA | relerr |
|---|---|---|---|---|---|---|
| 8 | 3.9 | 5.9 | 5.6 | 1.51 | 1.43 | 1.3e-15 |
| 16 | 14.6 | 13.1 | 11.8 | 0.90 | 0.81 | 5.4e-15 |
| 32 | 57.8 | 30.1 | 26.7 | 0.52 | 0.46 | 1.2e-14 |
| 64 | 263.1 | 136.1 | 134.6 | 0.52 | 0.51 | 2.1e-14 |
| 128 | 2108.7 | 1219.7 | 1188.3 | 0.58 | 0.56 | 3.5e-14 |
| 256 | 6263.0 | 4161.5 | 4050.0 | 0.66 | 0.65 | 8.4e-14 |
ComplexF64
| n | LA (µs) | Taylor (µs) | Taylor no-balance (µs) | Tayl/LA | nobal/LA | relerr |
|---|---|---|---|---|---|---|
| 8 | 8.1 | 10.0 | 9.0 | 1.23 | 1.11 | 2.2e-15 |
| 16 | 31.1 | 36.8 | 30.0 | 1.19 | 0.97 | 1.9e-15 |
| 32 | 152.1 | 133.3 | 119.1 | 0.88 | 0.78 | 5.1e-15 |
| 64 | 943.1 | 801.2 | 719.9 | 0.85 | 0.76 | 2.5e-14 |
| 128 | 3456.6 | 2833.4 | 2591.5 | 0.82 | 0.75 | 3.0e-14 |
| 256 | 13064.8 | 11885.0 | 10696.6 | 0.91 | 0.82 | 7.4e-14 |
Taylor is faster than LinearAlgebra.exp for n ≥ 16 (Float64) and competitive-to-faster for ComplexF64, only paying a penalty at the smallest sizes. Balancing costs ~10–15% on these well-scaled random matrices; it is kept on by default for robustness on badly-scaled inputs.
Arbitrary precision (256-bit)
LinearAlgebra.exp has no method for Matrix{BigFloat}.
Reported below are Taylor timings and the self-consistency residual ‖exp(A)·exp(−A) − I‖₁.
BigFloat
| n | Taylor (ms) | no-balance (ms) | ‖E·E(−A) − I‖₁ |
|---|---|---|---|
| 8 | 1.46 | 1.38 | 1.9e-74 |
| 16 | 9.67 | 9.62 | 2.7e-73 |
| 32 | 75.63 | 76.57 | 3.0e-71 |
| 64 | 769.48 | 769.52 | 1.5e-68 |
Complex{BigFloat}
| n | Taylor (ms) | no-balance (ms) | ‖E·E(−A) − I‖₁ |
|---|---|---|---|
| 8 | 5.95 | 6.00 | 4.5e-74 |
| 16 | 44.53 | 43.26 | 2.9e-72 |
| 32 | 410.79 | 418.93 | 6.3e-71 |
| 64 | 3449.70 | 3432.66 | 2.0e-69 |
exponential implementation
|
I guess for the |
Codecov Report❌ Patch coverage is
... and 23 files with indirect coverage changes 🚀 New features to boost your workflow:
|
Register a pure-Julia scaling-and-squaring Taylor algorithm as `MatrixFunctionViaTaylor` and fit it to the exponential interface: the `(τ, A)` forms reuse the existing generic method, and it becomes the default algorithm for all dense matrices (`DiagonalAlgorithm` still handles `Diagonal`). Being LAPACK-free, it also covers arbitrary precision, where `LinearAlgebra.exp` has no method. The truncation order and number of squarings are chosen to minimize the Paterson-Stockmeyer matrix-multiplication count under the Taylor remainder bound, the polynomial is evaluated with incrementally built coefficients, and squaring reuses a ping-pong buffer. `balance!` is rewritten as a clean, type-generic Parlett-Reinsch scaling and wired in as an optional preprocessing step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
a7f2e42 to
0d07fe9
Compare
|
It seems like for BigFloat the decompositions are actually faster in some cases and slower in other, in particular for hermitian inputs. I'm not sure what should be the conclusion of what algorithm to choose by default, or what is causing this since that is slightly unexpected to me, but I guess the matmuls just aren't as efficient, so the more complex work needed for eigensolvers is less bad. |
357d1e6 to
b111776
Compare
|
That is not cool. Is that really due to the slow matrix multiplication. Would that be overhead due to the allocation of the different bigfloats. Could you try the same with something like |
|
After looking at it a bit more, one thing I noticed is that the Taylor series is truncated such that the error is To be completely fair, I would assume there is some equivalent that is true for the eigen decomposition, but I know too little of that implementation to know how that scales. I reran the benchmarks with DoubleFloat, the story is slightly different but it looks like it is hard to beat the hermitian diagonalization... |
|
I investigated a little further, and the main thing that I can come up with is to try and get a tighter bound on the operator norm estimator. Currently, we are just using There are some ways of improving on this, but things get complicated quite quickly, these methods are not free, and the actual payoff is not that large, so I think I might vote to just keep things the way they are right now. As this already is faster than LinearAlgebra, and we don't immediately have a need for hyper-efficient dense exponentials, I think this should be fine :). |
|
I don't recognize much of the original implementation anymore. I know that, since you anyway need to compute a minimum number of powers of A, it can be useful to first compute some powers of
This is based on Gelfand's theorem which states that |
This is something I started a while ago, but I need to continue working on it. Basically, for higher precision number types, the best approach for computing the exponential is just a Taylor expansion, combined with scaling and squaring (i.e. exp(A) = (exp(A/2^s))^(2^s)). But which
sin combination with which order ofmof the Taylor expansion is a nontrivial question, that has been studied in the literature, and also involves clever ways of computing higher order polynomials, i.e. polynomials ofAof degreem, thus requiring all powersA^k,k in 0:m, without actually having to dommatrix multiplications.