Conversation
Ok, I've now implemented the quadratic approach, and it is indeed much faster, typically converging in less than 10 iterations for double precision. It is called

Which pullback is this now using? Is that the previous

Also, thanks for the very quick weekend response.
|
This PR seems to have broken the GPU extensions (both CUDA and AMD). Not sure what is causing this.
It was using the

```julia
# replace XY = _sylvester(ÃÃ, -Smat, rhs) with linsolve
Smat⁻¹ = diagm(inv_safe.(S, degeneracy_atol))  # safe inverse of the diagonal of kept singular values
f(xy) = ÃÃ * xy * Smat⁻¹ - xy                  # linear map handed to the iterative solver
XY₀ = zeros(scalartype(ÃÃ), size(ÃÃ, 2), size(Smat⁻¹, 1))  # initial guess
XY, info = linsolve(f, -rhs * Smat⁻¹, XY₀, solver_alg)
```
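As an illustration of what this fixed-point formulation computes, here is a small NumPy sketch (not the package's code; `M`, `s`, and `C` are hypothetical stand-ins for `ÃÃ`, the kept singular values, and `rhs * Smat⁻¹`) of solving `X = M X S⁻¹ + C` by summing the geometric series directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3

# Hypothetical stand-ins: M for ÃÃ, s for the kept singular values,
# C for rhs * Smat⁻¹.  Convergence needs ‖M‖ < min(s); normalize M
# so its spectral norm is exactly 0.5 while min(s) = 1.
M = rng.standard_normal((n, n))
M *= 0.5 / np.linalg.norm(M, 2)
s = np.array([2.0, 1.5, 1.0])
C = rng.standard_normal((n, m))

# Fixed point X = M @ X @ inv(diag(s)) + C, summed as the geometric
# series X = sum_k M^k C diag(s)^{-k} (linear convergence, rate 0.5).
X = C.copy()
term = C.copy()
for _ in range(200):
    term = (M @ term) / s  # dividing by s applies diag(s)^{-1} columnwise
    X += term
    if np.linalg.norm(term) < 1e-14:
        break

# Residual of the fixed-point equation should be at machine-precision level.
residual = np.linalg.norm(M @ X / s + C - X)
assert residual < 1e-12
```

With the ratio `‖M‖/min(s) = 0.5` the error only halves per term (roughly 47 terms to reach double precision), which is why the quadratically convergent reformulation discussed in this thread is attractive.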
@pbrehmer, would it be much work for you to run the same benchmark with

For very large matrix sizes, I agree that Krylov will still be the preferred option, but also the
I can do that, but currently my computational resources on the cluster are blocked by some other simulations, so it will take a bit!
Yes indeed. For these benchmarks I explicitly wanted to check the performance of the |
lkdvos
left a comment
It is probably also worth it to add the remove_f_gauge_dependence functions to the public list, given that we explicitly intend them to be used by TensorKit.
Overall looks like a great PR though, this should hopefully stabilize some of the issues we've been having 🥳
Some remaining questions that definitely don't have to be addressed here but are worth bringing up:
- Is it worth it to refactor the sylvester solver into its own function, or is this one too hand-crafted for this specific purpose? This might make it more convenient to swap out the Krylov-based solver, but I don't mean to add too much burden to this PR/this package for that either.
I think it is pretty specific; where else do you think we might use this?

I really didn't look at this carefully, as in, is this just a regular Sylvester solver? In that case it shows up in the Clebsch-Gordan equations for finite groups
No, not at all; it uses that one of the two matrices is diagonal (or more generally, easily invertible), and furthermore assumes that the smallest singular value of that matrix is larger than the largest singular value of the other matrix (or more generally:
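Spelling out that condition (my notation, following the `ÃÃ`/`Smat` names in the snippet earlier in the thread): with $S$ diagonal and invertible, the fixed-point equation $X = \tilde{A} X S^{-1} + C$ is solved by the geometric series

$$X = \sum_{k=0}^{\infty} \tilde{A}^{k} \, C \, S^{-k},$$

which converges precisely when $\lVert \tilde{A} \rVert / \sigma_{\min}(S) < 1$, i.e. when the largest singular value of $\tilde{A}$ is smaller than the smallest singular value of $S$; the truncation error after $K$ terms decays like $\left(\lVert \tilde{A} \rVert / \sigma_{\min}(S)\right)^{K}$.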
I am happy to discuss the CG-for-finite-groups case though; maybe there are some other tricks that can be used.

The Enzyme tests seem to be taking an extreme amount of time / GC / allocations. Not sure if this is a consequence of the changes here; I will have to check.
I think it's a consequence of how the Enzyme tests work. Basically, they use |
GPU CI is now passing thanks to a Buildkite fix.

Great. Not sure why it is failing on Ubuntu latest; probably a timeout? I will already swap

TBH I think the speed of the pullback itself won't change things much; it's entirely compilation time in

Just as a note: this should close #150, right?
Co-authored-by: Lukas Devos <ldevos98@gmail.com> Co-authored-by: Jutho <Jutho@users.noreply.github.com>
Your PR no longer requires formatting changes. Thank you for your contribution!
lkdvos
left a comment
@Jutho I rebased on top of the latest main, and added the gauge-dependence removal to the public list. I don't think the performance of the new SVD pullback needs to be addressed before merging this; mostly I'd like to have the fix for the full SVD out as soon as possible?
Ok. It wasn't really my intention to keep |
One final change could be to replace
* Update changelog for v0.6.6
* Bump version to v0.6.6

This PR does a number of pullback-related things:

- It moves the `remove_x_gauge_dependence!` functions from the tests to the main repository, so that they can be reused by higher-level packages (TensorKit), and so that it is easier to keep them in sync with the functions that test the gauge dependence.
- Because testing the gauge dependence in the adjoints requires some of the same intermediate calculations as the actual computation of the pullback (so some computations were performed twice before), there has been some restructuring, resulting in a rename of `check_x_cotangents` to `check_and_prepare_x_cotangents`.
- The `svd_pullback` implementation has been fixed to accept `svd_full` pullbacks, and should also be more robust for pullbacks resulting from an `svd_trunc` with arbitrary `ind`.
- The implementation of the `svd_trunc_pullback` that computes the pullback without depending on the full SVD of `A` has been changed, because the old one was reportedly (thanks @pbrehmer) very slow for larger matrices, and the Sylvester solver is not available on GPU hardware. Instead, the Sylvester equation is now explicitly solved as a geometric series. This requires another parameter, called `maxiter` (the number of terms kept in the geometric series). It also requires that it is the largest singular values that are being kept, as the convergence rate is essentially determined by the ratio "largest truncated singular value" / "smallest kept singular value". This implementation is only tested in the chainrules tests, where I already noticed that there were random cases where I needed quite a large number of iterations for convergence (100 was not enough; I bumped it to 1000, which does seem enough for the tests). This could be useful in combination with randomized SVD, where because of oversampling you probably do have an explicit value for the "largest truncated singular value", such that you can estimate the required number of iterations beforehand.

Things to do:

- Actually test the new `svd_trunc_pullback` on GPU in combination with randomized SVD.

The following suggestion could probably dramatically speed up convergence, as it changes from linear convergence to quadratic convergence (which is the typical rate at which dense eigenvalue decomposition / SVD converges: basically the error is squared in every iteration and reaches sub-machine precision in a handful of iterations). I will try to implement this instead asap: