
pullback reorganization #208

Merged
lkdvos merged 13 commits into main from jh/svdpullback
Apr 23, 2026
Conversation

@Jutho (Member) commented Apr 17, 2026

This PR does a number of pullback-related things:

  1. It moves the remove_x_gauge_dependence! functions from the tests to the main repository, so that they can be reused by higher-level packages (TensorKit), and so that it is easier to keep them in sync with the functions that test the gauge dependence.

  2. Because testing the gauge dependence in the adjoints requires some of the same intermediate calculations as the actual computation of the pullback (so some computations were previously performed twice), there has been some restructuring, resulting in a rename of check_x_cotangents to check_and_prepare_x_cotangents.

  3. The svd_pullback implementation has been fixed to accept svd_full pullbacks, and should also be more robust against pullbacks resulting from an svd_trunc with arbitrary ind.

  4. The implementation of the svd_trunc_pullback that computes the pullback without depending on the full SVD of A has been changed: the old one was reportedly (thanks @pbrehmer) very slow for larger matrices, and the Sylvester solver is not available on GPU hardware. Instead, the Sylvester equation is now explicitly solved as a geometric series. This requires an additional parameter, called maxiter (the number of terms kept in the geometric series). It also requires that the largest singular values are the ones being kept, as the convergence rate is essentially determined by the ratio "largest truncated singular value" / "smallest kept singular value". This implementation is only tested in the chainrules tests, where I already noticed random cases that needed quite a large number of iterations to converge (100 was not enough; I bumped it to 1000, which does seem sufficient for the tests). This could be useful in combination with randomized SVD, where, because of oversampling, you probably do have an explicit value for the "largest truncated singular value", so that you can estimate the required number of iterations beforehand.
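The geometric-series construction in point 4 can be sketched as follows (an illustrative Python/NumPy sketch under my own assumptions, not the package's actual Julia implementation; all names are made up). For an equation of the form X S − A X = R with S diagonal and ‖A‖ < min(S), repeated substitution X = (R + A X) S⁻¹ gives X = Σ_{k≥0} Aᵏ R S^{−(k+1)}, with each successive term smaller by roughly the ratio ‖A‖ / min(S):

```python
import numpy as np

def sylvester_geometric(A, s, R, maxiter=1000, tol=1e-14):
    """Solve X @ diag(s) - A @ X = R as a geometric (Neumann) series.

    Converges when norm(A) < min(s), i.e. when the largest truncated
    singular value is smaller than the smallest kept one; the error
    shrinks linearly with ratio norm(A) / min(s).
    """
    Sinv = 1.0 / s                # diag(s) is trivially invertible
    T = R * Sinv                  # k = 0 term: R @ diag(s)^-1
    X = T.copy()
    for _ in range(maxiter):
        T = (A @ T) * Sinv        # next term: A^k @ R @ diag(s)^-(k+1)
        X += T
        if np.linalg.norm(T) < tol * np.linalg.norm(X):
            break
    return X
```

The stopping criterion mirrors the role of maxiter in the PR: when the ratio of truncated to kept singular values is close to 1, many terms are needed, which is exactly the regime where 100 iterations were not enough.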

Things to do:

  • Actually test new svd_trunc_pullback on GPU in combination with randomized SVD

  • The following suggestion could probably dramatically speed up convergence, as it changes linear convergence into quadratic convergence (the typical rate at which dense eigenvalue decompositions / SVDs converge): the error is squared in every iteration and reaches sub-machine precision in a handful of iterations. I will try to implement this instead asap:

[Screenshot of the suggested quadratically convergent iteration, 2026-04-18]

@Jutho (Member, Author) commented Apr 18, 2026

Ok, I've now implemented the quadratic approach, and it is indeed much faster, typically converging in fewer than 10 iterations for double precision. It is called svd_trunc_pullback2!, but should replace svd_trunc_pullback!. I'd be interested in hearing about some real case timings from @pbrehmer 😄 .
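One way such an "error squared in every iteration" scheme can work (a hedged Python/NumPy sketch of a plausible doubling construction, not necessarily the exact update from the screenshot or the package's implementation): the partial sums X_n = Σ_{k<n} Aᵏ R S^{−(k+1)} of the geometric series for X S − A X = R satisfy X_{2n} = X_n + Aⁿ X_n S^{−n}, so squaring Aⁿ and S^{−n} at every step squares the error ratio:

```python
import numpy as np

def sylvester_doubling(A, s, R, maxiter=60, tol=1e-14):
    """Solve X @ diag(s) - A @ X = R with a doubling (squaring) scheme.

    The partial sums X_n = sum_{k<n} A^k R diag(s)^-(k+1) of the
    geometric series satisfy X_{2n} = X_n + A^n X_n diag(s)^-n, so
    keeping P = A^n and q = s^-n and squaring them each step makes the
    error ratio (norm(A)/min(s))^n square at every iteration.
    """
    X = R * (1.0 / s)         # X_1 = R @ diag(s)^-1
    P = A.copy()              # P = A^n, starting at n = 1
    q = 1.0 / s               # q encodes diag(s)^-n
    for _ in range(maxiter):
        dX = (P @ X) * q      # A^n @ X_n @ diag(s)^-n
        X += dX
        P = P @ P             # A^(2n)
        q = q * q             # diag(s)^-(2n)
        if np.linalg.norm(dX) < tol * np.linalg.norm(X):
            break
    return X
```

With the error ratio squared each step, sub-machine precision is reached after O(log log(1/ε)) doublings, consistent with the handful of iterations reported above.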

@pbrehmer commented Apr 19, 2026

Thanks for the upgraded pullbacks! I have some real case timings, but using my own implementation, where the Sylvester equation is solved using KrylovKit's linsolve. It still might be interesting to see, since the scaling should be comparable, I think. My real test case is the fixed-point gradient linear problem of asymmetric CTMRG (without spatial symmetries) for the Fermi-Hubbard model. The benchmarks are run on multiple intermediately-optimized PEPS and then averaged. I plot the timings on a log-log scale as a function of $D^2\chi$ (the matrix that is decomposed is $D^2\chi \times D^2\chi$-dimensional) and normalize them by the CTMRG contraction times (CTMRG runs use a hot start):

[Plot: pullback timings vs. $D^2\chi$ on a log-log scale, normalized by the CTMRG contraction time]

It's pretty clear that linsolve outperforms the LAPACK Sylvester solver already at moderate dimensions, and that its scaling is much better. It's also interesting to see that the Sylvester solver indeed has approximately the same scaling as the CTMRG contraction step, i.e. the full SVD.

@Jutho (Member, Author) commented Apr 19, 2026

Which pullback is this now using? Is that the previous svd_trunc_pullback! (current main branch or latest release)? Or one of the two pullbacks svd_trunc_pullback! or svd_trunc_pullback2! in this branch?

@Jutho (Member, Author) commented Apr 19, 2026

Also, thanks for the very quick weekend response.

@Jutho (Member, Author) commented Apr 19, 2026

This PR seems to have broken the GPU extensions (both CUDA and AMD). Not sure what is causing this.

@pbrehmer commented

Which pullback is this now using? Is that the previous svd_trunc_pullback! (current main branch or latest release)? Or one of the two pullbacks svd_trunc_pullback! or svd_trunc_pullback2! in this branch?

It was using the svd_trunc_pullback! from the latest release (v0.6.5). What I should mention again is that linsolve will only converge this fast if it is preconditioned properly with the inverse singular values. Here are the lines of code that I changed in svd_trunc_pullback! to implement this preconditioning:

```julia
# replace XY = _sylvester(ÃÃ, -Smat, rhs) with linsolve
Smat⁻¹ = diagm(inv_safe.(S, degeneracy_atol))
# linear map whose solution satisfies XY = ÃÃ * XY * Smat⁻¹ + rhs * Smat⁻¹
f(xy) = ÃÃ * xy * Smat⁻¹ - xy
XY₀ = zeros(scalartype(ÃÃ), size(ÃÃ, 2), size(Smat⁻¹, 1))
XY, info = linsolve(f, -rhs * Smat⁻¹, XY₀, solver_alg)
```

@Jutho (Member, Author) commented Apr 20, 2026

It was using the svd_trunc_pullback! from the latest release (v0.6.5). What I should mention again is that linsolve will only converge this fast if it is preconditioned properly with the inverse singular values. Here are the lines of code that I changed in svd_trunc_pullback! to implement this preconditioning:

@pbrehmer , would it be much work for you to run the same benchmark with svd_trunc_pullback! and svd_trunc_pullback2! from this branch (especially the latter) and add the result to that plot?

For very large matrix sizes, I agree that Krylov will still be the preferred option, but then the svd_trunc itself should also be done with Krylov methods. It is also a fact that if you did compute the full SVD, then simply using svd_pullback! will still be the fastest. So I think these svd_trunc_pullback(2)! methods really only make sense in combination with something like a randomized SVD, but it would still be interesting to see the relative performance of each of them.
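The beforehand iteration estimate mentioned in the PR description can be made concrete (a hypothetical helper, illustrative Python, not part of MatrixAlgebraKit): with linear convergence ratio r = (largest truncated singular value) / (smallest kept singular value), reaching a tolerance tol takes roughly log(tol) / log(r) terms of the geometric series, which is exactly the kind of bound oversampled randomized SVD lets you compute in advance:

```python
import math

def estimate_maxiter(s_trunc_max, s_kept_min, tol=1e-14):
    """Estimate the number of geometric-series terms needed to reach `tol`.

    Each extra term shrinks the error by r = s_trunc_max / s_kept_min,
    so r**n <= tol gives n >= log(tol) / log(r).
    """
    r = s_trunc_max / s_kept_min
    if r >= 1.0:
        raise ValueError("series diverges: need s_trunc_max < s_kept_min")
    return math.ceil(math.log(tol) / math.log(r))
```

For r close to 1 this estimate grows very quickly, consistent with the random test cases above where 100 iterations were not enough.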

@pbrehmer commented

@pbrehmer , would it be much work for you to run the same benchmark with svd_trunc_pullback! and svd_trunc_pullback2! from this branch (especially the latter) and add the result to that plot?

I can do that, but currently my computational resources on the cluster are blocked by some other simulations, so it will take a bit!

For very large matrix sizes, I agree that Krylov will still be the preferred option, but then the svd_trunc itself should also be done with Krylov methods. It is also a fact that if you did compute the full SVD, then simply using svd_pullback! will still be the fastest. So I think these svd_trunc_pullback(2)! methods really only make sense in combination with something like a randomized SVD, but it would still be interesting to see the relative performance of each of them.

Yes, indeed. For these benchmarks I explicitly wanted to check the performance of the truncated pullbacks, but it was still more convenient to just compute the forward pass using a full SVD.

@lkdvos (Member) left a review comment

It is probably also worth adding the remove_f_gauge_dependence functions to the public list, given that we explicitly intend them to be used by TensorKit.

Overall this looks like a great PR though; this should hopefully stabilize some of the issues we've been having 🥳

One remaining question that definitely doesn't have to be addressed here but is worth bringing up:

  • Is it worth refactoring the Sylvester solver into its own function, or is this one too hand-crafted for this specific purpose? This might make it more convenient to swap out the Krylov-based solver, but I don't mean to add too much burden to this PR/this package for that either.

@Jutho (Member, Author) commented Apr 20, 2026

  • Is it worth refactoring the Sylvester solver into its own function, or is this one too hand-crafted for this specific purpose? This might make it more convenient to swap out the Krylov-based solver, but I don't mean to add too much burden to this PR/this package for that either.

I think it is pretty specific; where else do you think we might use this?

@lkdvos (Member) commented Apr 20, 2026

I really didn't look at this carefully; as in, is this just a regular Sylvester solver? In that case it shows up in the Clebsch-Gordan equations for finite groups.

@Jutho (Member, Author) commented Apr 20, 2026

I really didn't look at this carefully; as in, is this just a regular Sylvester solver? In that case it shows up in the Clebsch-Gordan equations for finite groups.

No, not at all; it uses the fact that one of the two matrices is diagonal (or more generally, easily invertible), and furthermore assumes that the smallest singular value of that matrix is larger than the largest singular value of the other matrix (or more generally: opnorm(inv(B)) * opnorm(A) < 1).

@Jutho (Member, Author) commented Apr 20, 2026

I am happy to discuss the Clebsch-Gordan-for-finite-groups case though; maybe there are some other tricks that can be used.

@Jutho (Member, Author) commented Apr 20, 2026

Enzyme tests seem to be taking an extreme amount of time / GC / allocations. Not sure if this is a consequence of the changes here; I will have to check.

@kshyatt (Member) commented Apr 22, 2026

Enzyme tests seem to be taking an extreme amount of time / GC / allocations. Not sure if this is a consequence of the changes here; I will have to check.

I think it's a consequence of how the Enzyme tests work. Basically, they use autodiff_thunk to generate the functions for forward and reverse pass, then carefully check all the memory properties and correctness. This is great and very thorough but also expensive, so there's a large compilation burden associated with each test.

codecov bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 96.83099% with 9 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/pullbacks/svd.jl | 95.00% | 8 Missing ⚠️ |
| ...ixAlgebraKitEnzymeExt/MatrixAlgebraKitEnzymeExt.jl | 0.00% | 1 Missing ⚠️ |

| Files with missing lines | Coverage | Δ |
| --- | --- | --- |
| ...gebraKitMooncakeExt/MatrixAlgebraKitMooncakeExt.jl | 63.38% <100.00%> | (-0.93%) ⬇️ |
| src/MatrixAlgebraKit.jl | 100.00% <ø> | (ø) |
| src/pullbacks/eig.jl | 87.50% <100.00%> | (+1.29%) ⬆️ |
| src/pullbacks/eigh.jl | 83.14% <100.00%> | (+1.89%) ⬆️ |
| src/pullbacks/lq.jl | 97.75% <100.00%> | (+0.45%) ⬆️ |
| src/pullbacks/qr.jl | 96.73% <100.00%> | (+0.68%) ⬆️ |
| ...ixAlgebraKitEnzymeExt/MatrixAlgebraKitEnzymeExt.jl | 1.37% <0.00%> | (+0.08%) ⬆️ |
| src/pullbacks/svd.jl | 90.00% <95.00%> | (+0.74%) ⬆️ |

@kshyatt (Member) commented Apr 22, 2026

GPU CI now passing thanks to a Buildkite fix.

@Jutho (Member, Author) commented Apr 22, 2026

Great. Not sure why it is failing on Ubuntu latest; probably a timeout? I will already swap svd_trunc_pullback! with svd_trunc_pullback2!, as the latter should be much faster (even though these are only tested in the chainrules tests, if I am correct).

@kshyatt (Member) commented Apr 23, 2026

TBH I think the speed of the pullback itself won't change things much; it's entirely compilation time in autodiff_thunk in Enzyme.Compiler, and the GC.

@kshyatt (Member) commented Apr 23, 2026

Just as a note: this should close #150, right?

Jutho and others added 6 commits April 23, 2026 08:09
Co-authored-by: Lukas Devos <ldevos98@gmail.com>
@lkdvos lkdvos linked an issue Apr 23, 2026 that may be closed by this pull request
github-actions bot commented Apr 23, 2026

Your PR no longer requires formatting changes. Thank you for your contribution!

@lkdvos (Member) left a review comment

@Jutho I rebased on top of the latest main and added the gauge dependence removal to the public list. I don't think the performance question for the new SVD pullback needs to be settled before merging this, mostly so that the fix for the full SVD gets out as soon as possible?

@Jutho (Member, Author) commented Apr 23, 2026

Ok. It wasn't really my intention to keep svd_trunc_pullback2! around once some real-world benchmarks confirmed that it is indeed much faster; it should simply replace svd_trunc_pullback!. So the question is whether we merge as is and remove it later? As long as it is not marked public, that should be fine without causing breaking changes, right?

@Jutho (Member, Author) commented Apr 23, 2026

One final change could be to replace Sinv with S⁻¹ in svd_trunc_pullback!.

@lkdvos lkdvos merged commit 54f6021 into main Apr 23, 2026
10 checks passed
@lkdvos lkdvos deleted the jh/svdpullback branch April 23, 2026 19:05
@lkdvos lkdvos mentioned this pull request Apr 24, 2026
lkdvos referenced this pull request Apr 24, 2026
* Update changelog for v0.6.6

* Bump version to v0.6.6


Development

Successfully merging this pull request may close these issues.

  • SVD pullback of rank-deficient matrix
  • svd_pullback doesn't handle output from svd_full

4 participants