Minor updates to OffloadPC by dsroberts · Pull Request #5164 · firedrakeproject/firedrake

dsroberts · 2026-06-11T03:16:13Z

Description

Two minor updates to OffloadPC based on continued local testing. The first change is preventing certain types of of PETSc matrix from being offloaded. When implicit matrices are converted to aijcusparse matrices, PETSc constructs a dense matrix row-by-row then converts that to a sparse matrix. Since PETSc can transparently manage host-device transfers when using non-aijcusparse matrix types on cuda vectors, it is far more efficient for PETSc to offload as needed than to do a full implicit to dense to sparse conversion. I'm not sure this is a complete list of implicit matrix types, but these are the types I run into with our testing.

The second change is in the construction of the device matrices. If the out kwarg is not specified or set to None in Mat.convert(), the conversion happens in place meaning e.g. both P and P_device are now the same aijcusparse matrix. This was probably not picked up earlier as an aijcusparse matrix is an aij matrix with a managed device buffer under the hood, so silently converting an aij matrix probably does not affect any other part of a job, but it might cause issues with other matrix types.

There is a check to make sure the silent conversion hasn't happened, but I don't know how to test for the implicit -> dense -> sparse conversion. The right thing to do I think would be to check that A is A_dev if A.getType() is one of the 'do not offload' types, but I don't know how to retrieve that information out of nested PCs.

Co-authored-by: Josh Hope-Collins <jhc.jss@gmail.com>

Add comments describing the no-offload matrix types. Add a test to ensure value-only copying is happening correctly. Add start of test to ensure A is not being offloaded if it is a no-offload type matrix.

dsroberts · 2026-06-25T08:09:19Z

I've added another fix into this PR that copies P and A to P_dev and A_dev respectively by values only in OffloadPC.update(). We found that in parallel some configurations were diverging after a few timesteps. I think this was due to PETSc seeing the copied P_dev as a new matrix and performing its own assembly, and when it did that it lost the connectivity information into the halos. I have adapted the 04-burgers.ipynb demo into a test that runs until it reaches a steady state. When this test is run on 2 processes, the ("cg", "gamg") variant fails in 3 timesteps if a full copy of P is performed. With the value-only copy the solution reaches a steady state and the test passes.

connorjward · 2026-06-25T09:10:17Z

-        if A_dev.handle != P_dev.handle:
-            A.copy(A_dev)
+        # Perform a value-only copy
+        P.copy(P_dev, True)


What does the True do? The API doesn't make it clear: https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Mat.html#petsc4py.PETSc.Mat.copy

I think making it a kwarg would be more readable

connorjward · 2026-06-25T09:15:18Z

+    # Zero pressure at outlow at x = 1
+    bc2 = DirichletBC(W[1], 0.0, 2)
+
+    bcs = bc0 + [bc1, bc2]


Not critical, but this seems a bit weird to have bc0 be a list and bc1 and bc2 not be.

Copied straight from here

connorjward · 2026-06-25T09:15:39Z

+    # ksp0 = solver.snes.ksp.pc.getFieldSplitSchurGetSubKSP()[0]
+    # sub_pc = ksp0.pc.???
+    # offload_python_context = sub_pc.getPythonContext()
+    # assert offload_python_context.pc.P.type == "seqaijcusparse"
+    # assert offload_python_context.pc.A.type == "python"


I'm afraid I've not explained the issue well. This test returns the same values regardless of whether A is offloaded or not. What we're testing for here is whether the A operator has been offloaded at all. It makes no difference at this scale, but when you begin to scale up (~100k DOF) the matfree -> dense -> sparse conversion can take minutes, as opposed to microseconds for an aij -> aijcusparse conversion, so its critically important that we make sure this isn't happening.

This is where I need help. ksp0 = solver.snes.ksp.pc.getFieldSplitSchurGetSubKSP()[0].pc gives you the firedrake.AssembledPC, what I don't know is how to get the to the underlying firedrake.OffloadPC. Once we have that, we can get the python context and then extract the A and P operators from that. If you know what

sub_pc = ksp0.pc.???

should be in order to do that, that would be very helpful.

Ah I see now. I also don't know how to traverse the solver stack programmatically like that. @JHopeCollins do you know?

I am on leave currently but could dig into this next week if needed.

firedrake.AssembledPC itself just creates a PETSc.PC, and you have specified that PC to be python type with the OffloadPC as the python context.
So looking at the solver parameters we need to walk down the solver stack in this order (where you had the first few steps already):

solver -> snes -> ksp -> pcfieldsplit -> A00 ksp -> pc -> python assembled pc context -> pc -> python offload pc context -> pc -> operators

you can do something like:

solver.snes.ksp.pc.getFieldSplitSchurGetSubKSP()[0].pc.getPythonContext().pc.getPythonContext().pc.getOperators()

Or, more readably, using your ksp0:

assembled_ctx = ksp0.pc.getPythonContext() offload_ctx = assembled_ctx.pc.getPythonContext() A, P = offload_ctx.pc.getOperators()

Ah, I was missing getPythonContext()! I was going through every get... function in petsc4py.PC trying to figure out which one would get the 'subPC' or something along those lines. Thank you @JHopeCollins.

No problem!

For future reference, a python context is no longer a petsc4py thing, its a user-defined class with a bunch of callbacks (here Firedrake counts as a user), e.g. AssembledPC, OffloadPC etc, which are just (non-petsc4py) python classes.

So once you hit AssembledPC you're back to looking at Firedrake code not petsc4py, until of course you hit another petsc4py object.

Co-authored-by: Connor Ward <c.ward20@imperial.ac.uk>

…cture enum for value

JHopeCollins · 2026-06-25T11:52:53Z

+                        "ksp_monitor_true_residual": None,
+                        "ksp_converged_reason": None,
+                        "ksp_type": "cg",
+                        "ksp_rtol": 1e-5,
+                        "ksp_max_it": 1000,


Does this really take this many iterations? Seems like too many for the test suite

No, sorry, left over from G-ADOPT. Will reduce to 50

JHopeCollins · 2026-06-25T11:58:40Z

+            }
+        },
+        "fieldsplit_1": {
+            "pc_type": "none",


Is this the right thing to do here? Isn't this just solving a zero matrix because PETSc will default to building the Schur complement from a11, which for this form is just 0?

If it isn't doing that, then how many iterations does this take? Doesn't the ksp_type here default to gmres?

I took this from from the stokes_mini regression test, I only cared what was going on with fieldsplit_0 so left the fieldsplit_1 settings unchanged. The ksp_type does default to gmres and I see

Linear firedrake_3_fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 130

when run with ksp_converged_reason: None added to the fieldsplit_1 solver options. Setting it back to 'fieldsplit_schur_fact_type': 'diag', as per stokes_mini, the iteration count drops to 125 and it fails the assertion at the end. Come to think of it, the assertion at the end doesn't matter, all we need to know is that A was not offloaded, so I'll just end the test after those asserts.

If you don't care about the solution then you can add ksp_type: preonly to this split

…lerance in test_advection_offload to reduce test time.

dsroberts added 2 commits June 11, 2026 12:01

Minor updates to OffloadPC

f3b58a6

Add check for silent P matrix conversion

79447e9

dsroberts added the ci:cuda Run the test suite with a CUDA-enabled build label Jun 11, 2026

JHopeCollins reviewed Jun 11, 2026

View reviewed changes

Comment thread firedrake/preconditioners/offload.py Outdated

Update firedrake/preconditioners/offload.py

4ab1ef4

Co-authored-by: Josh Hope-Collins <jhc.jss@gmail.com>

connorjward requested changes Jun 23, 2026

View reviewed changes

Comment thread firedrake/preconditioners/offload.py Outdated

Comment thread firedrake/preconditioners/offload.py Outdated

Comment thread tests/firedrake/offload/test_poisson_offloading_pc.py Outdated

Comment thread firedrake/preconditioners/offload.py

dsroberts and others added 2 commits June 25, 2026 15:48

Merge branch 'firedrakeproject:main' into dsroberts/minor-offload-fixes

b525364

Respond to review comments. Add new tests.

5aad145

Add comments describing the no-offload matrix types. Add a test to ensure value-only copying is happening correctly. Add start of test to ensure A is not being offloaded if it is a no-offload type matrix.

connorjward requested changes Jun 25, 2026

View reviewed changes

dsroberts and others added 5 commits June 25, 2026 20:35

Update tests/firedrake/offload/test_advection_offload.py

98b415d

Co-authored-by: Connor Ward <c.ward20@imperial.ac.uk>

Update tests/firedrake/offload/test_advection_offload.py

acbd7c8

Co-authored-by: Connor Ward <c.ward20@imperial.ac.uk>

Update tests/firedrake/offload/test_advection_offload.py

1b9c16a

Co-authored-by: Connor Ward <c.ward20@imperial.ac.uk>

Update tests/firedrake/offload/test_no_offload_A.py

a586acf

Co-authored-by: Connor Ward <c.ward20@imperial.ac.uk>

Set structure argument in P.copy/A.copy with kwarg, use PETSc MatStru…

034f5ea

…cture enum for value

JHopeCollins reviewed Jun 25, 2026

View reviewed changes

dsroberts added 2 commits June 25, 2026 22:26

Changes to test_no_offload_A.py in response to review

49d0e3c

Make solve step in test_no_offload_A a no-op. Slacken steady-state to…

c5d9d96

…lerance in test_advection_offload to reduce test time.

Uh oh!

Conversation

dsroberts commented Jun 11, 2026

Description

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dsroberts commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JHopeCollins Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dsroberts commented Jun 25, 2026 •

edited

Loading

JHopeCollins Jun 25, 2026 •

edited

Loading