Minor updates to OffloadPC #5164
Co-authored-by: Josh Hope-Collins <jhc.jss@gmail.com>
Add comments describing the no-offload matrix types. Add a test to ensure value-only copying is happening correctly. Add start of test to ensure A is not being offloaded if it is a no-offload type matrix.
|
I've added another fix into this PR that copies |
| if A_dev.handle != P_dev.handle: | ||
| A.copy(A_dev) | ||
| # Perform a value-only copy | ||
| P.copy(P_dev, True) |
What does the True do? The API doesn't make it clear: https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Mat.html#petsc4py.PETSc.Mat.copy
I think making it a kwarg would be more readable
| # Zero pressure at outflow at x = 1 | ||
| bc2 = DirichletBC(W[1], 0.0, 2) | ||
|
|
||
| bcs = bc0 + [bc1, bc2] |
Not critical, but this seems a bit weird to have bc0 be a list and bc1 and bc2 not be.
| # ksp0 = solver.snes.ksp.pc.getFieldSplitSchurGetSubKSP()[0] | ||
| # sub_pc = ksp0.pc.??? | ||
| # offload_python_context = sub_pc.getPythonContext() | ||
| # assert offload_python_context.pc.P.type == "seqaijcusparse" | ||
| # assert offload_python_context.pc.A.type == "python" |
I'm afraid I've not explained the issue well. This test returns the same values regardless of whether A is offloaded or not. What we're testing for here is whether the A operator has been offloaded at all. It makes no difference at this scale, but when you begin to scale up (~100k DOF) the matfree -> dense -> sparse conversion can take minutes, as opposed to microseconds for an aij -> aijcusparse conversion, so it's critically important that we make sure this isn't happening.
This is where I need help. ksp0 = solver.snes.ksp.pc.getFieldSplitSchurGetSubKSP()[0].pc gives you the firedrake.AssembledPC; what I don't know is how to get to the underlying firedrake.OffloadPC. Once we have that, we can get the python context and then extract the A and P operators from that. If you know what
sub_pc = ksp0.pc.???
should be in order to do that, that would be very helpful.
Ah I see now. I also don't know how to traverse the solver stack programmatically like that. @JHopeCollins do you know?
I am on leave currently but could dig into this next week if needed.
firedrake.AssembledPC itself just creates a PETSc.PC, and you have specified that PC to be python type with the OffloadPC as the python context.
So looking at the solver parameters we need to walk down the solver stack in this order (where you had the first few steps already):
solver -> snes -> ksp -> pcfieldsplit -> A00 ksp -> pc -> python assembled pc context -> pc -> python offload pc context -> pc -> operators
you can do something like:
solver.snes.ksp.pc.getFieldSplitSchurGetSubKSP()[0].pc.getPythonContext().pc.getPythonContext().pc.getOperators()
Or, more readably, using your ksp0:
assembled_ctx = ksp0.pc.getPythonContext()
offload_ctx = assembled_ctx.pc.getPythonContext()
A, P = offload_ctx.pc.getOperators()
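The same walk could be wrapped in a small helper so the test does not hard-code the depth of the stack. This is only a sketch: `python_contexts` and `find_context` are hypothetical names, and it assumes every python context in the chain keeps its inner PC in a `.pc` attribute, as AssembledPC and OffloadPC do in the chain above. It only relies on `getType()` and `getPythonContext()` being available on each PC.

```python
def python_contexts(pc):
    """Yield the python context of each nested python-type PC, outermost first.

    Assumes each context exposes its inner PC as `.pc`. Stops at the first PC
    that is not of type "python", or at the first context without a `.pc`.
    """
    while pc is not None and pc.getType() == "python":
        ctx = pc.getPythonContext()
        yield ctx
        pc = getattr(ctx, "pc", None)


def find_context(pc, class_name):
    """Return the first nested python context whose class is named class_name."""
    for ctx in python_contexts(pc):
        if type(ctx).__name__ == class_name:
            return ctx
    raise LookupError(f"no {class_name} context found below this PC")
```

With ksp0 as defined in the test, `offload_ctx = find_context(ksp0.pc, "OffloadPC")` followed by `A, P = offload_ctx.pc.getOperators()` should give the operators to assert on.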
Ah, I was missing getPythonContext()! I was going through every get... function in petsc4py.PC trying to figure out which one would get the 'subPC' or something along those lines. Thank you @JHopeCollins.
No problem!
For future reference, a python context is no longer a petsc4py thing; it's a user-defined class with a bunch of callbacks (here Firedrake counts as a user), e.g. AssembledPC, OffloadPC etc., which are just (non-petsc4py) python classes.
So once you hit AssembledPC you're back to looking at Firedrake code not petsc4py, until of course you hit another petsc4py object.
Co-authored-by: Connor Ward <c.ward20@imperial.ac.uk>
…cture enum for value
| "ksp_monitor_true_residual": None, | ||
| "ksp_converged_reason": None, | ||
| "ksp_type": "cg", | ||
| "ksp_rtol": 1e-5, | ||
| "ksp_max_it": 1000, |
Does this really take this many iterations? Seems like too many for the test suite
No, sorry, left over from G-ADOPT. Will reduce to 50
| } | ||
| }, | ||
| "fieldsplit_1": { | ||
| "pc_type": "none", |
Is this the right thing to do here? Isn't this just solving a zero matrix because PETSc will default to building the Schur complement from a11, which for this form is just 0?
If it isn't doing that, then how many iterations does this take? Doesn't the ksp_type here default to gmres?
I took this from the stokes_mini regression test. I only cared what was going on with fieldsplit_0, so I left the fieldsplit_1 settings unchanged. The ksp_type does default to gmres and I see
Linear firedrake_3_fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 130
when run with ksp_converged_reason: None added to the fieldsplit_1 solver options. Setting it back to 'fieldsplit_schur_fact_type': 'diag', as per stokes_mini, the iteration count drops to 125 and it fails the assertion at the end. Come to think of it, the assertion at the end doesn't matter, all we need to know is that A was not offloaded, so I'll just end the test after those asserts.
If you don't care about the solution then you can add ksp_type: preonly to this split
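For reference, that suggestion would make the split's options look roughly like the fragment below. Only the two keys discussed in this thread are shown; the surrounding solver parameters are left out rather than guessed.

```python
# Options for the fieldsplit_1 (Schur complement) split only; the rest of
# the solver parameters dict is omitted here.
fieldsplit_1 = {
    "ksp_type": "preonly",  # apply the preconditioner once instead of iterating with the default gmres
    "pc_type": "none",      # unchanged from the test as reviewed
}
```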
…lerance in test_advection_offload to reduce test time.
Description
Two minor updates to OffloadPC based on continued local testing.
The first change is preventing certain types of PETSc matrix from being offloaded. When implicit matrices are converted to aijcusparse matrices, PETSc constructs a dense matrix row-by-row then converts that to a sparse matrix. Since PETSc can transparently manage host-device transfers when using non-aijcusparse matrix types on cuda vectors, it is far more efficient for PETSc to offload as needed than to do a full implicit to dense to sparse conversion. I'm not sure this is a complete list of implicit matrix types, but these are the types I run into with our testing.
The second change is in the construction of the device matrices. If the out kwarg is not specified or set to None in Mat.convert(), the conversion happens in place, meaning e.g. both P and P_device are now the same aijcusparse matrix. This was probably not picked up earlier as an aijcusparse matrix is an aij matrix with a managed device buffer under the hood, so silently converting an aij matrix probably does not affect any other part of a job, but it might cause issues with other matrix types.
There is a check to make sure the silent conversion hasn't happened, but I don't know how to test for the implicit -> dense -> sparse conversion. The right thing to do, I think, would be to check that A is A_dev if A.getType() is one of the 'do not offload' types, but I don't know how to retrieve that information out of nested PCs.
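A minimal sketch of the two changes and of the check proposed above, assuming petsc4py's `Mat.getType()`, `Mat.convert(mat_type, out=...)` and `Mat.handle`. The names `NO_OFFLOAD_TYPES`, `to_device` and `check_offload` are hypothetical, and the set lists only "python", the one type asserted in the test in this PR; the actual list in OffloadPC is not reproduced here. `new_mat` is a factory for an empty matrix, which would be `PETSc.Mat` in real use.

```python
# Hypothetical list: "python" is the only type asserted in this PR's test.
NO_OFFLOAD_TYPES = frozenset({"python"})


def to_device(mat, new_mat, device_type="aijcusparse", no_offload=NO_OFFLOAD_TYPES):
    """Return a device matrix for `mat`, or `mat` itself for no-offload types.

    `new_mat` is a zero-argument factory for an empty matrix (PETSc.Mat in
    real use). Passing a fresh matrix as `out` leaves `mat` untouched, whereas
    out=None would convert `mat` in place.
    """
    if mat.getType() in no_offload:
        return mat
    return mat.convert(device_type, out=new_mat())


def check_offload(mat, mat_dev, no_offload=NO_OFFLOAD_TYPES):
    """Assert that no-offload types are shared and all others are separate copies."""
    if mat.getType() in no_offload:
        assert mat_dev.handle == mat.handle, "no-offload matrix was converted"
    else:
        assert mat_dev.handle != mat.handle, "matrix was converted in place"
```

Comparing handles rather than Python object identity mirrors the `A_dev.handle != P_dev.handle` check already in the diff.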