RTX 3090 Ti (Ampere): suspend hangs on Nth S3 cycle — virtualBar2[GPU_GFID_PF].pCpuMapping == NULL on resume after per-cycle NVLink unilateral-shutdown accumulation — kernel 7.0.3, nvidia-open 595.71.05 #1148

---

### NVIDIA Open GPU Kernel Modules Version

`595.71.05` (Arch package `nvidia-open-dkms 595.71.05-2`)

### Please confirm this issue does not happen with the proprietary driver (of the same version).

- [ ] I confirm that this does not happen with the proprietary driver package.

Arch Linux retired the proprietary `nvidia-dkms` package in the 590.x series transition;
the proprietary kernel modules for 595.x are not available via the package manager and
would require NVIDIA's `.run` installer. This test has not been performed. I am willing
to do so if maintainers consider it necessary to isolate the regression.

### Operating System and Version

Arch Linux (rolling release)

### Kernel Release

`Linux 7.0.3-arch1-2 #1 SMP PREEMPT_DYNAMIC Fri, 01 May 2026 15:49:22 +0000 x86_64`

### Hardware: GPU

`NVIDIA GeForce RTX 3090 Ti` (Ampere GA102, PCI `0000:08:00.0`)

Single-GPU desktop. No NVLink peers. AMD Raphael iGPU present; `amdgpu` is currently
blacklisted (its state in each failed session is listed under "What has been ruled out").

### Describe the bug

S3 (deep) suspend/resume fails, but **not on every resume**. In each affected boot session,
N suspend/resume cycles complete successfully and the hang always falls on the **next
resume**, which ends the session. Across three sessions: N = 11, 6, and 12. Sleep duration
does not predict the hang: 24-hour and 15-hour sleeps succeeded in sessions that later failed.
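N for a given boot can be read back from its kernel log. The sketch below is illustrative and not part of any existing tool; it assumes a persistent systemd journal and that the hung boot's kernel messages were saved with something like `journalctl -k -b -1 > boot.log`. It counts `PM: suspend entry (deep)` lines that were followed by `PM: suspend exit`, and reports whether the last entry was left without an exit.

```python
import sys

# Kernel PM core messages as printed in `journalctl -k` output.
SUSPEND_ENTRY = "PM: suspend entry (deep)"
SUSPEND_EXIT = "PM: suspend exit"


def count_cycles(lines):
    """Return (completed, hung) for one boot's kernel log.

    completed -- suspend entries followed by a matching suspend exit (N)
    hung      -- True if the last suspend entry never saw an exit
    """
    completed = 0
    pending = False  # True between a suspend entry and its exit
    for line in lines:
        if SUSPEND_ENTRY in line:
            pending = True
        elif SUSPEND_EXIT in line and pending:
            completed += 1
            pending = False
    return completed, pending


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 count_cycles.py boot.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        n, hung = count_cycles(f)
    print(f"completed cycles (N): {n}; final resume incomplete: {hung}")
```

For a session that hung, the expected result is `hung == True` with `completed` equal to that session's N.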

The third failure (2026-05-17) produced visible host-driver assertion failures showing
`virtualBar2[GPU_GFID_PF].pCpuMapping` is NULL at resume. The first two failures
(2026-05-07, 2026-05-10) were completely silent — no assertions, no panic, no SSH
response after wake.

In all three sessions, every S3 suspend logged exactly one instance of:
```
NVRM: knvlinkCoreShutdownDeviceLinks_IMPL: Need to shutdown all links unilaterally for GPU0
```
The message pairs one-to-one with each `PM: suspend entry (deep)` line. This RTX 3090 Ti
has no NVLink peers, so the driver takes the unilateral fallback on every cycle. We believe
repeated traversal of this path accumulates state corruption in the GMMU VA allocator that
eventually prevents `virtualBar2[GPU_GFID_PF].pCpuMapping` from being restored on resume.
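The one-to-one pairing can be checked mechanically against a saved kernel log. This is a sketch with hypothetical helper names; it assumes that within each cycle the NVRM message is logged after the `PM: suspend entry (deep)` line (device suspend callbacks run after the PM core announces the entry). Each suspend window runs from one entry line to the next.

```python
# Substrings taken from the log lines quoted in this report.
SUSPEND_ENTRY = "PM: suspend entry (deep)"
NVLINK_UNILATERAL = (
    "knvlinkCoreShutdownDeviceLinks_IMPL: "
    "Need to shutdown all links unilaterally"
)


def nvlink_hits_per_suspend(lines):
    """Count unilateral-shutdown messages in each suspend window.

    A window opens at a suspend-entry line and lasts until the next one (or
    the end of the log). Messages before the first entry are ignored.
    """
    counts = []
    for line in lines:
        if SUSPEND_ENTRY in line:
            counts.append(0)
        elif NVLINK_UNILATERAL in line and counts:
            counts[-1] += 1
    return counts


def is_one_to_one(lines):
    """True if every suspend window holds exactly one unilateral shutdown."""
    counts = nvlink_hits_per_suspend(lines)
    return bool(counts) and all(c == 1 for c in counts)
```

A window with a count of 0 or 2 would point at a suspend that skipped the fallback or ran it twice.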

### Failure signature (third incident — only time assertions were visible)

```
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: NULL != pIter->pMap @ virt_mem_allocator_gm107.c:2024
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: progress == entryIndexHi - entryIndexLo + 1 @ mmu_walk_map.c:170
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: NV_OK == status @ mmu_walk.c:541
NVRM: GPU0 mmuWalkMap: Failed to map VA Range 0x86000000 to 0x865fffff. Status = 0x00000040
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ mmu_walk_map.c:75
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: pEntries != NULL @ gmmu_walk.c:826
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ mmu_walk.c:391
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: pEntries != NULL @ gmmu_walk.c:826
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: progress == 1 @ mmu_walk.c:1522
NVRM: GPU0 mmuWalkUnmap: Failed to unmap VA Range 0x86000000 to 0x865fffff. Status = 0x00000040
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ mmu_walk_unmap.c:62
NVRM: GPU0 mmuWalkMap: Unmap failed with status = 0x00000040
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: NV_OK == unmapStatus @ mmu_walk_map.c:84
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:2036
NVRM: GPU0 nvCheckFailedNoLog: Check failed: NV_OK == status @ virt_mem_allocator_gm107.c:2552
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: (pKernelBus->pReadToFlush != NULL ||
  pKernelBus->virtualBar2[GPU_GFID_PF].pCpuMapping != NULL) @ kern_bus_gv100.c:388
```
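To compare this cascade across incidents, or against other reports, the assertion sites can be extracted as `file:line` pairs. This is a minimal sketch; the regular expressions are written against the lines shown here, not against a documented NVRM log format. It assumes each assertion occupies one line in the raw journal, so a line wrapped when pasted (like the final assertion above) must be rejoined first.

```python
import re

# NVRM assertion/check lines as shown above, e.g.
#   NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: <expr> @ <file>:<line>
ASSERT_RE = re.compile(
    r"nv(?:Assert|Check)FailedNoLog: (?:Assertion|Check) failed: "
    r"(?P<expr>.*) @ (?P<file>[\w.]+):(?P<line>\d+)"
)
# "Failed to map/unmap VA Range 0x... to 0x..." lines.
VA_RANGE_RE = re.compile(r"VA Range (0x[0-9a-fA-F]+) to (0x[0-9a-fA-F]+)")


def assertion_sites(lines):
    """Return (file, line) for every NVRM assertion/check, in log order."""
    sites = []
    for text in lines:
        m = ASSERT_RE.search(text)
        if m:
            sites.append((m.group("file"), int(m.group("line"))))
    return sites


def va_range_size(text):
    """Size in bytes of an inclusive 'VA Range lo to hi', or None."""
    m = VA_RANGE_RE.search(text)
    if m is None:
        return None
    lo, hi = (int(x, 16) for x in m.groups())
    return hi - lo + 1
```

Applied to the map/unmap failure lines, `va_range_size` returns 0x600000: the failing range 0x86000000 to 0x865fffff spans 6 MiB.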

The final assertion is the root observable, even though it is logged last:
`virtualBar2[GPU_GFID_PF].pCpuMapping` is NULL on resume. This is the host CPU's mapping of
the GPU's PCIe BAR2 aperture, which the driver uses for internal GPU memory access. Without
it, every GMMU walk fails, producing the cascade above. `PM: suspend exit` was never logged;
the resume never completed.

### What has been ruled out

- **Duration-dependent hang:** 24-hour and 15-hour sleeps succeeded in sessions that later
  failed. Duration alone does not predict the hang.
- **amdgpu interaction:** amdgpu was loaded in two failed sessions and blacklisted in the
  third. The failure pattern was identical in all three.
- **Issue #1095 (objtool/jump_label BUG):** That issue fails on every resume. Ours fails
  only after N successful cycles in the same session.
- **Issue #1117 (Blackwell s2idle GSP RPC timeout):** Different GPU architecture (Blackwell
  vs. Ampere) and different sleep mode (s2idle vs. S3 deep). The third session (2026-05-17)
  shows host-driver VA space corruption, not a GSP-level stall.

### Hypothesis

Each S3 suspend cycle traverses the NVLink unilateral-shutdown path
(`knvlinkCoreShutdownDeviceLinks_IMPL`), leaking or incompletely cleaning up VA allocator
state in the GMMU (`virt_mem_allocator_gm107.c`, `mmu_walk.c`). After N cycles, the VA
space used to establish `virtualBar2[GPU_GFID_PF].pCpuMapping` is exhausted or corrupted.
The next resume finds `pCpuMapping == NULL`, fails every GMMU walk, and freezes.

Variable N (6, 11, 12) is consistent with a per-cycle leak whose accumulation depends on
the driver's initial VA space state at boot.
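The claim that a constant per-cycle leak can still yield a variable N can be made concrete with a toy model. Everything in it is an assumption for illustration: a fixed amount of free VA at boot (`initial_free`), a constant loss on every suspend (`leak_per_suspend`), and a fixed amount the BAR2 remap needs on resume (`needed_on_resume`). None of these are measured values or actual driver quantities.

```python
def resumes_before_failure(initial_free, leak_per_suspend, needed_on_resume):
    """Toy model of the hypothesis: count successful resumes before the hang.

    Each suspend permanently loses `leak_per_suspend` units of free VA
    (hypothetically via the NVLink unilateral-shutdown path); a resume
    succeeds only if at least `needed_on_resume` units remain for the BAR2
    mapping. All three parameters are illustrative, not driver values.
    """
    if leak_per_suspend <= 0:
        raise ValueError("model assumes a positive per-suspend leak")
    free = initial_free
    successes = 0
    while True:
        free -= leak_per_suspend       # suspend: lose one leak increment
        if free < needed_on_resume:    # resume: BAR2 mapping cannot be restored
            return successes
        successes += 1


if __name__ == "__main__":
    # Same leak and requirement, different starting points -> different N.
    for start in (111, 106, 112):
        print(start, resumes_before_failure(start, 1, 100))
```

With a leak of 1 unit and a requirement of 100 units, starting points of 111, 106, and 112 give N = 11, 6, and 12. These numbers are picked only to show that a boot-dependent starting point reproduces a spread like the observed one; in closed form, N = max(0, (initial_free - needed_on_resume) // leak_per_suspend).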

### Relation to issue #1134

Issue #1134 reports the same kernel (`7.0.3-arch1-2`), driver (`595.71.05`), GPU generation
(RTX 3090, also Ampere GA102), and OS (Arch Linux, Hyprland). That reporter observed
`dmaAllocMapping_GM107: can't alloc VA space for mapping` and `NV_ERR_NO_MEMORY (0x51)`
from `virt_mem_allocator_gm107.c` → Xid 31 → Xid 154, triggered by a Chromium renderer
process. The VA allocator is shared, but the trigger path (DRM GEM ops vs. S3 PM callbacks
via NVLink shutdown) and the depleted resource (BAR1 userspace window vs. BAR2 internal
aperture) differ. These may be distinct bugs in the same allocator, or the same leak reached
through different trigger paths.

### Bug Incidence

Three occurrences across three boot sessions. There is no deterministic single-cycle
reproducer: the failure only appears after multiple suspend cycles accumulate within one
boot session.
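Because the failure needs many cycles in one boot, a scripted loop may shorten the time to reproduce it. This is an untested sketch, not something run for this report, and the function names and default timings are hypothetical. It assumes util-linux `rtcwake` is installed and run as root (`-m mem` suspends through `/sys/power/state`, entering whatever mode `/sys/power/mem_sleep` selects; `-s` sets the RTC wake alarm in seconds), and it provides a helper to confirm that `deep` is the active mode first.

```python
import subprocess
import time


def active_mem_sleep(text):
    """Return the bracketed (active) mode from /sys/power/mem_sleep contents."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def rtcwake_cmd(seconds):
    # util-linux rtcwake: suspend to RAM, wake via RTC alarm after `seconds`.
    return ["rtcwake", "-m", "mem", "-s", str(seconds)]


def suspend_cycles(count, seconds=60, awake=30, dry_run=True):
    """Run (or, with dry_run, only list) `count` suspend/resume cycles."""
    cmds = []
    for _ in range(count):
        cmd = rtcwake_cmd(seconds)
        cmds.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # requires root
            time.sleep(awake)                # stay awake between cycles
    return cmds
```

As root, check that `active_mem_sleep(open("/sys/power/mem_sleep").read())` returns `"deep"` before calling `suspend_cycles(20, dry_run=False)`; if the hypothesis holds, the loop should eventually stop at a hung resume.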
