[Bug]: Host BSOD (wandering bugchecks, corruption-pattern) triggered only by specific single-threaded CPU-heavy app inside WinXP #724

Description

@To-Azamat

Version

7.2.8

Host OS Type

Windows

Host OS name + version

Windows 10 Pro 22H2 19045.7417

Host Architecture

x86

Guest OS Type

Windows

Guest Architecture

x86

Guest OS name + version

Windows XP SP3

Component

Other

What happened?

[Written with the help of ClaudeAI after extensive testing & debugging]

Summary
A Windows 10 host BSODs reliably, but only when one specific legacy application runs inside a Windows XP guest. The same XP guest running other software is stable for days. The crashes are host-level — a guest workload is taking down the host, which should not be possible. Bugcheck codes and faulting modules vary across crashes (a corruption-style signature rather than a single repeatable driver fault), and span several unrelated kernel subsystems.

Host
Windows 10 22H2 19045.7417
AMD Ryzen 9 7900 (12-core / 24-thread)
ASRock B650I Lightning (Mini-ITX), BIOS 5.41 (current AGESA; VSoC verified at 1.1 V)
VirtualBox 7.2.8

Guest
Windows XP (32-bit), 1 vCPU, 1 GB RAM
Paravirtualization: None
Execution engine: default (hardware virtualization)
Nested paging: on
USB: tested both enabled and disabled — crashes occur in both configurations
Host file access: tested with VirtualBox shared folders and with other access methods; crashes occur with both, including with shared folders, which previously always worked

Trigger
The crash occurs only when a specific legacy financial application runs inside the XP guest. The app is single-threaded and CPU-intensive: it continuously recalculates large spreadsheet-style datasets, pinning the guest's single vCPU. A host BSOD follows between ~15 minutes and ~1 hour after the app starts. The XP guest is stable indefinitely under any other workload. The host is otherwise rock-solid for days, including under heavier loads than this app produces (multiple VMs running concurrently, all-core host stress).
Notably, the fault is not confined to one host data path: it occurs regardless of USB state and across different host-file-access methods, including shared folders that previously always worked. Combined with the varying faulting modules, this points toward general host-side kernel state corruption surfacing in whichever subsystem is active when the guest workload runs, rather than a fault localized to a single driver or path.

Crash signature (5 host dumps collected)
Bugcheck codes vary across crashes:
0x50 PAGE_FAULT_IN_NONPAGED_AREA — faulting in win32kfull!Win32FreePoolImpl (during a timer-object pool free)
0xD1 DRIVER_IRQL_NOT_LESS_OR_EQUAL — WppRecorder, called from USBXHCI (USB completion DPC path)
0x1E KMODE_EXCEPTION_NOT_HANDLED — nt!ExFreeHeapPool
0x0A IRQL_NOT_LESS_OR_EQUAL — nt!KiComputeCpuSetAffinity (CPU-set/affinity computation)

Common thread: most are access violations on bad or near-null pointers, frequently caught during pool frees, with corrupted stack frames in several dumps (invalid return addresses, an unreadable trap frame in one). The varying codes and modules, spanning the GUI subsystem, the USB stack, pool management, and CPU scheduling, indicate general kernel memory/state corruption rather than a single repeatable driver bug. Two dumps landing in CPU-scheduling and I/O-completion paths (nt!KiComputeCpuSetAffinity, nt!IopCompleteRequest) may be relevant to where in the host the corruption originates, but given the spread across subsystems I'm not asserting a specific mechanism.

What was ruled out
RAM: MemTest86, 4 passes, clean.
CPU silicon: Prime95 Small FFTs, single worker, ~1 hour on the preferred boost core (observed ~5.19 GHz under load). No error, no crash on bare metal — the generic single-core-boost stress condition does not reproduce it outside a VM.
Guest image: a known-good XP image from a different machine crashes identically under 7.2.8 here, so it is not specific to one (possibly corrupted) image.
Thermals/power: occurs on a well-cooled build at stock settings; not thermal, and unaffected by PBO power limits.
USB: crashes with USB fully disabled and with it enabled — independent of USB.
BIOS/AGESA SoC voltage: on current BIOS with VSoC at 1.1 V (well under the 1.3 V cap); still crashes.

Possible regression (NOT confirmed on identical hardware — stated honestly)
The same legacy app, in a WinXP image, runs flawlessly under VirtualBox 7.0.14 on a different machine — stable indefinitely. This is suggestive of a regression between 7.0.x and 7.2.x, but I want to be explicit that this is not a same-hardware comparison: the 7.0.14 machine differs in CPU and other hardware, though host OS is the same. I have not been able to downgrade the affected machine to 7.0.14 to confirm, because it hosts production VMs whose Guest Additions are matched to the current version and I can't risk disrupting them. So treat the regression angle as a hypothesis, not an established fact.

Reproducibility / what I can provide
All five host minidumps are gone, but I can generate new ones, as the BSOD is easily reproduced.
The VirtualBox VBox.log from a related crashed session.
The application binary can be shared with developers.
A synthetic equivalent — continuous heavy single-threaded floating-point recalculation inside the guest — may be the practical route to an independent reproducer. Note that bare-metal Prime95 single-threaded did not reproduce it, so the trigger appears specific to such a workload executing inside the guest under VirtualBox's hardware-virtualization path.
Attempting --vm-execution-engine recompiler to test the non-hardware-virt path was inconclusive: the recompiler is effectively unusable on this setup (the XP guest took >10 minutes to boot to the desktop), so I couldn't accumulate enough workload to judge stability.
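A rough idea of what such a synthetic equivalent might look like is sketched below. It is not the real application and does not copy its calculations. All names (make_grid, recalc, fixed_passes, run), the grid size and the arithmetic are made up for illustration, and whether this load pattern actually triggers the host BSOD is untested. It assumes a Python interpreter that still runs on XP (the 3.4 line was the last to support it) and uses only the standard library.

```python
import math
import time


def make_grid(rows, cols):
    """Build a rows x cols grid of floats with deterministic starting values in [1, 2)."""
    return [[1.0 + ((r * cols + c) % 97) / 97.0 for c in range(cols)]
            for r in range(rows)]


def recalc(grid):
    """Run one full recalculation pass in place and return the sum of all cells.

    Each cell is recomputed from itself, its left neighbour and the cell
    above it (both wrap around at the edges), loosely imitating
    spreadsheet-style dependent formulas. The arithmetic is hypothetical.
    """
    rows = len(grid)
    cols = len(grid[0])
    total = 0.0
    for r in range(rows):
        row = grid[r]
        up = grid[r - 1]  # index -1 wraps to the last row when r == 0
        for c in range(cols):
            v = row[c] * 0.5 + row[c - 1] * 0.3 + up[c] * 0.2
            # Keep every value bounded in [1, 11) so the loop can run forever.
            v = math.sqrt(v * v + 1.0) % 10.0 + 1.0
            row[c] = v
            total += v
    return total


def fixed_passes(passes, rows=200, cols=200):
    """Run a fixed number of passes and return the final checksum (deterministic)."""
    grid = make_grid(rows, cols)
    checksum = 0.0
    for _ in range(passes):
        checksum = recalc(grid)
    return checksum


def run(seconds, rows=200, cols=200):
    """Recalculate continuously on one thread for roughly `seconds` seconds.

    Returns (number_of_passes, checksum_of_last_pass). At least one pass
    is always executed.
    """
    grid = make_grid(rows, cols)
    deadline = time.time() + seconds
    passes = 0
    while True:
        checksum = recalc(grid)
        passes += 1
        if time.time() >= deadline:
            break
    return passes, checksum
```

Calling run(3600) inside the guest would keep the single vCPU busy for about an hour, which covers the observed time-to-crash window. Because fixed_passes is deterministic, comparing its result across repeated runs in the guest could also hint at whether any corruption is visible from inside the guest, although the dumps so far only show host-side corruption.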

How can we reproduce this?

Only with the legacy application that triggers it, which I can supply.

Did you upload all of your necessary log files, screenshots, etc.?

  • Yes, I've uploaded all pertinent files to this issue.
