Summary
The RTX PRO 6000 Blackwell Workstation Edition crashes under sustained LLM inference when serving an FP8-quantized model with SGLang. The GPU enters an unrecoverable state that requires a full system reboot.
System Information
- GPU: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (96GB)
- Driver: nvidia-open 595.71.05 (Ubuntu package)
- OS: Ubuntu 24.04.4 LTS
- Kernel: 6.17.0-29-generic
- Workload: SGLang serving Qwen3 FP8 model (--max-running-requests 16)
Crash Signature
NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked! Notify Timeout Seconds: 7
NVRM: Xid (PCI:0000:01:00): 8, pid=48694, name=python3, channel 0x00000008
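To compare this signature against other reports, it can help to pull the fields out of the Xid line programmatically. The following is a minimal sketch, assuming kernel log lines follow the same layout as the one quoted above; the `parse_xid` helper and its regular expression are illustrative, not part of any NVIDIA tool.

```python
import re
from typing import Optional

# Pattern modeled on the Xid line quoted in this report (an assumption:
# other driver versions may order or name the trailing fields differently).
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[^)]+)\): (?P<xid>\d+)"
    r"(?:, pid=(?P<pid>[^,]+))?"
    r"(?:, name=(?P<name>[^,]+))?"
    r"(?:, (?P<detail>.*))?"
)


def parse_xid(line: str) -> Optional[dict]:
    """Return the Xid fields found in a kernel log line, or None if absent."""
    match = XID_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["xid"] = int(fields["xid"])  # numeric Xid code, e.g. 8
    return fields


if __name__ == "__main__":
    sample = ("NVRM: Xid (PCI:0000:01:00): 8, pid=48694, "
              "name=python3, channel 0x00000008")
    print(parse_xid(sample))
```

Feeding it the output of `dmesg` or `journalctl -k` line by line would give a list of Xid events that can be grouped by code and process name.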
What Was Tried
- Reduced --max-running-requests from 16 to 8 (pending test)
- Power limit is already set to 480 W (default 600 W), so the card is not hitting the power wall
- Temperature was normal at crash time (no thermal throttling in logs)
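For anyone trying to reproduce this, the power and temperature observations above could be captured continuously so that the last readings before a hang survive in a log. This is a rough sketch, assuming `nvidia-smi` is on the PATH and that only the first GPU is of interest; the `parse_sample` and `read_sample` helpers are hypothetical names introduced for illustration.

```python
import subprocess

# Standard nvidia-smi --query-gpu fields; nounits output drops "W" and "C".
QUERY_FIELDS = ["timestamp", "power.draw", "power.limit", "temperature.gpu"]


def parse_sample(csv_line: str) -> dict:
    """Split one CSV line from nvidia-smi into a field-name -> value dict."""
    values = [value.strip() for value in csv_line.split(",")]
    if len(values) != len(QUERY_FIELDS):
        raise ValueError(f"unexpected nvidia-smi output: {csv_line!r}")
    return dict(zip(QUERY_FIELDS, values))


def read_sample() -> dict:
    """Query the first GPU once; raises if nvidia-smi fails or hangs."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,  # nvidia-smi itself may block once the GPU is locked up
    )
    return parse_sample(result.stdout.splitlines()[0])


if __name__ == "__main__":
    print(read_sample())
```

Calling `read_sample()` in a loop and appending each result to a file that is flushed to disk would preserve the final samples even if the machine has to be rebooted.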
Related Issues
This appears to be the same GSP firmware halt class as:
The firmware RE analysis in #1080 confirms that the root cause is a missing GPU reset recovery path for Blackwell in the kernel driver.
Expected Behavior
GPU should remain stable under sustained LLM inference workloads, or at minimum have a working recovery path that doesn't require full system reboot.