B200 intermittent link failures #1149
Unanswered
jed-hacker
asked this question in
Q&A
NVML:Unable to retrieve Nvlink information as all links are inActive
CUDA version: 590.48.01
A single-card failure occurs randomly on the same node, and the issue is resolved after each reboot. In most cases, nvidia-bug-report.sh freezes during execution. An Xid 149 error was found in the only log file that was exported successfully. The cause has not been identified yet.
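Since nvidia-bug-report.sh usually hangs, one possible fallback is to pull Xid events straight from the kernel log (for example, saved `dmesg` or `journalctl -k` output) and count them per GPU. The following is only a minimal sketch: it assumes Xid lines follow the usual `NVRM: Xid (PCI:<address>): <code>, ...` pattern, the function names `find_xids` and `count_by_gpu` are hypothetical, and the sample log lines are made up for illustration rather than taken from the affected node.

```python
import re
from collections import Counter

# Assumed shape of an NVIDIA Xid line in the kernel log, e.g.
#   NVRM: Xid (PCI:0000:1b:00): 149, ...
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+)")


def find_xids(log_text):
    """Return (pci_address, xid_code) tuples in the order they appear."""
    return [(m.group(1), int(m.group(2))) for m in XID_RE.finditer(log_text)]


def count_by_gpu(events):
    """Count how often each (pci_address, xid_code) pair occurs."""
    return Counter(events)


# Made-up sample lines, for illustration only (not from the real node).
SAMPLE = """\
[ 1234.567890] NVRM: Xid (PCI:0000:1b:00): 149, sample text only
[ 1240.000001] NVRM: Xid (PCI:0000:1b:00): 149, sample text only
[ 1300.000000] NVRM: Xid (PCI:0000:3d:00): 79, sample text only
"""

for (pci, code), n in count_by_gpu(find_xids(SAMPLE)).most_common():
    print(f"{pci}  Xid {code}  seen {n} time(s)")
```

Running the same functions on logs saved before each reboot would show whether the Xid always comes from the same PCI address, which might help narrow the problem down to one card or slot.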