DCGM: infoROM is corrupted, but all DCGM diagnostics pass
Description of the problem
- nvidia-smi reports that the infoROM is corrupted
- However, all DCGM diagnostics pass
Environment information
- Bare Metal Server : QuantaGrid D52G-4U
- GPU SKU(s) : Tesla V100-SXM2-32GB
- OS : CentOS 7.8
- DRIVER : 450.80.02
- GPU power settings (nvidia-smi -q -d POWER) : nv_power.log
- CPU(s) : Intel® Xeon® Gold 6154 CPU
- RAM : 768 GB
- Topology (nvidia-smi topo -m) : nv_topo.log
- The output of nvidia-smi -q : nv_q.log
- Full output of dcgmi -v : dcgm_v.log
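For anyone trying to gather the same information on another machine, the logs listed above could be collected roughly as follows. This is only a sketch: the four commands and the log file names are the ones from this report, while the `collect` helper is a hypothetical convenience function, not part of nvidia-smi or DCGM. It assumes the tools are on `PATH` and writes a placeholder file when they are not.

```shell
#!/bin/sh
# Sketch: gather the logs listed in this report into the current directory.
# "collect" is a hypothetical helper, not an NVIDIA tool.
collect() {
    # $1 = output file, remaining arguments = command to run
    out=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        # Capture stdout and stderr so driver warnings end up in the log too.
        "$@" > "$out" 2>&1 || echo "warning: '$*' exited non-zero" >&2
    else
        # Tool is not installed: leave a placeholder instead of failing.
        echo "skipped: $1 not found" > "$out"
    fi
}

collect nv_power.log nvidia-smi -q -d POWER
collect nv_topo.log  nvidia-smi topo -m
collect nv_q.log     nvidia-smi -q
collect dcgm_v.log   dcgmi -v
```

Redirecting stderr into each log is a deliberate choice here, on the assumption that driver warnings may be printed there rather than on stdout.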
About this issue
- State: open
- Created 3 years ago
- Comments: 15 (1 by maintainers)
@likueimo, thank you for the logs. Unfortunately, your infoROM is indeed corrupted, and the corruption status that nvidia-smi reports is valid. The way nvvs interprets this status is a bit strange; we will investigate further and provide a fix for this issue.