DCGM: infoROM is corrupted, but all DCGM diagnostics pass

Description of the problem

  • nvidia-smi reports that the infoROM is corrupted.

  • However, all DCGM diagnostics pass.
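The mismatch described above can be checked from a shell. The following is a minimal sketch, not a procedure taken from this report: the helper name `has_inforom_warning` is hypothetical, the phrase it matches is assumed from the wording of the problem description, and diagnostic level 1 is an arbitrary choice because the report does not say which level was run.

```shell
#!/usr/bin/env bash
# Sketch: compare what nvidia-smi says about the infoROM with a DCGM diagnostic run.

# Hypothetical helper: returns 0 if the text on stdin mentions infoROM corruption.
# The matched phrase is an assumption based on the wording in this report.
has_inforom_warning() {
    grep -qi 'inforom is corrupted'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Merge stderr into stdout so a warning on either stream is seen.
    if nvidia-smi 2>&1 | has_inforom_warning; then
        echo "nvidia-smi: infoROM corruption reported"
    else
        echo "nvidia-smi: no infoROM warning found"
    fi
    # Per-GPU infoROM object versions, for reference.
    nvidia-smi --query-gpu=index,inforom.img,inforom.oem,inforom.ecc,inforom.pwr --format=csv
fi

if command -v dcgmi >/dev/null 2>&1; then
    # Level 1 is the quick diagnostic; the level used in the report is not stated.
    dcgmi diag -r 1
fi
```

If the first check reports corruption while `dcgmi diag` still passes, that reproduces the situation in this issue.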

Environment information

  • Bare Metal Server : QuantaGrid D52G-4U
  • GPU SKU(s) : Tesla V100-SXM2-32GB
  • OS : CentOS 7.8
  • DRIVER : 450.80.02
  • GPU power settings (nvidia-smi -q -d POWER) : nv_power.log
  • CPU(s) : Intel® Xeon® Gold 6154 CPU
  • RAM : 768 GB
  • Topology (nvidia-smi topo -m) : nv_topo.log
  • The output of nvidia-smi -q : nv_q.log
  • Full output of dcgmi -v : dcgm_v.log
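The logs listed above can be gathered in one pass. This is a rough sketch that assumes the file names used in this report; the `collect` helper is hypothetical and only wraps the commands named in the list.

```shell
#!/usr/bin/env bash
# Sketch: save the output of each command named above to the matching log file.

# collect OUTFILE CMD [ARGS...]: run CMD and save stdout and stderr to OUTFILE.
# Prints a note and skips the file when CMD is not installed.
collect() {
    local outfile=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$outfile" 2>&1
    else
        echo "skipping $outfile: $1 not found" >&2
    fi
}

collect nv_power.log nvidia-smi -q -d POWER
collect nv_topo.log  nvidia-smi topo -m
collect nv_q.log     nvidia-smi -q
collect dcgm_v.log   dcgmi -v
```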

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 15 (1 by maintainers)

Most upvoted comments

@likueimo, thank you for the logs. Unfortunately, your infoROM is indeed corrupted, and nvidia-smi is reporting the corruption status correctly. The way nvvs interprets that status is a bit strange; we will investigate further and provide a fix for this issue.