go-nvml: GetProcessUtilization says: Insufficient Size

I run PyTorch in a Docker container on the 4th GPU, and I run go-nvml in the host environment.

Code:

_, ret := dev.GetProcessUtilization(ts)
if ret != nvml.SUCCESS {
	log.Printf("[x] Unable to call GetProcessUtilization", nvml.ErrorString(ret))
}

Output:

   Unable to call GetProcessUtilization%!(EXTRA string=Insufficient Size)
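A side note on the log line itself: the `%!(EXTRA string=...)` part comes from Go's fmt package, which appends any argument that has no matching verb in the format string. The sketch below reproduces that with fmt.Sprintf and shows one possible fix. The helper names `messageWithoutVerb` and `messageWithVerb` are made up for illustration and are not part of go-nvml.

```go
package main

import "fmt"

// messageWithoutVerb mirrors the call in the report: the format string has
// no verb, so fmt appends the unused argument as %!(EXTRA ...).
// The format is held in a variable only to keep the example self-contained.
func messageWithoutVerb(errText string) string {
	format := "[x] Unable to call GetProcessUtilization"
	return fmt.Sprintf(format, errText)
}

// messageWithVerb adds a %s verb so the error text is formatted normally.
func messageWithVerb(errText string) string {
	return fmt.Sprintf("[x] Unable to call GetProcessUtilization: %s", errText)
}

func main() {
	fmt.Println(messageWithoutVerb("Insufficient Size"))
	fmt.Println(messageWithVerb("Insufficient Size"))
}
```

This only changes how the message is printed. The Insufficient Size return code itself is a separate matter.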

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I don’t have a good answer as to why the API returns a sample count of 100 when nothing is running, but I did a quick check against the underlying C API, and it returns the same thing:

#include <stdio.h>

#include <nvml.h>

int main(void)
{
    nvmlReturn_t ret;
    nvmlDevice_t device;
    /* In/out parameter: with a NULL buffer, NVML writes a sample count here. */
    unsigned int processSamplesCount = 0;

    ret = nvmlInit();
    printf("nvmlInit: %s\n", nvmlErrorString(ret));

    ret = nvmlDeviceGetHandleByIndex(0, &device);
    printf("nvmlDeviceGetHandleByIndex: %s\n", nvmlErrorString(ret));

    /* Size query only: no buffer is passed, and lastSeenTimeStamp is 0. */
    ret = nvmlDeviceGetProcessUtilization(device, NULL, &processSamplesCount, 0);
    printf("nvmlDeviceGetProcessUtilization: %u, %s\n", processSamplesCount, nvmlErrorString(ret));

    ret = nvmlShutdown();
    printf("nvmlShutdown: %s\n", nvmlErrorString(ret));
    return 0;
}

$ gcc nvml.c -o nvml -lnvidia-ml
$ ./nvml
nvmlInit: Success
nvmlDeviceGetHandleByIndex: Success
nvmlDeviceGetProcessUtilization: 100, Insufficient Size
nvmlShutdown: Success

Unfortunately, this behavior isn’t obvious from the documentation (https://github.com/NVIDIA/go-nvml/blob/master/gen/nvml/nvml.h#L5844), which is why we missed it the first time around.