k8s-device-plugin fails with k8s static CPU policy
1. Issue or feature description
Kubelet configured with a static CPU policy (e.g. `--cpu-manager-policy=static --kube-reserved cpu=0.1`) will cause nvidia-smi to fail after a short delay.
Configure a test pod to request a `nvidia.com/gpu` resource and run a simple nvidia-smi command as `sleep 30; nvidia-smi`; this always fails with: "Failed to initialize NVML: Unknown Error".
Running the same command without the sleep works, and nvidia-smi returns the expected info.
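The kubelet side of this setup looks roughly as follows; only the CPU-manager flags are taken from the description above, everything else is illustrative:

```sh
# Illustrative kubelet flags; only the CPU-manager related ones come from this
# report, the remaining configuration is whatever the node already uses.
kubelet \
  --cpu-manager-policy=static \
  --kube-reserved=cpu=0.1 \
  --cpu-manager-reconcile-period=10s   # default value; the CPU manager reconcile loop runs on this period
```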
2. Steps to reproduce the issue
Kubernetes: 1.14 (`kubelet --version` reports Kubernetes v1.14.8)
Device plugin: nvidia/k8s-device-plugin:1.11 (also with 1.0.0.0-beta4)
Apply the DaemonSet for the nvidia device plugin, then apply a pod YAML for a pod requesting one device:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gputest
spec:
  containers:
  - command:
    - /bin/bash
    args:
    - -c
    - "sleep 30; nvidia-smi"
    image: nvidia/cuda:8.0-runtime-ubuntu16.04
    name: app
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
        nvidia.com/gpu: "1"
      requests:
        cpu: "1"
        memory: 1Gi
        nvidia.com/gpu: "1"
  restartPolicy: Never
  tolerations:
  - effect: NoSchedule
    operator: Exists
  nodeSelector:
    beta.kubernetes.io/arch: amd64
```
Then follow the pod logs:
```
Failed to initialize NVML: Unknown Error
```
The pod persists in this state.
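Putting the steps together, the repro sequence looks roughly like this (manifest file names are illustrative):

```sh
# Illustrative file names; the DaemonSet manifest is the one shipped with the
# k8s-device-plugin version in use.
kubectl apply -f nvidia-device-plugin.yml   # deploy the device plugin DaemonSet
kubectl apply -f gputest.yaml               # the pod spec shown above
kubectl logs -f gputest                     # prints "Failed to initialize NVML: Unknown Error" after ~30s
```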
3. Information to attach (optional if deemed irrelevant)
Common error checking:
- The output of `nvidia-smi -a` on your host:
```
==============NVSMI LOG==============
Timestamp : Tue Nov 12 12:22:08 2019
Driver Version : 390.30
Attached GPUs : 1
GPU 00000000:03:00.0
Product Name : Tesla M2090
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : N/A
Accounting Mode Buffer Size : N/A
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0320512020115
GPU UUID : GPU-f473d23b-0a01-034e-933b-58d52ca40425
Minor Number : 0
VBIOS Version : 70.10.46.00.01
MultiGPU Board : No
Board ID : 0x300
GPU Part Number : N/A
Inforom Version
Image Version : N/A
OEM Object : 1.1
ECC Object : 2.0
Power Management Object : 4.0
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x03
Device : 0x00
Domain : 0x0000
Device Id : 0x109110DE
Bus Id : 00000000:03:00.0
Sub System Id : 0x088710DE
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : N/A
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : N/A
Performance State : P12
Clocks Throttle Reasons : N/A
FB Memory Usage
Total : 6067 MiB
Used : 0 MiB
Free : 6067 MiB
BAR1 Memory Usage
Total : N/A
Used : N/A
Free : N/A
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : N/A
Decoder : N/A
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Disabled
Pending : Disabled
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : N/A
GPU Shutdown Temp : N/A
GPU Slowdown Temp : N/A
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 29.81 W
Power Limit : 225.00 W
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 50 MHz
SM : 101 MHz
Memory : 135 MHz
Video : 135 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 650 MHz
SM : 1301 MHz
Memory : 1848 MHz
Video : 540 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
```
- Your docker configuration file (e.g. `/etc/docker/daemon.json`):
```json
{
  "experimental": true,
  "storage-driver": "overlay2",
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```
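To confirm the configuration above is actually in effect on the node, something along these lines can be checked (output formatting varies slightly across Docker versions):

```sh
# Quick sanity check that the nvidia runtime is registered and set as default.
docker info 2>/dev/null | grep -i runtime
```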
- The k8s-device-plugin container logs:
```
2019/11/11 19:10:56 Loading NVML
2019/11/11 19:10:56 Fetching devices.
2019/11/11 19:10:56 Starting FS watcher.
2019/11/11 19:10:56 Starting OS watcher.
2019/11/11 19:10:56 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2019/11/11 19:10:56 Registered device plugin with Kubelet
```
- The kubelet logs on the node (e.g. `sudo journalctl -r -u kubelet`), the following repeated:
```
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: E1112 12:32:21.880196 8053 cpu_manager.go:252] [cpumanager] reconcileState: failed to add container (pod: kube-proxy-bm82q, container: kube-proxy, container id: 92273ce7687ead38fb1c59b18934179183ea1b9e4f59107e92eec2f987bb91be, error: rpc error: code = Unknown desc
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: I1112 12:32:21.880175 8053 policy_static.go:195] [cpumanager] static policy: RemoveContainer (container id: 92273ce7687ead38fb1c59b18934179183ea1b9e4f59107e92eec2f987bb91be)
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: : unknown
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: E1112 12:32:21.880153 8053 cpu_manager.go:183] [cpumanager] AddContainer error: rpc error: code = Unknown desc = failed to update container "92273ce7687ead38fb1c59b18934179183ea1b9e4f59107e92eec2f987bb91be": Error response from daemon: Cannot update container 92273
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: : unknown
Nov 12 12:32:21 dal1k8s-worker-06 kubelet[8053]: E1112 12:32:21.880081 8053 remote_runtime.go:350] UpdateContainerResources "92273ce7687ead38fb1c59b18934179183ea1b9e4f59107e92eec2f987bb91be" from runtime service failed: rpc error: code = Unknown desc = failed to update container "92273ce7687ead38fb1c59b1893417918
```
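These errors come from the CPU manager's reconcile loop calling UpdateContainerResources on running containers. The symptom in the GPU pod (NVML working at first, then failing after a delay) is consistent with the container losing access to the `/dev/nvidia*` device nodes when its cgroup is rewritten. A hedged way to check this from the host, assuming cgroup v1 and Docker's cgroupfs driver (paths differ under the systemd driver):

```sh
# Assumption: cgroup v1 with Docker's cgroupfs driver; adjust the path for
# systemd-managed cgroups. NVIDIA character devices use major number 195.
CID=$(docker ps --no-trunc -q --filter name=gputest | head -n1)
grep 'c 195:' "/sys/fs/cgroup/devices/docker/${CID}/devices.list" \
  || echo "no nvidia device rules present for container ${CID}"
```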
Additional information that might help better understand your environment and reproduce the bug:
- Docker version from `docker version`: 18.09.1
- Docker command, image and tag used:
- Kernel version from `uname -a`:
```
Linux dal1k8s-worker-06 4.4.0-135-generic #161-Ubuntu SMP Mon Aug 27 10:45:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```
- Any relevant kernel output lines from `dmesg`:
```
[ 2.840610] nvidia: module license 'NVIDIA' taints kernel.
[ 2.879301] nvidia-nvlink: Nvlink Core is being initialized, major device number 245
[ 2.911779] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 390.30 Wed Jan 31 21:32:48 PST 2018
[ 2.912960] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[ 13.893608] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 242
```
- NVIDIA packages version from `dpkg -l '*nvidia*'` or `rpm -qa '*nvidia*'`:
```
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=========================================================================-=========================================-=========================================-=======================================================================================================================================================
ii libnvidia-container-tools 1.0.1-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.0.1-1 amd64 NVIDIA container runtime library
ii nvidia-390 390.30-0ubuntu1 amd64 NVIDIA binary driver - version 390.30
ii nvidia-container-runtime 2.0.0+docker18.09.1-1 amd64 NVIDIA container runtime
ii nvidia-container-runtime-hook 1.4.0-1 amd64 NVIDIA container runtime hook
un nvidia-current <none> <none> (no description available)
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.0.3+docker18.09.1-1 all nvidia-docker CLI wrapper
un nvidia-driver-binary <none> <none> (no description available)
un nvidia-legacy-340xx-vdpau-driver <none> <none> (no description available)
un nvidia-libopencl1-390 <none> <none> (no description available)
un nvidia-libopencl1-dev <none> <none> (no description available)
un nvidia-opencl-icd <none> <none> (no description available)
ii nvidia-opencl-icd-390 390.30-0ubuntu1 amd64 NVIDIA OpenCL ICD
un nvidia-persistenced <none> <none> (no description available)
ii nvidia-prime 0.8.2 amd64 Tools to enable NVIDIA's Prime
ii nvidia-settings 410.79-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
un nvidia-settings-binary <none> <none> (no description available)
un nvidia-smi <none> <none> (no description available)
un nvidia-vdpau-driver <none> <none> (no description available)
```
- NVIDIA container library version from `nvidia-container-cli -V`:
```
version: 1.0.1
build date: 2019-01-15T23:24+00:00
build revision: 038fb92d00c94f97d61492d4ed1f82e981129b74
build compiler: gcc-5 5.4.0 20160609
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
```
- [ ] NVIDIA container library logs (see [troubleshooting](https://github.com/NVIDIA/nvidia-docker/wiki/Troubleshooting))

About this issue
- State: closed
- Created 5 years ago
- Comments: 15 (9 by maintainers)

I see. I think I can picture what the issue might be. Let me confirm it later today and I'll provide an update here. Thanks.

PR to fix this is tested and ready to be merged. It will be included in the upcoming v0.9.0 release: https://gitlab.com/nvidia/kubernetes/device-plugin/-/merge_requests/80

Yes, I can confirm that this is an issue.

MIG support in the `k8s-device-plugin` was tested together with the `compatWithCPUManager` option when it first came out and it worked just fine. However, since that time, the way that the underlying GPU driver exposes MIG to a container has changed. It was originally based on something called `/proc` based `nvidia-capabilities` and now it's based on something called `/dev` based `nvidia-capabilities` (more info on this here).

Without going into too much detail, when the underlying driver switched its implementation for this, it broke `compatWithCPUManager` in the `k8s-device-plugin` when MIG is enabled.

The fix should be fairly straightforward and will involve listing out the set of device nodes associated with the `nvidia-capabilities` that grant access to the MIG device being allocated, and sending them back to the kubelet (the same way the device nodes for full GPUs are sent back here).
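For context, on a driver that uses `/dev` based `nvidia-capabilities`, the device nodes in question look roughly like the following (a hedged illustration, not output from the reporter's system; exact entries vary by driver version and MIG layout):

```sh
# Hedged illustration of where the capability device nodes live on a host
# with a recent driver and MIG enabled.
ls -l /dev/nvidia-caps/                 # nvidia-cap<N> character devices
ls /proc/driver/nvidia/capabilities/    # capability hierarchy exposed by the driver
```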

I have added this to our list of tasks for `v0.8.0`, which will be released sometime in January.

In the meantime, if you need this to work today, you can follow the advice in "Working with nvidia-capabilities" and flip your driver settings from `/dev` based `nvidia-capabilities` back to `/proc` based `nvidia-capabilities` (see the sketch below for one possible form).

That should get things working again until a fix comes out. It is not a long-term fix, however, as support for `/proc` based `nvidia-capabilities` will disappear in a future driver release.

Thanks for reporting!
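One possible form of that switch, assuming the `nv_cap_enable_devfs` module parameter described in the driver documentation is what controls the capability mechanism (treat this as a sketch and defer to the "Working with nvidia-capabilities" docs):

```sh
# Assumption: nv_cap_enable_devfs=0 selects /proc based nvidia-capabilities and
# nv_cap_enable_devfs=1 selects /dev based ones. Requires reloading the nvidia
# kernel module with no GPU workloads running (or a reboot).
sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia
sudo modprobe nvidia nv_cap_enable_devfs=0
```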

This is a known issue and has been reported before: https://github.com/NVIDIA/nvidia-container-toolkit/issues/138

Unfortunately, there is no upstream fix for this yet. The plan is to address it as part of the upcoming redesign of the device plugins: https://docs.google.com/document/d/1wPlJL8DsVpHnbVbTaad35ILB-jqoMLkGFLnQpWWNduc/edit