container-engine-accelerators: Downloading driver fails on a K8S 1.18 GKE Cluster
Using daemonset-nvidia-v450.yaml fails with a 403 error in a cluster running version 1.18.14-gke.1200. daemonset-preloaded.yaml works fine in a 1.17 cluster but also fails in a 1.18 cluster.
I’ve only captured the log of the v450 installer:
+ COS_KERNEL_INFO_FILENAME=kernel_info
+ COS_KERNEL_SRC_HEADER=kernel-headers.tgz
+ TOOLCHAIN_URL_FILENAME=toolchain_url
+ TOOLCHAIN_ENV_FILENAME=toolchain_env
+ TOOLCHAIN_PKG_DIR=/build/cos-tools
+ CHROMIUMOS_SDK_GCS=https://storage.googleapis.com/chromiumos-sdk
+ ROOT_OS_RELEASE=/root/etc/os-release
+ KERNEL_SRC_HEADER=/build/usr/src/linux
+ NVIDIA_DRIVER_VERSION=450.51.06
+ NVIDIA_DRIVER_MD5SUM=
+ NVIDIA_INSTALL_DIR_HOST=/home/kubernetes/bin/nvidia
+ NVIDIA_INSTALL_DIR_CONTAINER=/usr/local/nvidia
+ ROOT_MOUNT_DIR=/root
+ CACHE_FILE=/usr/local/nvidia/.cache
+ LOCK_FILE=/root/tmp/cos_gpu_installer_lock
+ LOCK_FILE_FD=20
+ set +x
[INFO 2021-01-22 22:11:36 UTC] PRELOAD: false
[INFO 2021-01-22 22:11:36 UTC] Running on COS build id 13310.1041.38
[INFO 2021-01-22 22:11:36 UTC] Data dependencies (e.g. kernel source) will be fetched from https://storage.googleapis.com/cos-tools/13310.1041.38
[INFO 2021-01-22 22:11:36 UTC] Checking if this is the only cos-gpu-installer that is running.
[INFO 2021-01-22 22:11:36 UTC] Checking if third party kernel modules can be installed
/tmp/esp /
/
[INFO 2021-01-22 22:11:36 UTC] Checking cached version
[INFO 2021-01-22 22:11:36 UTC] Cache file /usr/local/nvidia/.cache not found.
[INFO 2021-01-22 22:11:36 UTC] Did not find cached version, building the drivers...
[INFO 2021-01-22 22:11:36 UTC] Downloading GPU installer ...
/usr/local/nvidia /
[INFO 2021-01-22 22:11:37 UTC] Downloading from https://storage.googleapis.com/nvidia-drivers-eu-public/nvidia-cos-project/85/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_85-13310-1041-38.cos
[INFO 2021-01-22 22:11:37 UTC] Downloading GPU installer from https://storage.googleapis.com/nvidia-drivers-eu-public/nvidia-cos-project/85/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_85-13310-1041-38.cos
curl: (22) The requested URL returned error: 403
About this issue
- State: open
- Created 3 years ago
- Reactions: 3
- Comments: 23 (9 by maintainers)
I have found that this setting also differs between the UI and Terraform: the default auth_scopes in the UI include read access to storage, while the default values in Terraform (and probably in the CLI) do not. One has to specify the following explicitly:
Hi, we have the same problem, also on newly created channels. There is a 403 Forbidden error when it tries to download the Nvidia drivers (which are publicly available).
failed to download GPU driver installer: failed to download GPU driver installer version 450.51.06: failed to download GPU driver installer, status: 403 Forbidden
We solved it by recreating the node pool with the storage-ro scope. It looks like the default scopes don't work with the new version. Is this a bug on the GCP side?
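To check whether a node pool is affected before recreating it, one option is to look at the OAuth scopes attached to the node and see whether any of them grants Cloud Storage read access. The sketch below is not from the thread: the helper names `parse_scopes` and `can_read_storage` are hypothetical, and the set of scopes treated as sufficient is an assumption based on the standard Google OAuth scope URLs (the gcloud alias storage-ro maps to devstorage.read_only). It assumes the scope list is available as newline-separated text, which is the format the GCE metadata server returns at instance/service-accounts/default/scopes.

```python
# Assumption: these standard Google OAuth scopes each allow reading
# Cloud Storage objects. "storage-ro" is the gcloud alias for the first.
STORAGE_READ_SCOPES = {
    "https://www.googleapis.com/auth/devstorage.read_only",
    "https://www.googleapis.com/auth/devstorage.read_write",
    "https://www.googleapis.com/auth/devstorage.full_control",
    "https://www.googleapis.com/auth/cloud-platform",
}


def parse_scopes(text):
    """Split newline-separated scope text into a list, dropping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def can_read_storage(scopes):
    """Return True if any scope in `scopes` grants Cloud Storage read access."""
    return any(scope in STORAGE_READ_SCOPES for scope in scopes)


if __name__ == "__main__":
    # Hypothetical scope list for a node pool created without storage access.
    sample = (
        "https://www.googleapis.com/auth/logging.write\n"
        "https://www.googleapis.com/auth/monitoring\n"
    )
    if can_read_storage(parse_scopes(sample)):
        print("Node has a scope that allows reading Cloud Storage.")
    else:
        print("No storage read scope found; the driver download may return 403.")
```

If the check reports no storage read scope, that matches the situation described above, where recreating the node pool with the storage-ro scope resolved the 403.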