kubernetes: Jobs with certain naming fail
What happened: When I create a Kubernetes Job, its pod status immediately turns to "Terminating". This happens only when the job name starts with "train": if I name the job "train", "train1", "train2" or similar, the problem occurs. If I name the job anything else, for example "test1", it runs successfully.
What you expected to happen:
The job runs successfully irrespective of its name.
How to reproduce it (as minimally and precisely as possible): Here is a YAML file stripped down to a bare minimum:
apiVersion: batch/v1
kind: Job
metadata:
  name: train1
  namespace: sentisight
spec:
  template:
    spec:
      containers:
      - name: foobar
        image: busybox
        command: ["/bin/sleep"]
        args: ["60"]
      restartPolicy: Never
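To reproduce, the manifest above can be applied and the resulting pod watched. This is a minimal sketch, assuming the manifest is saved as job.yaml; the exact commands are an assumption and are not part of the original report:

  # apply the Job manifest and watch its pod in the sentisight namespace
  kubectl apply -f job.yaml
  kubectl get pods -n sentisight -w
  # with a job name starting with "train", the pod reportedly goes to Terminating almost immediately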
Here are the last 20 lines of journalctl -u kubelet output:
Nov 15 09:22:51 gpu-box-2 kubelet[10336]: I1115 09:22:51.064581 10336 reconciler.go:181] operationExecutor.UnmountVolume started for volume "default-token-pjtcb" (UniqueName: "kubernetes.io/secret/990303e3-3c78-428a-8f14-fd28bbeefb35-default-token-pjtcb") pod "990303e3-3c78-428a-8f14-fd28bbeefb35" (UID: "990303e3-3c78-428a-8f14-fd28bbeefb35")
Nov 15 09:22:51 gpu-box-2 kubelet[10336]: I1115 09:22:51.082946 10336 operation_generator.go:831] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/990303e3-3c78-428a-8f14-fd28bbeefb35-default-token-pjtcb" (OuterVolumeSpecName: "default-token-pjtcb") pod "990303e3-3c78-428a-8f14-fd28bbeefb35" (UID: "990303e3-3c78-428a-8f14-fd28bbeefb35"). InnerVolumeSpecName "default-token-pjtcb". PluginName "kubernetes.io/secret", VolumeGidValue ""
Nov 15 09:22:51 gpu-box-2 kubelet[10336]: I1115 09:22:51.164834 10336 reconciler.go:301] Volume detached for volume "default-token-pjtcb" (UniqueName: "kubernetes.io/secret/990303e3-3c78-428a-8f14-fd28bbeefb35-default-token-pjtcb") on node "gpu-box-2" DevicePath ""
Nov 15 09:22:54 gpu-box-2 kubelet[10336]: E1115 09:22:54.668461 10336 kubelet_pods.go:147] Mount cannot be satisfied for container "blabla", because the volume is missing or the volume mounter is nil: {Name:default-token-pjtcb ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil> SubPathExpr:}
Nov 15 09:22:54 gpu-box-2 kubelet[10336]: E1115 09:22:54.668521 10336 kuberuntime_manager.go:783] container start failed: CreateContainerConfigError: cannot find volume "default-token-pjtcb" to mount into container "blabla"
Nov 15 09:22:54 gpu-box-2 kubelet[10336]: E1115 09:22:54.668567 10336 pod_workers.go:191] Error syncing pod 990303e3-3c78-428a-8f14-fd28bbeefb35 ("train-2tfrv_sentisight(990303e3-3c78-428a-8f14-fd28bbeefb35)"), skipping: failed to "StartContainer" for "blabla" with CreateContainerConfigError: "cannot find volume \"default-token-pjtcb\" to mount into container \"blabla\""
Nov 15 09:23:44 gpu-box-2 kubelet[10336]: I1115 09:23:44.472686 10336 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-hfddr" (UniqueName: "kubernetes.io/secret/8cde07f1-7de4-4980-9969-42f6e272e849-default-token-hfddr") pod "train-wmczz" (UID: "8cde07f1-7de4-4980-9969-42f6e272e849")
Nov 15 09:23:45 gpu-box-2 kubelet[10336]: W1115 09:23:45.884479 10336 pod_container_deletor.go:75] Container "866f12c1fb7665cd9e77cf059d4307c4fc09c3863dc41b108c80c23d60250d20" not found in pod's containers
Nov 15 09:24:49 gpu-box-2 kubelet[10336]: I1115 09:24:49.301388 10336 reconciler.go:181] operationExecutor.UnmountVolume started for volume "default-token-hfddr" (UniqueName: "kubernetes.io/secret/8cde07f1-7de4-4980-9969-42f6e272e849-default-token-hfddr") pod "8cde07f1-7de4-4980-9969-42f6e272e849" (UID: "8cde07f1-7de4-4980-9969-42f6e272e849")
Nov 15 09:24:49 gpu-box-2 kubelet[10336]: I1115 09:24:49.322941 10336 operation_generator.go:831] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/8cde07f1-7de4-4980-9969-42f6e272e849-default-token-hfddr" (OuterVolumeSpecName: "default-token-hfddr") pod "8cde07f1-7de4-4980-9969-42f6e272e849" (UID: "8cde07f1-7de4-4980-9969-42f6e272e849"). InnerVolumeSpecName "default-token-hfddr". PluginName "kubernetes.io/secret", VolumeGidValue ""
Nov 15 09:24:49 gpu-box-2 kubelet[10336]: I1115 09:24:49.401639 10336 reconciler.go:301] Volume detached for volume "default-token-hfddr" (UniqueName: "kubernetes.io/secret/8cde07f1-7de4-4980-9969-42f6e272e849-default-token-hfddr") on node "gpu-box-2" DevicePath ""
Nov 15 09:24:50 gpu-box-2 kubelet[10336]: W1115 09:24:50.234197 10336 pod_container_deletor.go:75] Container "866f12c1fb7665cd9e77cf059d4307c4fc09c3863dc41b108c80c23d60250d20" not found in pod's containers
Nov 15 09:25:21 gpu-box-2 kubelet[10336]: W1115 09:25:21.550632 10336 status_manager.go:545] Failed to update status for pod "train-wmczz_karolis(8cde07f1-7de4-4980-9969-42f6e272e849)": failed to patch status "{\"status\":{\"containerStatuses\":[{\"containerID\":\"docker://b3851b22742e67be9e63b49db7129f427016b09fca76297c9e45d19703e2535b\",\"image\":\"busybox:latest\",\"imageID\":\"docker-pullable://busybox@sha256:1303dbf110c57f3edf68d9f5a16c082ec06c4cf7604831669faf2c712260b5a0\",\"lastState\":{},\"name\":\"blabla\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"exitCode\":0,\"finishedAt\":null,\"startedAt\":null}}}]}}" for pod "karolis"/"train-wmczz": pods "train-wmczz" not found
Nov 15 09:26:01 gpu-box-2 kubelet[10336]: I1115 09:26:01.247756 10336 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-pjtcb" (UniqueName: "kubernetes.io/secret/f21d2a76-e8f3-46a3-a12d-d5b7ae646414-default-token-pjtcb") pod "train-wcwtg" (UID: "f21d2a76-e8f3-46a3-a12d-d5b7ae646414")
Nov 15 09:26:01 gpu-box-2 kubelet[10336]: I1115 09:26:01.748839 10336 reconciler.go:181] operationExecutor.UnmountVolume started for volume "default-token-pjtcb" (UniqueName: "kubernetes.io/secret/f21d2a76-e8f3-46a3-a12d-d5b7ae646414-default-token-pjtcb") pod "f21d2a76-e8f3-46a3-a12d-d5b7ae646414" (UID: "f21d2a76-e8f3-46a3-a12d-d5b7ae646414")
Nov 15 09:26:01 gpu-box-2 kubelet[10336]: I1115 09:26:01.766977 10336 operation_generator.go:831] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/f21d2a76-e8f3-46a3-a12d-d5b7ae646414-default-token-pjtcb" (OuterVolumeSpecName: "default-token-pjtcb") pod "f21d2a76-e8f3-46a3-a12d-d5b7ae646414" (UID: "f21d2a76-e8f3-46a3-a12d-d5b7ae646414"). InnerVolumeSpecName "default-token-pjtcb". PluginName "kubernetes.io/secret", VolumeGidValue ""
Nov 15 09:26:01 gpu-box-2 kubelet[10336]: I1115 09:26:01.849063 10336 reconciler.go:301] Volume detached for volume "default-token-pjtcb" (UniqueName: "kubernetes.io/secret/f21d2a76-e8f3-46a3-a12d-d5b7ae646414-default-token-pjtcb") on node "gpu-box-2" DevicePath ""
Nov 15 09:26:04 gpu-box-2 kubelet[10336]: E1115 09:26:04.806350 10336 kubelet_pods.go:147] Mount cannot be satisfied for container "foobar", because the volume is missing or the volume mounter is nil: {Name:default-token-pjtcb ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil> SubPathExpr:}
Nov 15 09:26:04 gpu-box-2 kubelet[10336]: E1115 09:26:04.806424 10336 kuberuntime_manager.go:783] container start failed: CreateContainerConfigError: cannot find volume "default-token-pjtcb" to mount into container "foobar"
Nov 15 09:26:04 gpu-box-2 kubelet[10336]: E1115 09:26:04.806474 10336 pod_workers.go:191] Error syncing pod f21d2a76-e8f3-46a3-a12d-d5b7ae646414 ("train-wcwtg_sentisight(f21d2a76-e8f3-46a3-a12d-d5b7ae646414)"), skipping: failed to "StartContainer" for "foobar" with CreateContainerConfigError: "cannot find volume \"default-token-pjtcb\" to mount into container \"foobar\""
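The failing mount is the default service account token secret for the namespace. A minimal diagnostic sketch for confirming that the secret actually exists and inspecting the pod events (the commands and the pod-name placeholder are assumptions, not output shown above):

  # confirm the default service account and its token secret exist in the namespace
  kubectl get serviceaccount default -n sentisight -o yaml
  kubectl get secret -n sentisight | grep default-token
  # inspect the events of the affected pod for the CreateContainerConfigError
  kubectl describe pod <pod-name> -n sentisight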
Anything else we need to know?: Earlier I had a similar problem with jobs whose names started with "job". That problem somehow magically disappeared, but now I have the same problem with jobs starting with "train".
/sig node
/sig scheduling
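Given the prefix-dependent behaviour, one hedged check is to look for other objects in the namespace that share the "train" prefix and might be interfering. This is only a sketch; the commands and grep patterns are assumptions, not something from the report:

  # look for existing jobs, cronjobs or pods whose names start with "train"
  kubectl get jobs,cronjobs,pods -n sentisight | grep '^train'
  # recent events mentioning the affected pods
  kubectl get events -n sentisight --sort-by=.lastTimestamp | grep train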
Environment:
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:36:53Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:27:17Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release):
  NAME="Ubuntu" VERSION="16.04.6 LTS (Xenial Xerus)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 16.04.6 LTS" VERSION_ID="16.04" HOME_URL="http://www.ubuntu.com/" SUPPORT_URL="http://help.ubuntu.com/" BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/" VERSION_CODENAME=xenial UBUNTU_CODENAME=xenial
- Kernel (e.g. uname -a):
  Linux gpu-box-2 4.4.0-157-generic #185-Ubuntu SMP Tue Jul 23 09:17:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 32 (11 by maintainers)
Commits related to this issue
- Add more logging for Mount error Add additional logging for "Mount cannot be satisfied for container" error to help debug #85330. — committed to saad-ali/kubernetes by saad-ali 4 years ago
- Add more logging for Mount error Add additional logging for "Mount cannot be satisfied for container" error to help debug #85330. — committed to hidetatz/kubernetes by saad-ali 4 years ago
@n4j Thanks for having a look at this. I work with @brandond. I'm attaching a k3s log from a test environment that experienced this issue; I believe the three lines listed below may be the most relevant. k3s.log
If needed, I can attempt to reproduce this on newer versions of Kubernetes/k3s and provide logs.