kubernetes: Jobs with certain naming fail

What happened: When creating a Kubernetes Job, its pod status immediately turns to “Terminating”. I noticed that this happens only when the job name starts with “train”: if I name the job “train”, “train1”, “train2”, or similar, the problem occurs. If I name the job anything else, for example “test1”, it runs successfully.

What you expected to happen:

Job running successfully irrespective of its name.

How to reproduce it (as minimally and precisely as possible): Here is a YAML manifest stripped down to the bare minimum:

apiVersion: batch/v1
kind: Job
metadata:
  name: train1
  namespace: sentisight
spec:
  template:
    spec:
      containers:
      - name: foobar
        image: busybox
        command: ["/bin/sleep"]
        args: ["60"]
      restartPolicy: Never
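For comparison, this is the control case mentioned above: the same manifest with only metadata.name changed to one of the names that works (everything else is identical to the failing job, so the name really is the only variable):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: test1           # only this field differs from the failing "train1" job
  namespace: sentisight
spec:
  template:
    spec:
      containers:
      - name: foobar
        image: busybox
        command: ["/bin/sleep"]
        args: ["60"]
      restartPolicy: Never
```

Applying this manifest, the job runs to completion; applying the one above with name train1, the pod immediately goes to “Terminating”.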

Here are the last 20 lines of journalctl -u kubelet output:

Nov 15 09:22:51 gpu-box-2 kubelet[10336]: I1115 09:22:51.064581   10336 reconciler.go:181] operationExecutor.UnmountVolume started for volume "default-token-pjtcb" (UniqueName: "kubernetes.io/secret/990303e3-3c78-428a-8f14-fd28bbeefb35-default-token-pjtcb") pod "990303e3-3c78-428a-8f14-fd28bbeefb35" (UID: "990303e3-3c78-428a-8f14-fd28bbeefb35")
Nov 15 09:22:51 gpu-box-2 kubelet[10336]: I1115 09:22:51.082946   10336 operation_generator.go:831] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/990303e3-3c78-428a-8f14-fd28bbeefb35-default-token-pjtcb" (OuterVolumeSpecName: "default-token-pjtcb") pod "990303e3-3c78-428a-8f14-fd28bbeefb35" (UID: "990303e3-3c78-428a-8f14-fd28bbeefb35"). InnerVolumeSpecName "default-token-pjtcb". PluginName "kubernetes.io/secret", VolumeGidValue ""
Nov 15 09:22:51 gpu-box-2 kubelet[10336]: I1115 09:22:51.164834   10336 reconciler.go:301] Volume detached for volume "default-token-pjtcb" (UniqueName: "kubernetes.io/secret/990303e3-3c78-428a-8f14-fd28bbeefb35-default-token-pjtcb") on node "gpu-box-2" DevicePath ""
Nov 15 09:22:54 gpu-box-2 kubelet[10336]: E1115 09:22:54.668461   10336 kubelet_pods.go:147] Mount cannot be satisfied for container "blabla", because the volume is missing or the volume mounter is nil: {Name:default-token-pjtcb ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil> SubPathExpr:}
Nov 15 09:22:54 gpu-box-2 kubelet[10336]: E1115 09:22:54.668521   10336 kuberuntime_manager.go:783] container start failed: CreateContainerConfigError: cannot find volume "default-token-pjtcb" to mount into container "blabla"
Nov 15 09:22:54 gpu-box-2 kubelet[10336]: E1115 09:22:54.668567   10336 pod_workers.go:191] Error syncing pod 990303e3-3c78-428a-8f14-fd28bbeefb35 ("train-2tfrv_sentisight(990303e3-3c78-428a-8f14-fd28bbeefb35)"), skipping: failed to "StartContainer" for "blabla" with CreateContainerConfigError: "cannot find volume \"default-token-pjtcb\" to mount into container \"blabla\""
Nov 15 09:23:44 gpu-box-2 kubelet[10336]: I1115 09:23:44.472686   10336 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-hfddr" (UniqueName: "kubernetes.io/secret/8cde07f1-7de4-4980-9969-42f6e272e849-default-token-hfddr") pod "train-wmczz" (UID: "8cde07f1-7de4-4980-9969-42f6e272e849")
Nov 15 09:23:45 gpu-box-2 kubelet[10336]: W1115 09:23:45.884479   10336 pod_container_deletor.go:75] Container "866f12c1fb7665cd9e77cf059d4307c4fc09c3863dc41b108c80c23d60250d20" not found in pod's containers
Nov 15 09:24:49 gpu-box-2 kubelet[10336]: I1115 09:24:49.301388   10336 reconciler.go:181] operationExecutor.UnmountVolume started for volume "default-token-hfddr" (UniqueName: "kubernetes.io/secret/8cde07f1-7de4-4980-9969-42f6e272e849-default-token-hfddr") pod "8cde07f1-7de4-4980-9969-42f6e272e849" (UID: "8cde07f1-7de4-4980-9969-42f6e272e849")
Nov 15 09:24:49 gpu-box-2 kubelet[10336]: I1115 09:24:49.322941   10336 operation_generator.go:831] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/8cde07f1-7de4-4980-9969-42f6e272e849-default-token-hfddr" (OuterVolumeSpecName: "default-token-hfddr") pod "8cde07f1-7de4-4980-9969-42f6e272e849" (UID: "8cde07f1-7de4-4980-9969-42f6e272e849"). InnerVolumeSpecName "default-token-hfddr". PluginName "kubernetes.io/secret", VolumeGidValue ""
Nov 15 09:24:49 gpu-box-2 kubelet[10336]: I1115 09:24:49.401639   10336 reconciler.go:301] Volume detached for volume "default-token-hfddr" (UniqueName: "kubernetes.io/secret/8cde07f1-7de4-4980-9969-42f6e272e849-default-token-hfddr") on node "gpu-box-2" DevicePath ""
Nov 15 09:24:50 gpu-box-2 kubelet[10336]: W1115 09:24:50.234197   10336 pod_container_deletor.go:75] Container "866f12c1fb7665cd9e77cf059d4307c4fc09c3863dc41b108c80c23d60250d20" not found in pod's containers
Nov 15 09:25:21 gpu-box-2 kubelet[10336]: W1115 09:25:21.550632   10336 status_manager.go:545] Failed to update status for pod "train-wmczz_karolis(8cde07f1-7de4-4980-9969-42f6e272e849)": failed to patch status "{\"status\":{\"containerStatuses\":[{\"containerID\":\"docker://b3851b22742e67be9e63b49db7129f427016b09fca76297c9e45d19703e2535b\",\"image\":\"busybox:latest\",\"imageID\":\"docker-pullable://busybox@sha256:1303dbf110c57f3edf68d9f5a16c082ec06c4cf7604831669faf2c712260b5a0\",\"lastState\":{},\"name\":\"blabla\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"exitCode\":0,\"finishedAt\":null,\"startedAt\":null}}}]}}" for pod "karolis"/"train-wmczz": pods "train-wmczz" not found
Nov 15 09:26:01 gpu-box-2 kubelet[10336]: I1115 09:26:01.247756   10336 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-pjtcb" (UniqueName: "kubernetes.io/secret/f21d2a76-e8f3-46a3-a12d-d5b7ae646414-default-token-pjtcb") pod "train-wcwtg" (UID: "f21d2a76-e8f3-46a3-a12d-d5b7ae646414")
Nov 15 09:26:01 gpu-box-2 kubelet[10336]: I1115 09:26:01.748839   10336 reconciler.go:181] operationExecutor.UnmountVolume started for volume "default-token-pjtcb" (UniqueName: "kubernetes.io/secret/f21d2a76-e8f3-46a3-a12d-d5b7ae646414-default-token-pjtcb") pod "f21d2a76-e8f3-46a3-a12d-d5b7ae646414" (UID: "f21d2a76-e8f3-46a3-a12d-d5b7ae646414")
Nov 15 09:26:01 gpu-box-2 kubelet[10336]: I1115 09:26:01.766977   10336 operation_generator.go:831] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/f21d2a76-e8f3-46a3-a12d-d5b7ae646414-default-token-pjtcb" (OuterVolumeSpecName: "default-token-pjtcb") pod "f21d2a76-e8f3-46a3-a12d-d5b7ae646414" (UID: "f21d2a76-e8f3-46a3-a12d-d5b7ae646414"). InnerVolumeSpecName "default-token-pjtcb". PluginName "kubernetes.io/secret", VolumeGidValue ""
Nov 15 09:26:01 gpu-box-2 kubelet[10336]: I1115 09:26:01.849063   10336 reconciler.go:301] Volume detached for volume "default-token-pjtcb" (UniqueName: "kubernetes.io/secret/f21d2a76-e8f3-46a3-a12d-d5b7ae646414-default-token-pjtcb") on node "gpu-box-2" DevicePath ""
Nov 15 09:26:04 gpu-box-2 kubelet[10336]: E1115 09:26:04.806350   10336 kubelet_pods.go:147] Mount cannot be satisfied for container "foobar", because the volume is missing or the volume mounter is nil: {Name:default-token-pjtcb ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil> SubPathExpr:}
Nov 15 09:26:04 gpu-box-2 kubelet[10336]: E1115 09:26:04.806424   10336 kuberuntime_manager.go:783] container start failed: CreateContainerConfigError: cannot find volume "default-token-pjtcb" to mount into container "foobar"
Nov 15 09:26:04 gpu-box-2 kubelet[10336]: E1115 09:26:04.806474   10336 pod_workers.go:191] Error syncing pod f21d2a76-e8f3-46a3-a12d-d5b7ae646414 ("train-wcwtg_sentisight(f21d2a76-e8f3-46a3-a12d-d5b7ae646414)"), skipping: failed to "StartContainer" for "foobar" with CreateContainerConfigError: "cannot find volume \"default-token-pjtcb\" to mount into container \"foobar\""

Anything else we need to know?: I previously had a similar problem with jobs whose names started with “job”. That problem somehow magically disappeared, but now I have the same problem with jobs starting with “train”. /sig node /sig scheduling
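Since the kubelet log above shows the new train-* pod trying to mount a token volume (default-token-pjtcb) that belonged to a previously deleted pod, it may be worth checking for leftover objects sharing the prefix. A hypothetical diagnostic session (these commands must be run against the affected cluster, so the output will vary; the pod name in the last command is a placeholder for whatever kubectl get pods reports):

```shell
# Look for leftover jobs/pods with the "train" prefix in any namespace;
# the kubelet errors above reference volumes from earlier train-* pods.
kubectl get jobs,pods --all-namespaces | grep train

# Inspect events for the failing pod to see the CreateContainerConfigError
# from the API side (replace train1-xxxxx with the actual pod name).
kubectl -n sentisight describe pod train1-xxxxx
```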

Environment:

  • Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:36:53Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:27:17Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:

  • OS (e.g: cat /etc/os-release): NAME="Ubuntu" VERSION="16.04.6 LTS (Xenial Xerus)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 16.04.6 LTS" VERSION_ID="16.04" HOME_URL="http://www.ubuntu.com/" SUPPORT_URL="http://help.ubuntu.com/" BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/" VERSION_CODENAME=xenial UBUNTU_CODENAME=xenial

  • Kernel (e.g. uname -a): Linux gpu-box-2 4.4.0-157-generic #185-Ubuntu SMP Tue Jul 23 09:17:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

  • Network plugin and version (if this is a network-related bug):

  • Others:

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 32 (11 by maintainers)

Most upvoted comments

@n4j Thanks for having a look at this. I work with @brandond. I’m attaching a k3s log from a test environment that experienced this issue. I believe the three lines listed below are the most relevant. k3s.log

Feb 23 04:48:19 ip-10-1-1-57 k3s[1461]: E0223 04:48:19.373686    1461 kubelet_pods.go:153] Mount cannot be satisfied for container "cordon", because the volume is missing (ok=false) or the volume mounter (vol.Mounter) is nil (vol={Mounter:<nil> BlockVolumeMapper:<nil> SELinuxLabeled:false ReadOnly:false InnerVolumeSpecName:}): {Name:host-root ReadOnly:false MountPath:/host SubPath: MountPropagation:<nil> SubPathExpr:}
Feb 23 04:48:19 ip-10-1-1-57 k3s[1461]: E0223 04:48:19.374412    1461 kuberuntime_manager.go:815] init container &Container{Name:cordon,Image:rancher/kubectl:v1.18.0,Command:[],Args:[cordon ip-10-1-1-57],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{EnvVar{Name:SYSTEM_UPGRADE_NODE_NAME,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SYSTEM_UPGRADE_POD_NAME,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.name,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SYSTEM_UPGRADE_POD_UID,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.uid,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SYSTEM_UPGRADE_PLAN_NAME,Value:k3s-master-plan,ValueFrom:nil,},EnvVar{Name:SYSTEM_UPGRADE_PLAN_LATEST_HASH,Value:f06f6ee3a414d677a272a35f4216b31603a980687d5d5b51bcbc0634,ValueFrom:nil,},EnvVar{Name:SYSTEM_UPGRADE_PLAN_LATEST_VERSION,Value:v1.19.8-k3s1,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:host-root,ReadOnly:false,MountPath:/host,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:pod-info,ReadOnly:true,MountPath:/run/system-upgrade/pod,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:system-upgrade-token-8gm44,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:Always,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,} start failed in pod apply-k3s-master-plan-on-ip-10-1-1-57-with-f06f6ee3a414d6-55d5p_cattle-system(c1fea4ed-8913-4841-bddc-c018629b64d1): CreateContainerConfigError: cannot find volume "host-root" to mount into container "cordon"
Feb 23 04:48:19 ip-10-1-1-57 k3s[1461]: E0223 04:48:19.374603    1461 pod_workers.go:191] Error syncing pod c1fea4ed-8913-4841-bddc-c018629b64d1 ("apply-k3s-master-plan-on-ip-10-1-1-57-with-f06f6ee3a414d6-55d5p_cattle-system(c1fea4ed-8913-4841-bddc-c018629b64d1)"), skipping: failed to "StartContainer" for "cordon" with CreateContainerConfigError: "cannot find volume \"host-root\" to mount into container \"cordon\""

If needed, I can attempt to reproduce this on newer versions of Kubernetes/k3s and provide logs.