argo-workflows: MountVolume.SetUp failed for volume "docker-sock" & "docker-lib"

BUG REPORT

What happened:

Tried to deploy the basic hello-world.yaml example to a Kubernetes cluster on Azure AKS.

It looks like the executor cannot mount the Docker socket or the Docker lib directory:

  Warning  FailedMount            12s (x6 over 27s)  kubelet, aks-nodepool1-21279999-2  MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file
  Warning  FailedMount            12s (x6 over 27s)  kubelet, aks-nodepool1-21279999-2  MountVolume.SetUp failed for volume "docker-lib" : hostPath type check failed: /var/lib/docker is not a directory

How to reproduce it (as minimally and precisely as possible):

argo submit hello-world.yaml
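A sketch for diagnosing the mount failure, assuming you can get a shell on the affected node (e.g. via SSH). The kubelet's hostPath type check expects these exact file types:

```shell
# On a node running Docker Engine, these should report "socket" and
# "directory" respectively; anything else (or a missing path) explains
# the FailedMount events above.
for p in /var/run/docker.sock /var/lib/docker; do
  stat --format='%n: %F' "$p" 2>/dev/null || echo "$p: missing"
done
```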

Environment:

  • Argo version: v2.1.0-beta2
  • Kubernetes version: 1.9.1 (RBAC disabled)

Other debugging information (if applicable):

  • workflow result:
$ argo get tf-workflow-5jcpn-3759387957
...
Running
...
  • pod description:
Name:           tf-workflow-5jcpn-3759387957
Namespace:      tfworkflow
Node:           aks-nodepool1-21279999-2/10.240.0.4
Start Time:     Fri, 13 Apr 2018 13:51:51 -0400
Labels:         workflows.argoproj.io/completed=false
                workflows.argoproj.io/workflow=tf-workflow-5jcpn
Annotations:    workflows.argoproj.io/node-name=tf-workflow-5jcpn[0].get-workflow-info
                workflows.argoproj.io/template={"name":"get-workflow-info","inputs":{},"outputs":{"parameters":[{"name":"s3-model-url","valueFrom":{"path":"/tmp/s3-model-url"}},{"name":"s3-exported-url","valueFrom":{...
Status:         Pending
IP:
Controlled By:  Workflow/tf-workflow-5jcpn
Containers:
  main:
    Container ID:
    Image:         nervana/circleci:master
    Image ID:
    Port:          <none>
    Command:
      echo 's3://tfjob/models/myjob-07b1d/' | tr -d '[:space:]' > /tmp/s3-model-url; echo 's3://tfjob/models/myjob-07b1d/export/mnist/' | tr -d '[:space:]' > /tmp/s3-exported-url
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cpbjn (ro)
  wait:
    Container ID:
    Image:         argoproj/argoexec:v2.1.0-beta2
    Image ID:
    Port:          <none>
    Command:
      argoexec
    Args:
      wait
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_IP:      (v1:status.podIP)
      ARGO_POD_NAME:   tf-workflow-5jcpn-3759387957 (v1:metadata.name)
      ARGO_NAMESPACE:  tfworkflow (v1:metadata.namespace)
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /var/lib/docker from docker-lib (ro)
      /var/run/docker.sock from docker-sock (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cpbjn (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  podmetadata:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  docker-lib:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker
    HostPathType:  Directory
  docker-sock:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:  Socket
  default-token-cpbjn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-cpbjn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age                From                               Message
  ----     ------                 ----               ----                               -------
  Normal   Scheduled              28s                default-scheduler                  Successfully assigned tf-workflow-5jcpn-3759387957 to aks-nodepool1-21279999-2
  Normal   SuccessfulMountVolume  27s                kubelet, aks-nodepool1-21279999-2  MountVolume.SetUp succeeded for volume "podmetadata"
  Normal   SuccessfulMountVolume  27s                kubelet, aks-nodepool1-21279999-2  MountVolume.SetUp succeeded for volume "default-token-cpbjn"
  Warning  FailedMount            12s (x6 over 27s)  kubelet, aks-nodepool1-21279999-2  MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file
  Warning  FailedMount            12s (x6 over 27s)  kubelet, aks-nodepool1-21279999-2  MountVolume.SetUp failed for volume "docker-lib" : hostPath type check failed: /var/lib/docker is not a directory

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 7
  • Comments: 26 (9 by maintainers)

Most upvoted comments

We have exactly the same problem in AKS on 1.19.6.

Shall we just update the docs to cover this? The same question is asked again and again; I think it is worth updating the docs.

We had the same problem on a kind cluster. Does Argo support kind clusters?


So this seems to be the underlying cause: https://github.com/kubernetes/kubernetes/issues/61801

The fix will be in Kubernetes 1.9.7.

Can you please raise a new issue?

I have this issue too. I have opened a new issue. Please help. Thanks.

Please use the PNS executor.

Hi, I'm still seeing this in a cluster not running Docker (cri-o://1.18.1). Is there a workaround?

Fixed for minikube/cri-o

Thanks @alexec

Assuming you deployed Argo Workflows to the argo namespace:

# deploy to k8s
kubectl create namespace argo && \
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.1.14/install.yaml

# change executor to pns
kubectl patch configmap workflow-controller-configmap -n argo --patch '{"data":{"containerRuntimeExecutor":"pns"}}'
kubectl describe configmap workflow-controller-configmap -n argo

# restart argo
kubectl scale deploy argo-server --replicas 0 -n argo && \
kubectl scale deploy workflow-controller --replicas 0 -n argo && \
watch kubectl get po -n argo

kubectl scale deploy argo-server --replicas 1 -n argo && \
kubectl scale deploy workflow-controller --replicas 1 -n argo && \
watch kubectl get po -n argo

# check change to pns
kubectl logs $(kubectl get po -n argo | grep 'argo-server' | awk '{print $1}') -n argo
kubectl logs $(kubectl get po -n argo | grep 'workflow-controller' | awk '{print $1}') -n argo | grep pns

# expose
kubectl -n argo port-forward deployment/argo-server 2746:2746

# test 
argo submit -n argo --watch https://raw.githubusercontent.com/argoproj/argo-workflows/master/examples/hello-world.yaml

# Ctrl+c
kubectl logs $(kubectl get po -n argo | grep 'hello' | awk '{print $1}') -n argo -c main
 _____________ 
< hello world >
 ------------- 
    \
     \
      \     
                    ##        .            
              ## ## ##       ==            
           ## ## ## ##      ===            
       /""""""""""""""""___/ ===        
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~   
       \______ o          __/            
        \    \        __/             
          \____\______/

# see pod history
kubectl describe po $(kubectl get po -n argo | grep 'hello' | awk '{print $1}') -n argo

Name:         hello-world-w2vsj
Namespace:    argo
Priority:     0
Node:         minikube/192.168.49.2
Start Time:   Thu, 21 Oct 2021 09:50:30 -0500
Labels:       workflows.argoproj.io/completed=true
              workflows.argoproj.io/workflow=hello-world-w2vsj
Annotations:  workflows.argoproj.io/node-name: hello-world-w2vsj
              workflows.argoproj.io/template:
                {"name":"whalesay","inputs":{},"outputs":{},"metadata":{},"container":{"name":"","image":"docker/whalesay:latest","command":["cowsay"],"ar...
Status:       Failed
IP:           10.85.0.109
IPs:
  IP:           10.85.0.109
  IP:           1100:200::6d
Controlled By:  Workflow/hello-world-w2vsj
Containers:
  wait:
    Container ID:  cri-o://9c83d4af0b38e58e0092201d1346ddef3ce1680254552b61afb1e181ca731b4b
    Image:         argoproj/argoexec:v3.1.14
    Image ID:      docker.io/argoproj/argoexec@sha256:4ecdb9193b7b26c3e4045263243276d73bdf12646ed929df6f74e3f343696775
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
      --loglevel
      info
    State:          Terminated
      Reason:       Error
      Message:      failed to wait for main container to complete: timed out waiting for the condition: failed to establish pod watch: unknown (get pods)
      Exit Code:    1
      Started:      Thu, 21 Oct 2021 09:50:33 -0500
      Finished:     Thu, 21 Oct 2021 09:50:48 -0500
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                    hello-world-w2vsj (v1:metadata.name)
      ARGO_CONTAINER_RUNTIME_EXECUTOR:  pns
      GODEBUG:                          x509ignoreCN=0
      ARGO_CONTAINER_NAME:              wait
      ARGO_INCLUDE_SCRIPT_OUTPUT:       false
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6vft2 (ro)
  main:
    Container ID:  cri-o://53b2077f502ad9d4c83ef5c94dd977ef0f147297ec3384720dec09171e106769
    Image:         docker/whalesay:latest
    Image ID:      docker.io/docker/whalesay@sha256:178598e51a26abbc958b8a2e48825c90bc22e641de3d31e18aaf55f3258ba93b
    Port:          <none>
    Host Port:     <none>
    Command:
      cowsay
    Args:
      hello world
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 21 Oct 2021 09:50:36 -0500
      Finished:     Thu, 21 Oct 2021 09:50:36 -0500
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_CONTAINER_NAME:         main
      ARGO_INCLUDE_SCRIPT_OUTPUT:  false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6vft2 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  podmetadata:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  kube-api-access-6vft2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  2m6s  default-scheduler  Successfully assigned argo/hello-world-w2vsj to minikube
  Normal  Pulled     2m4s  kubelet            Container image "argoproj/argoexec:v3.1.14" already present on machine
  Normal  Created    2m4s  kubelet            Created container wait
  Normal  Started    2m4s  kubelet            Started container wait
  Normal  Pulling    2m4s  kubelet            Pulling image "docker/whalesay:latest"
  Normal  Pulled     2m2s  kubelet            Successfully pulled image "docker/whalesay:latest" in 2.532118121s
  Normal  Created    2m1s  kubelet            Created container main
  Normal  Started    2m1s  kubelet            Started container main

AKS 1.19 switched from Docker Engine to containerd as the container runtime, which is why we hit this issue.
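To confirm this on your cluster, a hedged one-liner: list each node's reported container runtime. Nodes reporting `containerd://` have no `/var/run/docker.sock`, so the default docker executor's hostPath mounts can never succeed there.

```shell
# Show each node alongside its container runtime version
# (e.g. docker://19.3.x vs containerd://1.4.x).
kubectl get nodes \
  -o custom-columns='NODE:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion'
```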

Just started looking into using Argo as a replacement for CronJob because it supports time zones and DST, but on 1.19.9 in AKS I'm running into the same problem as described above.

MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file

Never mind: I followed what is specified in https://github.com/argoproj/argo-workflows/issues/5243 and was able to get it to work.

We are getting this again in AKS 1.19.3. Is anyone else?

Very strange. Based on that stat output, I can't understand how Kubernetes could be complaining:

hostPath type check failed: /var/run/docker.sock is not a socket file
hostPath type check failed: /var/lib/docker is not a directory

The stat output clearly shows those files are of the expected types.
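For comparison, the kubelet's hostPath type check is roughly equivalent to ordinary shell file tests: a sketch (not the actual kubelet code), where `HostPathType: Socket` maps to a socket-file test and `Directory` to a directory test, producing messages like those in the events above:

```shell
# Hypothetical helper mirroring the hostPath type check for the two
# HostPathType values used by the Argo docker executor.
check_hostpath() {
  path=$1; type=$2
  case "$type" in
    Socket)    test -S "$path" || echo "$path is not a socket file" ;;
    Directory) test -d "$path" || echo "$path is not a directory" ;;
  esac
}

check_hostpath /var/run/docker.sock Socket
check_hostpath /var/lib/docker Directory
```

If stat on the node disagrees with the kubelet, one possibility is that the kubelet runs in a different mount namespace (or the path is a symlink) and sees something other than what your shell sees.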