longhorn: [BUG] After automatically force deleting terminating pods of a deployment on a down node, data loss and I/O error

Describe the bug

Following the test steps in Improve Node Failure Handling By Automatically Force Delete Terminating Pods of StatefulSet/Deployment On Downed Node.

When testing a deployment with NodeDownPodDeletionPolicy = delete-both-statefulset-and-deployment-pod and verifying the data in the last step, data loss and an I/O error are observed.

To Reproduce

(1) Set up a cluster of 3 worker nodes.
(2) Install Longhorn v1.3.1-rc2 and set Default Replica Count = 2 (because we will turn off one node).
(3) Create volume test-1 through the UI with Replica Count = 2.
(4) Create a PV/PVC for volume test-1 through the UI (a kubectl-based PVC sketch follows the deployment manifest below).
(5) Create a deployment that uses volume test-1:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  labels:
    name: test-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      name: test-deployment
  template:
    metadata:
      labels:
        name: test-deployment
    spec:
      containers:
        - name: test-deployment
          image: nginx:stable-alpine
          volumeMounts:
            - name: test-pod
              mountPath: /data
      volumes:
        - name: test-pod
          persistentVolumeClaim:
            claimName: test-1
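
For step (4), if you prefer kubectl over the UI, a roughly equivalent PVC could look like the sketch below. This is only a sketch: the storageClassName (longhorn-static), the PV name (test-1), and the 20Gi size are assumptions based on Longhorn defaults and the volume size shown later in this report, not details from the original setup.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-1
spec:
  accessModes:
    - ReadWriteOnce
  # Assumed: the UI's "Create PV/PVC" action uses the longhorn-static storage
  # class and a PV named after the volume.
  storageClassName: longhorn-static
  volumeName: test-1
  resources:
    requests:
      storage: 20Gi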

(6) Write some data for verification later:

kubectl exec -it test-deployment-56f74b48c7-c7xsg -- /bin/sh
/ # echo test1 > /data/file1
/ # ls /data
file1       lost+found
/ # cat /data/file1
test1

(7) Set NodeDownPodDeletionPolicy = delete-both-statefulset-and-deployment-pod (a kubectl sketch for this setting follows the pod listing below).
(8) Find the node that hosts the pod of the Deployment and power off that node.
(9) Wait until the pod.deletionTimestamp has passed.
(10) Verify that the pod is deleted and that there is a new running replacement pod:

kubectl get pods -o wide -w
NAME                               READY   STATUS    RESTARTS   AGE     IP           NODE            NOMINATED NODE   READINESS GATES
longhorn-test-nfs                  1/1     Running   0          6m19s   10.42.3.4    ip-10-0-1-251   <none>           <none>
longhorn-test-minio                1/1     Running   0          6m20s   10.42.3.3    ip-10-0-1-251   <none>           <none>
test-deployment-56f74b48c7-c7xsg   1/1     Running   0          65s     10.42.2.18   ip-10-0-1-236   <none>           <none>
test-deployment-56f74b48c7-c7xsg   1/1     Running   0          119s    10.42.2.18   ip-10-0-1-236   <none>           <none>
test-deployment-56f74b48c7-c7xsg   1/1     Terminating   0          7m4s    10.42.2.18   ip-10-0-1-236   <none>           <none>
test-deployment-56f74b48c7-p78q4   0/1     Pending       0          1s      <none>       <none>          <none>           <none>
test-deployment-56f74b48c7-p78q4   0/1     Pending       0          1s      <none>       ip-10-0-1-154   <none>           <none>
test-deployment-56f74b48c7-p78q4   0/1     ContainerCreating   0          1s      <none>       ip-10-0-1-154   <none>           <none>
test-deployment-56f74b48c7-p78q4   1/1     Running             0          35s     10.42.1.21   ip-10-0-1-154   <none>           <none>
test-deployment-56f74b48c7-c7xsg   1/1     Terminating         0          8m14s   10.42.2.18   ip-10-0-1-236   <none>           <none>
test-deployment-56f74b48c7-c7xsg   1/1     Terminating         0          8m14s   10.42.2.18   ip-10-0-1-236   <none>           <none>
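
For steps (7) and (9), the setting and the deletion timestamp can also be handled from the command line. This is a sketch under a couple of assumptions: that Longhorn is installed in the longhorn-system namespace and that the setting is exposed as the node-down-pod-deletion-policy Setting custom resource; the pod name is simply the one from this run.

# Step (7): set the policy via the Longhorn Setting CR (assumed name/namespace).
kubectl -n longhorn-system patch settings.longhorn.io node-down-pod-deletion-policy \
  --type=merge -p '{"value":"delete-both-statefulset-and-deployment-pod"}'

# Step (9): check whether the stuck pod already carries a deletionTimestamp.
kubectl get pod test-deployment-56f74b48c7-c7xsg \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'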

(11) Verify that you can access volume test-1 under the mount point via a shell in the replacement pod => data loss and I/O error:

kubectl exec -it test-deployment-56f74b48c7-p78q4 -- /bin/sh
/ # ls /data
# nothing in /data
/ # echo test2 > /data/file2
/bin/sh: can't create file2: I/O error
# try to create a new file in /data, but return I/O error
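
When this happens, one quick check (not from the original report, just a debugging sketch) is to look at how /data is currently mounted inside the replacement pod; a stale or read-only mount of the Longhorn device usually shows up here.

# Inspect the mount table of the replacement pod and filter for the data path.
kubectl exec test-deployment-56f74b48c7-p78q4 -- mount | grep /data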

Expected behavior

Data in the volume stays intact and the volume remains readable/writable.

Log or Support bundle

longhorn-support-bundle_46889877-205b-436b-a0d3-2c368ad219ea_2022-08-09T06-39-07Z.zip

Environment

  • Longhorn version: v1.3.1-rc2
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.22.9+k3s1
    • Number of management nodes in the cluster: 1
    • Number of worker nodes in the cluster: 3
  • Node config
    • OS type and version: Ubuntu 20.04 (t2.xlarge instance)
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe):
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): AWS
  • Number of Longhorn volumes in the cluster:


About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 25 (24 by maintainers)

Most upvoted comments

@derekbit the code is pushed; the image is phanle1010/longhorn-manager:0d951caf-dirty

Looks like Longhorn manager is trying to delete the wrong volumeAttachment object:

2022-08-10T01:51:40.706669148Z time="2022-08-10T01:51:40Z" level=info msg="longhorn-kubernetes-pod-controller: wait for volume attachment csi-7df4c6bbcbf77e525821ecf0ca71e7170960d428f045f9d47f147f6034dfe5d3 for pod test-deployment-56f74b48c7-w8bcw on downed node phan-v177-pool2-2c1045bd-4cgk6 to be deleted" controller=longhorn-kubernetes-pod node=phan-v177-pool2-2c1045bd-5b4h5

Since csi-7df4c6bbcbf77e525821ecf0ca71e7170960d428f045f9d47f147f6034dfe5d3 is not on phan-v177-pool2-2c1045bd-4cgk6, the controller is waiting on the wrong VolumeAttachment.
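
As a side note (not part of the original comment), the node each VolumeAttachment is bound to can be listed with plain kubectl against the core storage API, which makes this kind of mismatch easy to spot:

# List CSI VolumeAttachments with the node and PV they reference.
kubectl get volumeattachments.storage.k8s.io \
  -o custom-columns=NAME:.metadata.name,ATTACHER:.spec.attacher,NODE:.spec.nodeName,PV:.spec.source.persistentVolumeName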

This commit might be the culprit. The commit was introduced by myself 😦

Good cooperation 👍!

@PhanLe1010 I think you’re right. After I reverted the commit, the data I/O error disappeared.

I can reproduce the issue on Equinix VMs.

@derekbit https://github.com/longhorn/longhorn-manager/pull/1467 is pushed; the image is phanle1010/longhorn-manager:0d951caf-dirty

@PhanLe1010 Works like a charm!

Something is wrong with the control plane. It keeps detaching and attaching the volume after the new pod becomes running, so the mount point inside the pod becomes stale.

NAME     STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE                             AGE
test-1   attached   healthy                  21474836480   phan-v177-pool2-2c1045bd-4cgk6   3m8s
test-1   attached   healthy                  21474836480   phan-v177-pool2-2c1045bd-4cgk6   3m9s
test-1   attached   unknown                  21474836480   phan-v177-pool2-2c1045bd-4cgk6   3m10s
test-1   attached   unknown                  21474836480   phan-v177-pool2-2c1045bd-4cgk6   8m17s
test-1   detaching   unknown                  21474836480                                    8m17s
test-1   detaching   unknown                  21474836480                                    8m18s
test-1   detached    unknown                  21474836480                                    8m18s
test-1   detached    unknown                  21474836480                                    9m15s
test-1   attaching   unknown                  21474836480   phan-v177-pool2-2c1045bd-5b4h5   9m15s
test-1   attached    healthy                  21474836480   phan-v177-pool2-2c1045bd-5b4h5   9m19s
test-1   attached    degraded                 21474836480   phan-v177-pool2-2c1045bd-5b4h5   9m24s
test-1   attached    degraded                 21474836480   phan-v177-pool2-2c1045bd-5b4h5   9m27s
test-1   attached    degraded                 21474836480   phan-v177-pool2-2c1045bd-5b4h5   9m29s
test-1   attached    degraded                 21474836480   phan-v177-pool2-2c1045bd-5b4h5   9m34s
test-1   attached    degraded                 21474836480   phan-v177-pool2-2c1045bd-5b4h5   9m39s
test-1   attached    degraded                 21474836480   phan-v177-pool2-2c1045bd-5b4h5   9m41s
test-1   detaching   unknown                  21474836480                                    9m41s
test-1   detaching   unknown                  21474836480                                    9m42s
test-1   detaching   unknown                  21474836480                                    9m42s
test-1   detached    unknown                  21474836480                                    9m43s
test-1   detached    unknown                  21474836480                                    9m56s
test-1   attaching   unknown                  21474836480   phan-v177-pool2-2c1045bd-5b4h5   9m56s
test-1   attached    unknown                  21474836480   phan-v177-pool2-2c1045bd-5b4h5   10m
test-1   attached    degraded                 21474836480   phan-v177-pool2-2c1045bd-5b4h5   10m
test-1   attached    degraded                 21474836480   phan-v177-pool2-2c1045bd-5b4h5   10m
test-1   detaching   unknown                  21474836480                                    10m
test-1   detaching   unknown                  21474836480                                    10m
test-1   detaching   unknown                  21474836480                                    10m
test-1   detached    unknown                  21474836480                                    10m
test-1   detached    unknown                  21474836480                                    10m
test-1   detached    unknown                  21474836480                                    10m
test-1   attaching   unknown                  21474836480   phan-v177-pool2-2c1045bd-5b4h5   10m
test-1   attached    unknown                  21474836480   phan-v177-pool2-2c1045bd-5b4h5   11m
test-1   attached    degraded                 21474836480   phan-v177-pool2-2c1045bd-5b4h5   11m
test-1   attached    healthy                  21474836480   phan-v177-pool2-2c1045bd-5b4h5   21m
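
For context, the listing above appears to be a watch on the Longhorn Volume custom resource; a sketch of the command, assuming Longhorn runs in the longhorn-system namespace:

# Watch the Longhorn Volume CR for test-1 and print state transitions as they happen.
kubectl -n longhorn-system get volumes.longhorn.io test-1 -w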

To add here, I also tried with Longhorn v1.3.1-rc2 on a cluster created with RKE2 (Kubernetes v1.24.2) and Ubuntu 20.04 nodes, and it worked for me. I was able to read/write the mount point.

@yangchiu Could you please elaborate on your environment?

Workaround for this kind of issue: scale the deployment down and then back up (see the sketch below).
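
A minimal sketch of that workaround, using the deployment from the reproduction steps:

kubectl scale deployment test-deployment --replicas=0
# Wait for the old pod to terminate and the volume to detach, then scale back up.
kubectl scale deployment test-deployment --replicas=1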

I checked the above scenario with Longhorn v1.3.0 on a cluster created with RKE1 (Kubernetes v1.24.2) and it worked as expected.

I was able to read/write the mount point of the newly created pod.

@innobead This is a regression from 1.3.0.

Not sure if the K8s version matters, but I use v1.22.9+k3s1.

@innobead Reran on v1.3.1-rc1; v1.3.1-rc1 also has this issue.