longhorn: [BUG] Volume could not be remounted after engine process killed
Describe the bug
In test case test_volume_reattach_after_engine_sigkill:
"""
[HA] Test if the volume can be reattached after using SIGKILL
to crash the engine process
1. Create StorageClass and StatefulSet.
2. Write random data to the pod and get the md5sum.
3. Crash the engine process by SIGKILL in the engine manager.
4. Wait for volume to `faulted`, then `healthy`.
5. Wait for K8s to terminate the pod and statefulset to bring pod to
`Pending`, then `Running`.
6. Check volume path exist in the pod.
7. Check md5sum of the data in the pod.
8. Check new data written to the volume is successful.
"""
After the engine process was killed in step 3 and the volume recovered to healthy in step 4, the StatefulSet pod could not be recreated to remount the volume:
https://ci.longhorn.io/job/public/job/v1.6.x/job/v1.6.x-longhorn-upgrade-tests-sles-arm64/8/testReport/junit/tests/test_ha/test_volume_reattach_after_engine_sigkill/
https://ci.longhorn.io/job/public/job/master/job/sles/job/arm64/job/longhorn-2-stage-upgrade-tests-sles-arm64/421/testReport/junit/tests/test_ha/test_volume_reattach_after_engine_sigkill/
client = <longhorn.Client object at 0xffff7d497af0>
core_api = <kubernetes.client.api.core_v1_api.CoreV1Api object at 0xffff7d6c35e0>
storage_class = {'allowVolumeExpansion': True, 'apiVersion': 'storage.k8s.io/v1', 'kind': 'StorageClass', 'metadata': {'name': 'longhorn-test'}, ...}
sts_name = 'longhorn-teststs-hhecdb'
statefulset = {'apiVersion': 'apps/v1', 'kind': 'StatefulSet', 'metadata': {'name': 'longhorn-teststs-hhecdb', 'namespace': 'default...usybox:1.34.0', 'imagePullPolicy': 'IfNotPresent', 'name': 'sleep', ...}], 'terminationGracePeriodSeconds': 10}}, ...}}
def test_volume_reattach_after_engine_sigkill(client, core_api, storage_class, sts_name, statefulset):  # NOQA
    """
    [HA] Test if the volume can be reattached after using SIGKILL
    to crash the engine process
    1. Create StorageClass and StatefulSet.
    2. Write random data to the pod and get the md5sum.
    3. Crash the engine process by SIGKILL in the engine manager.
    4. Wait for volume to `faulted`, then `healthy`.
    5. Wait for K8s to terminate the pod and statefulset to bring pod to
       `Pending`, then `Running`.
    6. Check volume path exist in the pod.
    7. Check md5sum of the data in the pod.
    8. Check new data written to the volume is successful.
    """
    vol_name, pod_name, md5sum = \
        common.prepare_statefulset_with_data_in_mb(
            client, core_api, statefulset, sts_name, storage_class)
    crash_engine_process_with_sigkill(client, core_api, vol_name)
>   wait_pod_for_remount_request(client, core_api, vol_name, pod_name, md5sum)
test_ha.py:1863:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_ha.py:625: in wait_pod_for_remount_request
wait_for_pod_remount(core_api, pod_name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
core_api = <kubernetes.client.api.core_v1_api.CoreV1Api object at 0xffff7d6c35e0>
pod_name = 'longhorn-teststs-hhecdb-0', chk_path = '/data/lost+found'
def wait_for_pod_remount(core_api, pod_name, chk_path="/data/lost+found"):
    check_command = [
        '/bin/sh',
        '-c',
        'ls ' + chk_path
    ]
    ready = False
    for i in range(RETRY_EXEC_COUNTS):
        try:
            output = stream(core_api.connect_get_namespaced_pod_exec,
                            pod_name,
                            'default',
                            command=check_command,
                            stderr=True, stdin=False,
                            stdout=True, tty=False)
            if "Input/output error" not in output:
                ready = True
                break
        except Exception:
            pass
        if ready:
            break
        time.sleep(RETRY_EXEC_INTERVAL)
>   assert ready
E AssertionError
common.py:4932: AssertionError
It seems longhorn-manager is flooded with Volume request remount messages:
2024-01-23T00:54:07.193269120Z time="2024-01-23T00:54:07Z" level=info msg="Volume request remount at 2024-01-23T00:54:07Z due to engine detected read-only filesystem" func="controller.(*VolumeController).requestRemountIfFileSystemReadOnly" file="volume_controller.go:1482" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=ip-10-0-2-138 owner=ip-10-0-2-138 state=attached volume=pvc-c0387c67-ce3e-4ce0-a63b-465eeda79781
2024-01-23T00:54:07.193385769Z time="2024-01-23T00:54:07Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-c0387c67-ce3e-4ce0-a63b-465eeda79781\", UID:\"6c272859-9b99-4a8c-b823-532228049b5d\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"153277\", FieldPath:\"\"}): type: 'Normal' reason: 'Remount' Volume pvc-c0387c67-ce3e-4ce0-a63b-465eeda79781 requested remount at 2024-01-23T00:54:07Z due to engine detected read-only filesystem" func="record.(*eventBroadcasterImpl).StartLogging.func1" file="event.go:298"
But the StatefulSet pod is never recreated and remains stuck with an Input/output error:
# kubectl get pods
NAME READY STATUS RESTARTS AGE
longhorn-test-minio 1/1 Running 0 18h
longhorn-test-nfs 1/1 Running 1 (18h ago) 18h
longhorn-test 2/2 Running 0 18h
longhorn-teststs-yoij4o-0 1/1 Running 0 13h
# kubectl exec -it longhorn-teststs-yoij4o-0 -- /bin/sh
/data # echo ok > ok
/bin/sh: can't create ok: Input/output error
https://ci.longhorn.io/job/private/job/longhorn-tests-regression/6041/console
To Reproduce
Run test case test_volume_reattach_after_engine_sigkill.
The reproduction rate is roughly 1 in 300 runs.
Expected behavior
After the engine process is killed and the volume recovers to healthy, the StatefulSet pod should be recreated and the volume remounted, with the existing data intact and new writes succeeding.
Support bundle for troubleshooting
supportbundle_846e2484-667c-4742-b6ce-dbf5fd40a0df_2024-01-23T00-54-01Z.zip
Environment
- Longhorn version: master-head, v1.6.x-head
- Impacted volume (PV):
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.28.4+k3s1
- Number of control plane nodes in the cluster:
- Number of worker nodes in the cluster:
- Node config
- OS type and version:
- Kernel version:
- CPU per node:
- Memory per node:
- Disk type (e.g. SSD/NVMe/HDD):
- Network bandwidth between the nodes (Gbps):
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:
Additional context
About this issue
- State: closed
- Created 5 months ago
- Comments: 19 (18 by maintainers)
I think I confirmed the issue. It is like what @shuo-wu said here.
Take this case for example. The problem here is that our pod deletion is executed in kubernetes_pod_controller.go, which reconciles pods every 30 secs. If the remount request is also refreshed every 30 secs and keeps overlapping with the 30-sec pod reconciliation, the pod is never deleted, because we check:
podStartTime.Before(remountRequestedAt) && timeNow.After(remountRequestedAt.Add(delayDuration))
cc @shuo-wu @innobead @yangchiu
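A minimal, self-contained Go sketch of that timing interaction (the names around the quoted expression and the 30-sec delayDuration are illustrative assumptions, not the actual longhorn-manager code): if the volume controller keeps refreshing RemountRequestedAt shortly before each pod reconcile, the timeNow.After(remountRequestedAt.Add(delayDuration)) half of the check never becomes true, so the delete branch is never taken.

package main

import (
	"fmt"
	"time"
)

// shouldDeletePod mirrors the condition quoted in the comment above; all
// surrounding variable names are assumptions for illustration only.
func shouldDeletePod(podStartTime, remountRequestedAt time.Time, delayDuration time.Duration) bool {
	timeNow := time.Now()
	return podStartTime.Before(remountRequestedAt) &&
		timeNow.After(remountRequestedAt.Add(delayDuration))
}

func main() {
	podStartTime := time.Now().Add(-1 * time.Hour)
	delayDuration := 30 * time.Second // assumed delay before the pod is force-deleted

	// RemountRequestedAt was refreshed only 5 secs ago (as happens when the
	// read-only detection keeps re-requesting the remount every 30 secs):
	// the delay has not elapsed yet, so the pod is not deleted.
	fmt.Println(shouldDeletePod(podStartTime, time.Now().Add(-5*time.Second), delayDuration))

	// With a timestamp that is left alone, the delay eventually elapses and
	// the pod would be deleted on the next reconcile.
	fmt.Println(shouldDeletePod(podStartTime, time.Now().Add(-2*time.Minute), delayDuration))
}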
I am fine with either. BTW, requeuing an object will add a backoff. There won’t be too much overhead.
Verified passed on v1.6.x-head (longhorn-manager 706870) by running test case test_volume_reattach_after_engine_sigkill 300 times. They all passed:
https://ci.longhorn.io/job/private/job/longhorn-tests-regression/6102/
https://ci.longhorn.io/job/private/job/longhorn-tests-regression/6103/
https://ci.longhorn.io/job/private/job/longhorn-tests-regression/6105/
I think what we can do here is to not update the remount request every time we detect the read-only condition. Maybe we update it only every 1 min, so pod reconciliation still has a chance to delete the pod.
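A rough sketch of that throttling idea, assuming a 1-min interval and a stand-in volumeStatus struct (this is not the actual fix, just an illustration of the approach):

package main

import (
	"fmt"
	"time"
)

// volumeStatus stands in for the relevant part of the Longhorn volume status;
// the struct and the threshold below are assumptions for illustration only.
type volumeStatus struct {
	RemountRequestedAt string
}

const minRemountRequestInterval = time.Minute // assumed 1-min throttle from the comment above

// requestRemount refreshes the remount timestamp only if the previous request
// is older than the throttle interval, so the pod controller's delayDuration
// check can eventually pass and the pod gets deleted.
func requestRemount(s *volumeStatus, now time.Time) bool {
	if last, err := time.Parse(time.RFC3339, s.RemountRequestedAt); err == nil {
		if now.Sub(last) < minRemountRequestInterval {
			return false // requested recently; do not reset the clock
		}
	}
	s.RemountRequestedAt = now.UTC().Format(time.RFC3339)
	return true
}

func main() {
	s := &volumeStatus{}
	now := time.Now()
	fmt.Println(requestRemount(s, now))                     // true: first request sets the timestamp
	fmt.Println(requestRemount(s, now.Add(30*time.Second))) // false: throttled, timestamp kept
	fmt.Println(requestRemount(s, now.Add(2*time.Minute)))  // true: old enough, refreshed again
}

Keeping the timestamp stable while the read-only condition persists is what lets the pod controller's delayDuration check eventually become true.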
Cleaning up the filesystem read-only condition after setting the remount request won't work, because we periodically sync the condition from the instance-manager every 30 secs.
I think it is possible that v.Status.RemountRequestedAt keeps updating on every reconcile. I will do a quick test.