longhorn: [BUG] migration test cases could fail due to unexpected volume controllers and replicas status
Describe the bug (🐛 if you encounter this issue)
In Longhorn master or v1.5.x, migration-related test cases such as test_migration_with_failed_replica, test_migration_with_unscheduled_replica, test_migration_with_restore_volume and test_migration_with_rebuilding_replica can randomly fail because the volume's controllers and replicas are not in the expected state:
    def wait_for_volume_migration_node(client, volume_name, node_id):
        ready = False
        for i in range(RETRY_COUNTS):
            v = client.by_id_volume(volume_name)
            engines = v.controllers
            replicas = v.replicas
            if len(engines) == 1 and len(replicas) == v.numberOfReplicas:
                e = engines[0]
                if e.endpoint != "":
                    assert e.hostId == node_id
                    ready = True
                    break
            time.sleep(RETRY_INTERVAL)
>       assert ready
E       AssertionError
The volume status is:
{
"accessMode": "rwx",
"actions": {
"activate": "…/v1/volumes/longhorn-testvol-bl8klw?action=activate",
"attach": "…/v1/volumes/longhorn-testvol-bl8klw?action=attach",
"cancelExpansion": "…/v1/volumes/longhorn-testvol-bl8klw?action=cancelExpansion",
"detach": "…/v1/volumes/longhorn-testvol-bl8klw?action=detach",
"engineUpgrade": "…/v1/volumes/longhorn-testvol-bl8klw?action=engineUpgrade",
"expand": "…/v1/volumes/longhorn-testvol-bl8klw?action=expand",
"pvCreate": "…/v1/volumes/longhorn-testvol-bl8klw?action=pvCreate",
"pvcCreate": "…/v1/volumes/longhorn-testvol-bl8klw?action=pvcCreate",
"recurringJobAdd": "…/v1/volumes/longhorn-testvol-bl8klw?action=recurringJobAdd",
"recurringJobDelete": "…/v1/volumes/longhorn-testvol-bl8klw?action=recurringJobDelete",
"recurringJobList": "…/v1/volumes/longhorn-testvol-bl8klw?action=recurringJobList",
"replicaRemove": "…/v1/volumes/longhorn-testvol-bl8klw?action=replicaRemove",
"snapshotBackup": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotBackup",
"snapshotCRCreate": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotCRCreate",
"snapshotCRDelete": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotCRDelete",
"snapshotCRGet": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotCRGet",
"snapshotCRList": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotCRList",
"snapshotCreate": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotCreate",
"snapshotDelete": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotDelete",
"snapshotGet": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotGet",
"snapshotList": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotList",
"snapshotPurge": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotPurge",
"snapshotRevert": "…/v1/volumes/longhorn-testvol-bl8klw?action=snapshotRevert",
"trimFilesystem": "…/v1/volumes/longhorn-testvol-bl8klw?action=trimFilesystem",
"updateBackupCompressionMethod": "…/v1/volumes/longhorn-testvol-bl8klw?action=updateBackupCompressionMethod",
"updateDataLocality": "…/v1/volumes/longhorn-testvol-bl8klw?action=updateDataLocality",
"updateOfflineReplicaRebuilding": "…/v1/volumes/longhorn-testvol-bl8klw?action=updateOfflineReplicaRebuilding",
"updateReplicaAutoBalance": "…/v1/volumes/longhorn-testvol-bl8klw?action=updateReplicaAutoBalance",
"updateReplicaCount": "…/v1/volumes/longhorn-testvol-bl8klw?action=updateReplicaCount",
"updateReplicaSoftAntiAffinity": "…/v1/volumes/longhorn-testvol-bl8klw?action=updateReplicaSoftAntiAffinity",
"updateReplicaZoneSoftAntiAffinity": "…/v1/volumes/longhorn-testvol-bl8klw?action=updateReplicaZoneSoftAntiAffinity",
"updateSnapshotDataIntegrity": "…/v1/volumes/longhorn-testvol-bl8klw?action=updateSnapshotDataIntegrity",
"updateUnmapMarkSnapChainRemoved": "…/v1/volumes/longhorn-testvol-bl8klw?action=updateUnmapMarkSnapChainRemoved",
},
"backendStoreDriver": "v1",
"backingImage": "",
"backupCompressionMethod": "lz4",
"backupStatus": [ ],
"cloneStatus": {
"snapshot": "",
"sourceVolume": "",
"state": "",
},
"conditions": {
"restore": {
"lastProbeTime": "",
"lastTransitionTime": "2023-06-28T11:38:06Z",
"message": "",
"reason": "",
"status": "False",
"type": "restore",
},
"scheduled": {
"lastProbeTime": "",
"lastTransitionTime": "2023-06-28T11:38:06Z",
"message": "",
"reason": "",
"status": "True",
"type": "scheduled",
},
"toomanysnapshots": {
"lastProbeTime": "",
"lastTransitionTime": "2023-06-28T11:38:06Z",
"message": "",
"reason": "",
"status": "False",
"type": "toomanysnapshots",
},
},
"controllers": [
{
"actualSize": "4096",
"address": "10.42.2.8",
"currentImage": "longhornio/longhorn-engine:master-head",
"endpoint": "/dev/longhorn/longhorn-testvol-bl8klw",
"engineImage": "longhornio/longhorn-engine:master-head",
"hostId": "ip-10-0-1-146",
"instanceManagerName": "instance-manager-6c67701f85b0ae508d92d085b8b2c3ad",
"isExpanding": false,
"lastExpansionError": "",
"lastExpansionFailedAt": "",
"lastRestoredBackup": "",
"name": "longhorn-testvol-bl8klw-e-413a828c",
"requestedBackupRestore": "",
"running": true,
"size": "16777216",
"unmapMarkSnapChainRemovedEnabled": false,
},
{
"actualSize": "4096",
"address": "10.42.1.9",
"currentImage": "longhornio/longhorn-engine:master-head",
"endpoint": "/dev/longhorn/longhorn-testvol-bl8klw",
"engineImage": "longhornio/longhorn-engine:master-head",
"hostId": "ip-10-0-1-21",
"instanceManagerName": "instance-manager-408a4a130067c1351be9778cfa8b9ff7",
"isExpanding": false,
"lastExpansionError": "",
"lastExpansionFailedAt": "",
"lastRestoredBackup": "",
"name": "longhorn-testvol-bl8klw-e-d3c442ff",
"requestedBackupRestore": "",
"running": true,
"size": "16777216",
"unmapMarkSnapChainRemovedEnabled": false,
},
],
"created": "2023-06-28 11:38:05 +0000 UTC",
"currentImage": "longhornio/longhorn-engine:master-head",
"dataLocality": "disabled",
"dataSource": "",
"disableFrontend": false,
"diskSelector": [ ],
"encrypted": false,
"engineImage": "longhornio/longhorn-engine:master-head",
"fromBackup": "",
"frontend": "blockdev",
"id": "longhorn-testvol-bl8klw",
"kubernetesStatus": {
"lastPVCRefAt": "",
"lastPodRefAt": "",
"namespace": "",
"pvName": "",
"pvStatus": "",
"pvcName": "",
"workloadsStatus": null,
},
"lastAttachedBy": "",
"lastBackup": "",
"lastBackupAt": "",
"links": {
"self": "…/v1/volumes/longhorn-testvol-bl8klw",
},
"migratable": true,
"name": "longhorn-testvol-bl8klw",
"nodeSelector": [ ],
"numberOfReplicas": 3,
"offlineReplicaRebuilding": "disabled",
"offlineReplicaRebuildingRequired": false,
"purgeStatus": [
{
"actions": null,
"error": "",
"isPurging": false,
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-6801502c",
"state": "",
},
{
"actions": null,
"error": "",
"isPurging": false,
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-35f42942",
"state": "",
},
{
"actions": null,
"error": "",
"isPurging": false,
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-5ca1080c",
"state": "",
},
{
"actions": null,
"error": "",
"isPurging": false,
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-71c1a3b0",
"state": "",
},
],
"ready": true,
"rebuildStatus": [ ],
"recurringJobSelector": null,
"replicaAutoBalance": "ignored",
"replicaSoftAntiAffinity": "ignored",
"replicaZoneSoftAntiAffinity": "ignored",
"replicas": [
{
"address": "10.42.3.9",
"backendStoreDriver": "v1",
"currentImage": "longhornio/longhorn-engine:master-head",
"dataPath": "/var/lib/longhorn/replicas/longhorn-testvol-bl8klw-c13409ff",
"diskID": "b318893d-402d-49d3-abc5-6ed557895b25",
"diskPath": "/var/lib/longhorn/",
"engineImage": "longhornio/longhorn-engine:master-head",
"failedAt": "",
"hostId": "ip-10-0-1-39",
"instanceManagerName": "instance-manager-0b165eb6a49550ac97473a12c0045a78",
"mode": "RW",
"name": "longhorn-testvol-bl8klw-r-35f42942",
"running": true,
},
{
"address": "10.42.1.9",
"backendStoreDriver": "v1",
"currentImage": "longhornio/longhorn-engine:master-head",
"dataPath": "/var/lib/longhorn/replicas/longhorn-testvol-bl8klw-4be8b763",
"diskID": "7a05bd54-7bf8-4e75-abf2-05497610b825",
"diskPath": "/var/lib/longhorn/",
"engineImage": "longhornio/longhorn-engine:master-head",
"failedAt": "",
"hostId": "ip-10-0-1-21",
"instanceManagerName": "instance-manager-408a4a130067c1351be9778cfa8b9ff7",
"mode": "",
"name": "longhorn-testvol-bl8klw-r-5ca1080c",
"running": true,
},
{
"address": "10.42.1.9",
"backendStoreDriver": "v1",
"currentImage": "longhornio/longhorn-engine:master-head",
"dataPath": "/var/lib/longhorn/replicas/longhorn-testvol-bl8klw-4be8b763",
"diskID": "7a05bd54-7bf8-4e75-abf2-05497610b825",
"diskPath": "/var/lib/longhorn/",
"engineImage": "longhornio/longhorn-engine:master-head",
"failedAt": "",
"hostId": "ip-10-0-1-21",
"instanceManagerName": "instance-manager-408a4a130067c1351be9778cfa8b9ff7",
"mode": "RW",
"name": "longhorn-testvol-bl8klw-r-6801502c",
"running": true,
},
{
"address": "10.42.3.9",
"backendStoreDriver": "v1",
"currentImage": "longhornio/longhorn-engine:master-head",
"dataPath": "/var/lib/longhorn/replicas/longhorn-testvol-bl8klw-c13409ff",
"diskID": "b318893d-402d-49d3-abc5-6ed557895b25",
"diskPath": "/var/lib/longhorn/",
"engineImage": "longhornio/longhorn-engine:master-head",
"failedAt": "",
"hostId": "ip-10-0-1-39",
"instanceManagerName": "instance-manager-0b165eb6a49550ac97473a12c0045a78",
"mode": "",
"name": "longhorn-testvol-bl8klw-r-71c1a3b0",
"running": true,
},
{
"address": "",
"backendStoreDriver": "v1",
"currentImage": "",
"dataPath": "/var/lib/longhorn/replicas/longhorn-testvol-bl8klw-1c4c6b0d",
"diskID": "d2828bc3-5989-4e00-8249-123f20ddad9d",
"diskPath": "/var/lib/longhorn/",
"engineImage": "longhornio/longhorn-engine:master-head",
"failedAt": "2023-06-28T11:38:39Z",
"hostId": "ip-10-0-1-146",
"instanceManagerName": "",
"mode": "",
"name": "longhorn-testvol-bl8klw-r-a6b7051b",
"running": false,
},
],
"restoreInitiated": false,
"restoreRequired": false,
"restoreStatus": [
{
"actions": null,
"backupURL": "",
"error": "",
"filename": "",
"isRestoring": false,
"lastRestored": "",
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-6801502c",
"state": "",
},
{
"actions": null,
"backupURL": "",
"error": "",
"filename": "",
"isRestoring": false,
"lastRestored": "",
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-35f42942",
"state": "",
},
{
"actions": null,
"backupURL": "",
"error": "",
"filename": "",
"isRestoring": false,
"lastRestored": "",
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-5ca1080c",
"state": "",
},
{
"actions": null,
"backupURL": "",
"error": "",
"filename": "",
"isRestoring": false,
"lastRestored": "",
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-71c1a3b0",
"state": "",
},
],
"restoreVolumeRecurringJob": "ignored",
"revisionCounterDisabled": false,
"robustness": "degraded",
"shareEndpoint": "",
"shareState": "",
"size": "16777216",
"snapshotDataIntegrity": "ignored",
"staleReplicaTimeout": 0,
"standby": false,
"state": "attached",
"type": "volume",
"unmapMarkSnapChainRemoved": "ignored",
"volumeAttachment": {
"attachments": {
"test-attachment-ticket-lhgbeu": {
"attachmentID": "test-attachment-ticket-lhgbeu",
"attachmentType": "csi-attacher",
"conditions": [
{
"lastProbeTime": "",
"lastTransitionTime": "2023-06-28T11:38:49Z",
"message": "The migrating attachment ticket is satisfied",
"reason": "",
"status": "True",
"type": "Satisfied",
},
],
"nodeID": "ip-10-0-1-21",
"parameters": {
"disableFrontend": "false",
"lastAttachedBy": "",
},
"satisfied": true,
},
},
"volume": "longhorn-testvol-bl8klw",
},
}
The volume has 2 controllers instead of 1 and 5 replicas instead of numberOfReplicas (3), so the condition in wait_for_volume_migration_node is never satisfied and the test case fails.
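To make this kind of failure easier to read in test output, the readiness condition could be broken into individual reasons. The following is only a sketch: `summarize_migration_readiness` is a hypothetical helper that is not part of longhorn-tests, and it assumes the volume is available as a plain dict rather than the client object used by the test.

```python
from typing import Dict, List


def summarize_migration_readiness(volume: Dict) -> List[str]:
    """Return the reasons why the migration readiness condition is not met.

    Hypothetical helper: mirrors the counts checked by
    wait_for_volume_migration_node, but on a plain dict.
    """
    reasons = []
    engines = volume.get("controllers", [])
    replicas = volume.get("replicas", [])
    expected = volume.get("numberOfReplicas")

    if len(engines) != 1:
        reasons.append(f"expected 1 controller, found {len(engines)}")
    if len(replicas) != expected:
        reasons.append(f"expected {expected} replicas, found {len(replicas)}")
    return reasons


# Trimmed-down stand-in for the volume status above (names only).
volume = {
    "numberOfReplicas": 3,
    "controllers": [
        {"name": "longhorn-testvol-bl8klw-e-413a828c"},
        {"name": "longhorn-testvol-bl8klw-e-d3c442ff"},
    ],
    "replicas": [
        {"name": "longhorn-testvol-bl8klw-r-35f42942"},
        {"name": "longhorn-testvol-bl8klw-r-5ca1080c"},
        {"name": "longhorn-testvol-bl8klw-r-6801502c"},
        {"name": "longhorn-testvol-bl8klw-r-71c1a3b0"},
        {"name": "longhorn-testvol-bl8klw-r-a6b7051b"},
    ],
}

for reason in summarize_migration_readiness(volume):
    print(reason)
```

An empty list would mean both counts match what the wait loop expects.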
To Reproduce
Run test case test_migration_with_*
Expected behavior
A clear and concise description of what you expected to happen.
Log or Support bundle
supportbundle_e4760442-f449-47a1-9bd1-dab5a00e97c5_2023-06-28T12-14-54Z.zip
Environment
- Longhorn version: master-head or v1.5.x-head
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
- Number of management node in the cluster:
- Number of worker node in the cluster:
- Node config
- OS type and version:
- CPU per node:
- Memory per node:
- Disk type(e.g. SSD/NVMe):
- Network bandwidth between the nodes:
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:
Additional context
Test results:
- https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/28/testReport/junit/tests/test_migration/test_migration_with_failed_replica/
- https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-arm64/31/testReport/tests/test_migration/test_migration_with_rebuilding_replica/
- https://ci.longhorn.io/job/public/job/master/job/sles/job/amd64/job/longhorn-tests-sles-amd64/533/testReport/tests/test_migration/test_migration_with_rebuilding_replica/
- https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/27/testReport/tests/test_migration/test_migration_with_restore_volume_nfs_/
- https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/24/testReport/tests/test_migration/test_migration_with_unscheduled_replica/
- https://ci.longhorn.io/job/public/job/master/job/sles/job/amd64/job/longhorn-tests-sles-amd64/526/testReport/tests/test_migration/test_migration_with_rebuilding_replica/
- https://ci.longhorn.io/job/public/job/master/job/sles/job/arm64/job/longhorn-tests-sles-arm64/522/testReport/tests/test_migration/test_migration_with_unscheduled_replica/
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 22 (19 by maintainers)
Ran the tests test_migration_with_unscheduled_replica and test_migration_with_failed_replica 20 times each with @PhanLe1010’s dev image; they passed on 1.5.x.
Update: test_migration_with_rebuilding_replica passed 5 times. The rest we can probably check once the PR is merged.
Root Cause Analysis
From almost all support bundles collected when the issue happens, I see that the migration is blocked by checkMigratingEngineSyncSnapshots at this step. That step checks and waits until the snapshot chain in the old engine is the same as the one in the new engine. However, sometimes this condition is never met because the snapshot creation timestamps reported by the two engines do not match.
The snapshot creation timestamp is fetched from one of the RW replicas. It is the time at which the snapshot file was created on that replica’s disk, so it can differ between RW replicas. When each engine fetches it from a different RW replica, the creation timestamps may differ.
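The mismatch described above can be illustrated with a small model. This is a sketch under stated assumptions, not longhorn-manager code: the snapshot names, the timestamps, and the `engine_snapshot_view` helper are all hypothetical, chosen only to show why a strict comparison of the two engines' snapshot info never succeeds.

```python
# Hypothetical per-replica data: snapshot name -> time the snapshot file
# was created on that replica's disk. The values are made up.
replica_a = {"snap-1": "2023-06-28T11:38:20Z", "snap-2": "2023-06-28T11:38:30Z"}
replica_b = {"snap-1": "2023-06-28T11:38:21Z", "snap-2": "2023-06-28T11:38:30Z"}


def engine_snapshot_view(replica_snapshots):
    """Model an engine reporting snapshot info fetched from one RW replica."""
    return {
        name: {"created": created}
        for name, created in replica_snapshots.items()
    }


# The old engine happens to read from replica A, the new one from replica B.
old_engine = engine_snapshot_view(replica_a)
new_engine = engine_snapshot_view(replica_b)

# A strict comparison includes the timestamps and therefore fails,
# even though both engines know exactly the same snapshots.
strict_match = old_engine == new_engine
same_names = set(old_engine) == set(new_engine)

print("strict match:", strict_match)
print("same snapshot names:", same_names)
```

In this model the one-second difference on `snap-1` is enough to keep a strict equality check from ever passing.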
Proposal
The role of the function checkMigratingEngineSyncSnapshots is to wait for the snapshot chain of the new engine (which is empty at the beginning) to be populated so that we don’t accidentally delete snapshot CRs. We should modify checkMigratingEngineSyncSnapshots so that it only checks and waits for all snapshot names in the old engine to appear in the new engine. The snapshot creation timestamps can differ, and that is OK; it should not block the migration flow.
Yes, tested on master-head (longhorn-manager 4a57a72) and v1.5.x-head (longhorn-manager a9bf977), this migration test plan still works.
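The name-only check proposed for checkMigratingEngineSyncSnapshots can be sketched as follows. This is a Python illustration of the idea only; the real function lives in longhorn-manager (Go), and the function name, data shapes, and sample values below are assumptions made for the example.

```python
def new_engine_has_all_snapshots(old_snapshots, new_snapshots):
    """Return True once every snapshot name known to the old engine
    also appears in the new engine. Creation timestamps are ignored.
    """
    return set(old_snapshots) <= set(new_snapshots)


# Hypothetical snapshot info as reported by each engine.
old_engine = {
    "snap-1": {"created": "2023-06-28T11:38:20Z"},
    "snap-2": {"created": "2023-06-28T11:38:30Z"},
}

# Right after the new engine starts, its snapshot chain is still empty,
# so the migration should keep waiting.
new_engine_empty = {}

# Later the chain is populated, with a different timestamp on snap-1.
new_engine_populated = {
    "snap-1": {"created": "2023-06-28T11:38:21Z"},
    "snap-2": {"created": "2023-06-28T11:38:30Z"},
}

print(new_engine_has_all_snapshots(old_engine, new_engine_empty))
print(new_engine_has_all_snapshots(old_engine, new_engine_populated))
```

Using a subset test rather than set equality means extra snapshots in the new engine would not block the migration either; whether that is desirable is a design decision for the actual fix.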
Verified passed on v1.5.x-head (longhorn-manager ba2d3d1) by running test_migration_with_unscheduled_replica, test_migration_with_failed_replica, test_migration_with_restore_volume and test_migration_with_rebuilding_replica 10 times.
All test cases passed: https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/33/
Thanks @yangchiu, The migration tests work on v1.5.x-head on my set up too.
@khushboo-rancher Could you help double-check that this test plan still passes? https://github.com/longhorn/longhorn/issues/5992#issuecomment-1563873360
Verified passed on master-head (longhorn-manager 9bc7e07) by running test_migration_with_unscheduled_replica, test_migration_with_failed_replica, test_migration_with_restore_volume and test_migration_with_rebuilding_replica 10 times.
All test cases passed: https://ci.longhorn.io/job/private/job/longhorn-tests-regression/4384/
Waiting for v1.5.x-head test result now.
You are correct, @innobead. It should not happen in 1.4.x and 1.3.x.
This issue is a side effect of https://github.com/longhorn/longhorn-manager/pull/1922
Probably not. It’s stuck in the same state forever: http://34.228.2.191:30007/#/volume/longhorn-testvol-bl8klw http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw
Hi @innobead: I observed that the test case test_migration_with_failed_replica failed on 1.5.0-RC3 for the first time. I didn’t encounter this failure in my personal test records for 1.4.x and 1.3.x.