longhorn: Degraded volume not replicating
I have recently noticed I am losing replicas:
time="2022-03-01T15:21:45Z" level=debug msg="websocket: open" id=2872443530282525352 type=backingimages
time="2022-03-01T15:21:45Z" level=debug msg="websocket: open" id=7932402619095045958 type=engineimages
time="2022-03-01T15:21:45Z" level=debug msg="websocket: open" id=3710044462010597646 type=recurringjobs
time="2022-03-01T15:21:46Z" level=debug msg="websocket: open" id=3858639736328448269 type=settings
time="2022-03-01T15:21:46Z" level=debug msg="websocket: open" id=2559537779968376825 type=engineimages
time="2022-03-01T15:21:46Z" level=debug msg="websocket: open" id=19394914080775179 type=backingimages
time="2022-03-01T15:21:46Z" level=debug msg="websocket: open" id=4501139788975210766 type=recurringjobs
127.0.0.1 - - [01/Mar/2022:15:21:53 +0000] "GET /v1/backingimages? HTTP/1.1" 200 121 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:21:53Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:21:53 +0000] "GET /v1/backuptargets HTTP/1.1" 200 384 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:21:53 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:21:53 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:21:53 +0000] "GET /v1/recurringjobs? HTTP/1.1" 200 363 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:21:53Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:21:53 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=recurringJobList HTTP/1.1" 200 324 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:21:53 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:21:53 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:21:55Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:21:55 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:21:55 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:21:57Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:21:57 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:21:57 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:10Z" level=warning msg="Error syncing Longhorn engine" controller=longhorn-engine engine=longhorn-system/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af error="fail to sync engine for longhorn-system/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af: fail to start rebuild for pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-r-f03c5e05 of pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af: timed out waiting for the condition" node=surveily-srv-02
time="2022-03-01T15:22:10Z" level=info msg="Event(v1.ObjectReference{Kind:\"Engine\", Namespace:\"longhorn-system\", Name:\"pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af\", UID:\"f2ec0006-4f71-422f-a1f2-9ca9338aed43\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"236591567\", FieldPath:\"\"}): type: 'Normal' reason: 'Rebuilding' Start rebuilding replica pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-r-f03c5e05 with Address 10.42.3.25:10180 for normal engine pvc-49f6652a-6c9c-44d6-bd8b-97534499f418"
time="2022-03-01T15:22:10Z" level=error msg="Failed rebuilding of replica 10.42.3.25:10180" controller=longhorn-engine error="failed to add replica address='tcp://10.42.3.25:10180' to controller 'pvc-49f6652a-6c9c-44d6-bd8b-97534499f418': failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.2.2/longhorn [--url 10.42.0.152:10004 add tcp://10.42.3.25:10180], output , stderr, time=\"2022-03-01T15:22:10Z\" level=info msg=\"Adding replica tcp://10.42.3.25:10180 in WO mode\"\ntime=\"2022-03-01T15:22:10Z\" level=fatal msg=\"Error running add replica command: failed to create replica tcp://10.42.3.25:10180 for volume 10.42.0.152:10004: rpc error: code = Unknown desc = Replica must be closed, Can not add in state: open\"\n, error exit status 1" node=surveily-srv-02 volume=pvc-49f6652a-6c9c-44d6-bd8b-97534499f418
time="2022-03-01T15:22:10Z" level=info msg="Event(v1.ObjectReference{Kind:\"Engine\", Namespace:\"longhorn-system\", Name:\"pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af\", UID:\"f2ec0006-4f71-422f-a1f2-9ca9338aed43\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"236591567\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedRebuilding' Failed rebuilding replica with Address 10.42.3.25:10180: failed to add replica address='tcp://10.42.3.25:10180' to controller 'pvc-49f6652a-6c9c-44d6-bd8b-97534499f418': failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.2.2/longhorn [--url 10.42.0.152:10004 add tcp://10.42.3.25:10180], output , stderr, time=\"2022-03-01T15:22:10Z\" level=info msg=\"Adding replica tcp://10.42.3.25:10180 in WO mode\"\ntime=\"2022-03-01T15:22:10Z\" level=fatal msg=\"Error running add replica command: failed to create replica tcp://10.42.3.25:10180 for volume 10.42.0.152:10004: rpc error: code = Unknown desc = Replica must be closed, Can not add in state: open\"\n, error exit status 1"
time="2022-03-01T15:22:10Z" level=info msg="Removed failed rebuilding replica 10.42.3.25:10180" controller=longhorn-engine node=surveily-srv-02 volume=pvc-49f6652a-6c9c-44d6-bd8b-97534499f418
time="2022-03-01T15:22:10Z" level=info msg="Engine is still in backoff for replica pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-r-f03c5e05 rebuild failure" controller=longhorn-engine node=surveily-srv-02 volume=pvc-49f6652a-6c9c-44d6-bd8b-97534499f418
time="2022-03-01T15:22:13Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:22:13 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:13 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:16Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:22:16 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:16 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:17Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:22:17 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:17 +0000] "POST /v1/volumes/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418?action=snapshotList HTTP/1.1" 200 104342 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:23Z" level=debug msg="websocket: close 1000 (normal)" id=2334603118650555977 type=volumes
127.0.0.1 - - [01/Mar/2022:15:20:29 +0000] "GET /v1/ws/1s/volumes HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:23Z" level=debug msg="websocket: close 1000 (normal)" id=318776369863540127 type=nodes
127.0.0.1 - - [01/Mar/2022:15:20:29 +0000] "GET /v1/ws/1s/nodes HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:21:46 +0000] "GET /v1/ws/1s/settings HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:23Z" level=debug msg="websocket: close 1000 (normal)" id=3858639736328448269 type=settings
127.0.0.1 - - [01/Mar/2022:15:21:46 +0000] "GET /v1/ws/1s/backingimages HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:23Z" level=debug msg="websocket: close 1000 (normal)" id=19394914080775179 type=backingimages
time="2022-03-01T15:22:23Z" level=debug msg="websocket: close 1000 (normal)" id=2559537779968376825 type=engineimages
127.0.0.1 - - [01/Mar/2022:15:21:46 +0000] "GET /v1/ws/1s/engineimages HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:23 +0000] "GET /v1/backingimages? HTTP/1.1" 200 121 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:23Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:22:23 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:23 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:23 +0000] "GET /v1/backuptargets HTTP/1.1" 200 384 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:23 +0000] "GET /v1/recurringjobs? HTTP/1.1" 200 363 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:27Z" level=debug msg="websocket: open" id=6196870698729171907 type=volumes
127.0.0.1 - - [01/Mar/2022:15:22:27 +0000] "GET /v1/backingimages? HTTP/1.1" 200 121 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:27Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:22:27 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:27 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:28 +0000] "GET /v1/backuptargets HTTP/1.1" 200 384 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:28 +0000] "GET /v1/recurringjobs? HTTP/1.1" 200 363 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:28Z" level=debug msg="websocket: open" id=8293784591867210999 type=nodes
time="2022-03-01T15:22:28Z" level=debug msg="websocket: open" id=6024276221160928749 type=settings
time="2022-03-01T15:22:28Z" level=debug msg="websocket: open" id=5894274991168356469 type=backingimages
time="2022-03-01T15:22:28Z" level=debug msg="websocket: open" id=4457729998429290397 type=engineimages
127.0.0.1 - - [01/Mar/2022:15:22:29 +0000] "GET /v1/backingimages? HTTP/1.1" 200 121 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:29Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:22:29 +0000] "GET /v1/backuptargets HTTP/1.1" 200 384 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:29 +0000] "GET /v1/recurringjobs? HTTP/1.1" 200 363 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:29 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:29 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:29 +0000] "POST /v1/volumes/pvc-fcddcefe-f528-40fd-8d36-0164de57965e?action=recurringJobList HTTP/1.1" 200 324 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:32Z" level=debug msg="websocket: close 1000 (normal)" id=6196870698729171907 type=volumes
time="2022-03-01T15:22:32Z" level=debug msg="websocket: close 1000 (normal)" id=8293784591867210999 type=nodes
127.0.0.1 - - [01/Mar/2022:15:22:28 +0000] "GET /v1/ws/1s/nodes HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:27 +0000] "GET /v1/ws/1s/volumes HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:32Z" level=debug msg="websocket: close 1000 (normal)" id=6024276221160928749 type=settings
127.0.0.1 - - [01/Mar/2022:15:22:28 +0000] "GET /v1/ws/1s/settings HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:28 +0000] "GET /v1/ws/1s/backingimages HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:33Z" level=debug msg="websocket: close 1000 (normal)" id=5894274991168356469 type=backingimages
time="2022-03-01T15:22:33Z" level=debug msg="websocket: close 1000 (normal)" id=4457729998429290397 type=engineimages
127.0.0.1 - - [01/Mar/2022:15:22:28 +0000] "GET /v1/ws/1s/engineimages HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:33 +0000] "GET /v1/backingimages? HTTP/1.1" 200 121 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:33Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:22:33 +0000] "GET /v1/backuptargets HTTP/1.1" 200 384 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:33 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:33 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:33 +0000] "GET /v1/recurringjobs? HTTP/1.1" 200 363 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:21:46 +0000] "GET /v1/ws/1s/recurringjobs HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:34Z" level=debug msg="websocket: close 1000 (normal)" id=4501139788975210766 type=recurringjobs
time="2022-03-01T15:22:34Z" level=debug msg="websocket: open" id=3232817172614376329 type=nodes
127.0.0.1 - - [01/Mar/2022:15:22:34 +0000] "GET /v1/backingimages? HTTP/1.1" 200 121 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:34 +0000] "GET /v1/backuptargets HTTP/1.1" 200 384 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:34Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:22:34 +0000] "GET /v1/recurringjobs? HTTP/1.1" 200 363 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:34 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:34 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:34Z" level=debug msg="websocket: open" id=3973331792788841527 type=settings
time="2022-03-01T15:22:34Z" level=debug msg="websocket: open" id=5332750216128134680 type=backingimages
time="2022-03-01T15:22:34Z" level=debug msg="websocket: open" id=7792437920833147928 type=backupvolumes
time="2022-03-01T15:22:38Z" level=debug msg="websocket: close 1000 (normal)" id=3232817172614376329 type=nodes
127.0.0.1 - - [01/Mar/2022:15:22:34 +0000] "GET /v1/ws/1s/nodes HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:38Z" level=debug msg="websocket: close 1000 (normal)" id=3973331792788841527 type=settings
127.0.0.1 - - [01/Mar/2022:15:22:34 +0000] "GET /v1/ws/1s/backingimages HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:38Z" level=debug msg="websocket: close 1000 (normal)" id=5332750216128134680 type=backingimages
time="2022-03-01T15:22:38Z" level=debug msg="websocket: close 1000 (normal)" id=7792437920833147928 type=backupvolumes
127.0.0.1 - - [01/Mar/2022:15:22:34 +0000] "GET /v1/ws/1s/backupvolumes HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:38Z" level=debug msg="websocket: open" id=1754343966137893263 type=recurringjobs
127.0.0.1 - - [01/Mar/2022:15:22:38 +0000] "GET /v1/backingimages? HTTP/1.1" 200 121 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
time="2022-03-01T15:22:38Z" level=debug msg="Forwarding request to 10.42.0.148:9500"
127.0.0.1 - - [01/Mar/2022:15:22:38 +0000] "GET /v1/backuptargets HTTP/1.1" 200 384 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:38 +0000] "GET /v1/recurringjobs? HTTP/1.1" 200 363 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:38 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:38 +0000] "GET /v1/backupvolumes? HTTP/1.1" 200 78554 "http://localhost:45231/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
127.0.0.1 - - [01/Mar/2022:15:22:34 +0000] "GET /v1/ws/1s/settings HTTP/1.1" 200 0 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
E0301 15:22:40.280558 1 engine_controller.go:196] fail to sync engine for longhorn-system/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af: fail to start rebuild for pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-r-f03c5e05 of pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af: timed out waiting for the condition
time="2022-03-01T15:22:40Z" level=warning msg="Dropping Longhorn engine out of the queue" controller=longhorn-engine engine=longhorn-system/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af error="fail to sync engine for longhorn-system/pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af: fail to start rebuild for pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-r-f03c5e05 of pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af: timed out waiting for the condition" node=surveily-srv-02
time="2022-03-01T15:22:40Z" level=info msg="Event(v1.ObjectReference{Kind:\"Engine\", Namespace:\"longhorn-system\", Name:\"pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af\", UID:\"f2ec0006-4f71-422f-a1f2-9ca9338aed43\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"236591567\", FieldPath:\"\"}): type: 'Normal' reason: 'Rebuilding' Start rebuilding replica pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-r-f03c5e05 with Address 10.42.3.25:10180 for normal engine pvc-49f6652a-6c9c-44d6-bd8b-97534499f418"
time="2022-03-01T15:22:40Z" level=error msg="Failed rebuilding of replica 10.42.3.25:10180" controller=longhorn-engine error="failed to add replica address='tcp://10.42.3.25:10180' to controller 'pvc-49f6652a-6c9c-44d6-bd8b-97534499f418': failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.2.2/longhorn [--url 10.42.0.152:10004 add tcp://10.42.3.25:10180], output , stderr, time=\"2022-03-01T15:22:40Z\" level=info msg=\"Adding replica tcp://10.42.3.25:10180 in WO mode\"\ntime=\"2022-03-01T15:22:40Z\" level=fatal msg=\"Error running add replica command: failed to create replica tcp://10.42.3.25:10180 for volume 10.42.0.152:10004: rpc error: code = Unknown desc = Replica must be closed, Can not add in state: open\"\n, error exit status 1" node=surveily-srv-02 volume=pvc-49f6652a-6c9c-44d6-bd8b-97534499f418
time="2022-03-01T15:22:40Z" level=info msg="Event(v1.ObjectReference{Kind:\"Engine\", Namespace:\"longhorn-system\", Name:\"pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-e-03a593af\", UID:\"f2ec0006-4f71-422f-a1f2-9ca9338aed43\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"236591567\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedRebuilding' Failed rebuilding replica with Address 10.42.3.25:10180: failed to add replica address='tcp://10.42.3.25:10180' to controller 'pvc-49f6652a-6c9c-44d6-bd8b-97534499f418': failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.2.2/longhorn [--url 10.42.0.152:10004 add tcp://10.42.3.25:10180], output , stderr, time=\"2022-03-01T15:22:40Z\" level=info msg=\"Adding replica tcp://10.42.3.25:10180 in WO mode\"\ntime=\"2022-03-01T15:22:40Z\" level=fatal msg=\"Error running add replica command: failed to create replica tcp://10.42.3.25:10180 for volume 10.42.0.152:10004: rpc error: code = Unknown desc = Replica must be closed, Can not add in state: open\"\n, error exit status 1"
time="2022-03-01T15:22:40Z" level=info msg="Removed failed rebuilding replica 10.42.3.25:10180" controller=longhorn-engine node=surveily-srv-02 volume=pvc-49f6652a-6c9c-44d6-bd8b-97534499f418
time="2022-03-01T15:22:40Z" level=info msg="Engine is still in backoff for replica pvc-49f6652a-6c9c-44d6-bd8b-97534499f418-r-f03c5e05 rebuild failure" controller=longhorn-engine node=surveily-srv-02 volume=pvc-49f6652a-6c9c-44d6-bd8b-97534499f418
time="2022-03-01T15:22:53Z" level=debug msg="websocket: open" id=1401131491442525535 type=volumes
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 32 (11 by maintainers)
PS. We are still restoring the data. Thank god for longhorn replicas and https://longhorn.io/docs/1.1.0/advanced-resources/data-recovery/export-from-replica/
Running
badsectorshas shown multiple issues with the device. Fortunately only 2 non-critical volumes are affected. Is it possible to blacklist a node so replicas are not propagated to it?I tried clicking the delete on a failing-to-rebuild replica as per image
Lets see what happens