longhorn: [BUG] Can't replicate old volumes

Hello,

Since I added new Kubernetes nodes to the cluster, I can no longer replicate the volumes that already existed before those nodes were added.

Steps to reproduce the behavior:

  • Add a new node and watch the logs

Expected behavior: Volumes that existed before the new nodes were added should be able to rebuild replicas on those nodes.

Log

Failed rebuilding replica with Address 10.42.2.7:10000: failed to add replica address='tcp://10.42.2.7:10000' to controller 'pvc-96b98781-c580-48f5-9736-520fd72dcc6d': failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.1.2/longhorn [--url 10.42.0.9:10008 add tcp://10.42.2.7:10000], output , stderr, time="2021-07-29T00:18:13Z" level=info msg="Adding replica tcp://10.42.2.7:10000 in WO mode" time="2021-07-29T00:18:13Z" level=info msg="Using replica tcp://10.42.0.8:10225 as the source for rebuild " time="2021-07-29T00:18:13Z" level=info msg="Using replica tcp://10.42.2.7:10000 as the target for rebuild " time="2021-07-29T00:22:46Z" level=fatal msg="Error running add replica command: failed to sync files [{FromFileName:volume-snap-s10m-f209a415.img ToFileName:volume-snap-s10m-f209a415.img ActualSize:24576} {FromFileName:volume-snap-s10m-f209a415.img.meta ToFileName:volume-snap-s10m-f209a415.img.meta ActualSize:0} {FromFileName:volume-snap-b6pm-8d8a1740.img ToFileName:volume-snap-b6pm-8d8a1740.img ActualSize:0} {FromFileName:volume-snap-b6pm-8d8a1740.img.meta ToFileName:volume-snap-b6pm-8d8a1740.img.meta ActualSize:0} {FromFileName:volume-snap-b6am-494e3069.img ToFileName:volume-snap-b6am-494e3069.img ActualSize:16384} {FromFileName:volume-snap-b6am-494e3069.img.meta ToFileName:volume-snap-b6am-494e3069.img.meta ActualSize:0} {FromFileName:volume-snap-s6h-3f3cb283.img ToFileName:volume-snap-s6h-3f3cb283.img ActualSize:159744} {FromFileName:volume-snap-s6h-3f3cb283.img.meta ToFileName:volume-snap-s6h-3f3cb283.img.meta ActualSize:0} {FromFileName:volume-snap-b12pm-64e182d0.img ToFileName:volume-snap-b12pm-64e182d0.img ActualSize:0} {FromFileName:volume-snap-b12pm-64e182d0.img.meta ToFileName:volume-snap-b12pm-64e182d0.img.meta ActualSize:0} {FromFileName:volume-snap-b6am-bfdb256e.img ToFileName:volume-snap-b6am-bfdb256e.img ActualSize:16384} {FromFileName:volume-snap-b6am-bfdb256e.img.meta ToFileName:volume-snap-b6am-bfdb256e.img.meta 
ActualSize:0} {FromFileName:volume-snap-b12pm-23a8bca2.img ToFileName:volume-snap-b12pm-23a8bca2.img ActualSize:0} {FromFileName:volume-snap-b12pm-23a8bca2.img.meta ToFileName:volume-snap-b12pm-23a8bca2.img.meta ActualSize:0} {FromFileName:volume-snap-s10m-5f44c659.img ToFileName:volume-snap-s10m-5f44c659.img ActualSize:40960} {FromFileName:volume-snap-s10m-5f44c659.img.meta ToFileName:volume-snap-s10m-5f44c659.img.meta ActualSize:0} {FromFileName:volume-snap-b6pm-b92bd07a.img ToFileName:volume-snap-b6pm-b92bd07a.img ActualSize:16384} {FromFileName:volume-snap-b6pm-b92bd07a.img.meta ToFileName:volume-snap-b6pm-b92bd07a.img.meta ActualSize:0} {FromFileName:volume-snap-s10m-cf162f77.img ToFileName:volume-snap-s10m-cf162f77.img ActualSize:966656} {FromFileName:volume-snap-s10m-cf162f77.img.meta ToFileName:volume-snap-s10m-cf162f77.img.meta ActualSize:0} {FromFileName:volume-snap-s10m-41e3f3bb.img ToFileName:volume-snap-s10m-41e3f3bb.img ActualSize:151552} {FromFileName:volume-snap-s10m-41e3f3bb.img.meta ToFileName:volume-snap-s10m-41e3f3bb.img.meta ActualSize:0} {FromFileName:volume-snap-s10m-f0017dc3.img ToFileName:volume-snap-s10m-f0017dc3.img ActualSize:6991872} {FromFileName:volume-snap-s10m-f0017dc3.img.meta ToFileName:volume-snap-s10m-f0017dc3.img.meta ActualSize:0} {FromFileName:volume-snap-b6am-d786cd13.img ToFileName:volume-snap-b6am-d786cd13.img ActualSize:1196032} {FromFileName:volume-snap-b6am-d786cd13.img.meta ToFileName:volume-snap-b6am-d786cd13.img.meta ActualSize:0} {FromFileName:volume-snap-s10m-3e4bd76c.img ToFileName:volume-snap-s10m-3e4bd76c.img ActualSize:163840} {FromFileName:volume-snap-s10m-3e4bd76c.img.meta ToFileName:volume-snap-s10m-3e4bd76c.img.meta ActualSize:0} {FromFileName:volume-snap-s6h-a807df37.img ToFileName:volume-snap-s6h-a807df37.img ActualSize:4096} {FromFileName:volume-snap-s6h-a807df37.img.meta ToFileName:volume-snap-s6h-a807df37.img.meta ActualSize:0} {FromFileName:volume-snap-s6h-f62ff3a1.img 
ToFileName:volume-snap-s6h-f62ff3a1.img ActualSize:32768} {FromFileName:volume-snap-s6h-f62ff3a1.img.meta ToFileName:volume-snap-s6h-f62ff3a1.img.meta ActualSize:0} {FromFileName:volume-snap-s10m-7c74e4fe.img ToFileName:volume-snap-s10m-7c74e4fe.img ActualSize:135168} {FromFileName:volume-snap-s10m-7c74e4fe.img.meta ToFileName:volume-snap-s10m-7c74e4fe.img.meta ActualSize:0} {FromFileName:volume-snap-s10m-c2a1bac3.img ToFileName:volume-snap-s10m-c2a1bac3.img ActualSize:282624} {FromFileName:volume-snap-s10m-c2a1bac3.img.meta ToFileName:volume-snap-s10m-c2a1bac3.img.meta ActualSize:0} {FromFileName:volume-snap-s10m-9fe3ae71.img ToFileName:volume-snap-s10m-9fe3ae71.img ActualSize:45056} {FromFileName:volume-snap-s10m-9fe3ae71.img.meta ToFileName:volume-snap-s10m-9fe3ae71.img.meta ActualSize:0} {FromFileName:volume-snap-s10m-73e6ddc8.img ToFileName:volume-snap-s10m-73e6ddc8.img ActualSize:49152} {FromFileName:volume-snap-s10m-73e6ddc8.img.meta ToFileName:volume-snap-s10m-73e6ddc8.img.meta ActualSize:0} {FromFileName:volume-snap-s6h-510e4966.img ToFileName:volume-snap-s6h-510e4966.img ActualSize:360169472} {FromFileName:volume-snap-s6h-510e4966.img.meta ToFileName:volume-snap-s6h-510e4966.img.meta ActualSize:0} {FromFileName:volume-snap-9057b3b1-e0d2-4d55-9dac-61787e889040.img ToFileName:volume-snap-9057b3b1-e0d2-4d55-9dac-61787e889040.img ActualSize:0} {FromFileName:volume-snap-9057b3b1-e0d2-4d55-9dac-61787e889040.img.meta ToFileName:volume-snap-9057b3b1-e0d2-4d55-9dac-61787e889040.img.meta ActualSize:0}] from tcp://10.42.0.8:10225: rpc error: code = Unavailable desc = transport is closing" , error exit status 1
fail to list snapshot: cannot get client for volume pvc-96b98781-c580-48f5-9736-520fd72dcc6d: engine is not running
Failed rebuilding replica with Address 10.42.1.8:10000: failed to add replica address='tcp://10.42.1.8:10000' to controller 'pvc-d62d4874-0769-4603-9b29-ce89fa4c57fe': failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.1.2/longhorn [--url 10.42.0.9:10012 add tcp://10.42.1.8:10000], output , stderr, time="2021-07-29T01:10:17Z" level=info msg="Adding replica tcp://10.42.1.8:10000 in WO mode" time="2021-07-29T01:10:17Z" level=info msg="Using replica tcp://10.42.0.8:10165 as the source for rebuild " time="2021-07-29T01:10:17Z" level=info msg="Using replica tcp://10.42.1.8:10000 as the target for rebuild " time="2021-07-29T01:10:22Z" level=fatal msg="Error running add replica command: failed to sync files [{FromFileName:volume-snap-58554d15-f53c-42de-8df6-a6d7fda7ab49.img ToFileName:volume-snap-58554d15-f53c-42de-8df6-a6d7fda7ab49.img ActualSize:32768} {FromFileName:volume-snap-58554d15-f53c-42de-8df6-a6d7fda7ab49.img.meta ToFileName:volume-snap-58554d15-f53c-42de-8df6-a6d7fda7ab49.img.meta ActualSize:0} {FromFileName:volume-snap-65c29715-7ee3-45c2-9710-94cb2f410a40.img ToFileName:volume-snap-65c29715-7ee3-45c2-9710-94cb2f410a40.img ActualSize:583028736} {FromFileName:volume-snap-65c29715-7ee3-45c2-9710-94cb2f410a40.img.meta ToFileName:volume-snap-65c29715-7ee3-45c2-9710-94cb2f410a40.img.meta ActualSize:0}] from tcp://10.42.0.8:10165: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil" , error exit status 1
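
To make failures like the two above easier to compare, here is a minimal Python sketch that pulls the key fields out of an "add replica" error: the source and target replicas, the number and total size of the snapshot files being synced, the gRPC status code, and the time between the first and last log timestamps. The function name `summarize_rebuild_failure` and the regular expressions are assumptions written against the log text in this issue, not part of Longhorn. The `sample` string is a trimmed copy of the second failure, with the `.meta` entries omitted.

```python
import re
from datetime import datetime

# Patterns written against the log excerpt in this issue (an assumption,
# not an official Longhorn log format).
FILE_RE = re.compile(r"\{FromFileName:(\S+)\s+ToFileName:\S+\s+ActualSize:(\d+)\}")
TIME_RE = re.compile(r'time="([^"]+)"')


def _first(pattern, text):
    """Return the first capture group of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def summarize_rebuild_failure(log: str) -> dict:
    """Extract replicas, file sizes, gRPC code and elapsed time from one error."""
    files = [(name, int(size)) for name, size in FILE_RE.findall(log)]
    stamps = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in TIME_RE.findall(log)]
    return {
        "source": _first(r"Using replica (tcp://\S+) as the source", log),
        "target": _first(r"Using replica (tcp://\S+) as the target", log),
        "files": len(files),
        "total_bytes": sum(size for _, size in files),
        "largest_file": max(files, key=lambda f: f[1], default=None),
        "grpc_code": _first(r"rpc error: code = (\w+)", log),
        "elapsed_seconds": (stamps[-1] - stamps[0]).total_seconds() if stamps else None,
    }


# Trimmed copy of the second failure above (.meta entries left out).
sample = (
    'time="2021-07-29T01:10:17Z" level=info msg="Using replica tcp://10.42.0.8:10165 as the source for rebuild " '
    'time="2021-07-29T01:10:17Z" level=info msg="Using replica tcp://10.42.1.8:10000 as the target for rebuild " '
    'time="2021-07-29T01:10:22Z" level=fatal msg="Error running add replica command: failed to sync files '
    '[{FromFileName:volume-snap-58554d15-f53c-42de-8df6-a6d7fda7ab49.img '
    'ToFileName:volume-snap-58554d15-f53c-42de-8df6-a6d7fda7ab49.img ActualSize:32768} '
    '{FromFileName:volume-snap-65c29715-7ee3-45c2-9710-94cb2f410a40.img '
    'ToFileName:volume-snap-65c29715-7ee3-45c2-9710-94cb2f410a40.img ActualSize:583028736}] '
    'from tcp://10.42.0.8:10165: rpc error: code = Internal desc = grpc: error while marshaling: '
    'proto: Marshal called with nil" , error exit status 1'
)

summary = summarize_rebuild_failure(sample)
print(summary)
```

Run against both failures, a summary like this shows at a glance that the two rebuilds used the same source replica host (10.42.0.8) but failed with different gRPC codes and after very different durations.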

Environment:

  • Longhorn version: v1.1.2
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE v1.20.8
    • Number of management nodes in the cluster: 3
    • Number of worker nodes in the cluster: 1
  • Node config
    • OS type and version: Ubuntu 21.04
    • CPU per node: 12 CPUs
    • Memory per node: 32 GB
    • Disk type(e.g. SSD/NVMe): NVMe
    • Network bandwidth between the nodes: 100 Mbps
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Bare-metal
  • Number of Longhorn volumes in the cluster: 20

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 40 (16 by maintainers)

Most upvoted comments

After adding the IP_AUTODETECTION_METHOD=<name_interface> option to the canal DaemonSet, everything seems to work fine.
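
A minimal sketch of how that change could be applied, written in Python so the patch can be inspected before it is run. Several names here are assumptions that do not appear in the comment: the DaemonSet is assumed to be `canal` in the `kube-system` namespace, the container carrying the variable is assumed to be `calico-node`, and `eth0` is a placeholder interface name. The comment gives the value as `<name_interface>`; the sketch uses the `interface=<regex>` form described in the Calico documentation, so adjust it if your setup differs. The helper names `canal_autodetect_patch` and `kubectl_patch_command` are hypothetical.

```python
import json
import shlex


def canal_autodetect_patch(interface: str, container: str = "calico-node") -> dict:
    """Build a strategic merge patch that sets IP_AUTODETECTION_METHOD.

    Assumption: the variable belongs on the `calico-node` container.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "env": [
                                {
                                    "name": "IP_AUTODETECTION_METHOD",
                                    "value": f"interface={interface}",
                                }
                            ],
                        }
                    ]
                }
            }
        }
    }


def kubectl_patch_command(patch: dict,
                          namespace: str = "kube-system",
                          daemonset: str = "canal") -> str:
    """Render the kubectl command that would apply the patch (not executed here)."""
    return " ".join([
        "kubectl", "-n", namespace, "patch", "daemonset", daemonset,
        "--type", "strategic", "-p", shlex.quote(json.dumps(patch)),
    ])


# "eth0" is a placeholder; use the interface that carries node-to-node traffic.
patch = canal_autodetect_patch("eth0")
print(kubectl_patch_command(patch))
```

The script only prints the command, so the patch can be reviewed first. A strategic merge patch is used because it merges containers and environment variables by name, leaving the rest of the DaemonSet untouched.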

Stay tuned

Email sent.