etcd: Failed to restore etcd from a snapshot due to resolving peer URL failure

I followed these steps:

  1. Copy a snapshot file to a working location as etcd-snapshot.db.
  2. Scale the StatefulSet to 0.
  3. Start a static pod with etcdctl and mount the PVC used by the etcd members.
  4. Run the restore command:
etcdctl snapshot restore --skip-hash-check etcd-snapshot.db --initial-cluster=apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2380 --initial-cluster-token=etcd-cluster-k8s --initial-advertise-peer-urls=http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2380 --name apisix-etcd-2  --data-dir=/opt/nfsdata/apisix-data-apisix-etcd-2-pvc-f8ef09a4-e8f2-404d-8d14-63b905e324be/data
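
For readability, the same member list can be assembled from its parts. This is a minimal sketch that only prints the restore command instead of running it. The function names `initial_cluster` and `restore_cmd` and the variable names are illustrative; the member names, domain, and token are the ones from the command above, and `<data-dir>` is a placeholder for the member's mounted data directory.

```shell
#!/bin/sh
# Values taken from the restore command in this report.
STS=apisix-etcd
DOMAIN=apisix-etcd-headless.apisix.svc.cluster.local
REPLICAS=3

# Print the --initial-cluster value: name=peer-url pairs joined by commas.
initial_cluster() {
  i=0
  out=""
  while [ "$i" -lt "$REPLICAS" ]; do
    out="${out:+$out,}${STS}-${i}=http://${STS}-${i}.${DOMAIN}:2380"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Print (do not run) the restore command for the member with index $1.
# DATA_DIR is a hypothetical variable; it defaults to a placeholder.
restore_cmd() {
  printf 'etcdctl snapshot restore --skip-hash-check etcd-snapshot.db'
  printf ' --name %s-%s' "$STS" "$1"
  printf ' --initial-cluster=%s' "$(initial_cluster)"
  printf ' --initial-cluster-token=etcd-cluster-k8s'
  printf ' --initial-advertise-peer-urls=http://%s-%s.%s:2380' "$STS" "$1" "$DOMAIN"
  printf ' --data-dir=%s\n' "${DATA_DIR:-<data-dir>}"
}

restore_cmd 2
```

Calling `restore_cmd 0` and `restore_cmd 1` prints the matching commands for the other two members, each with its own name and advertise URL but the same member list and token.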

However, this command fails: the pods have been shut down, so the pod domain names no longer resolve, and I get errors like this:

{"level":"warn","ts":1663053636.7199378,"caller":"netutil/netutil.go:121","msg":"failed to resolve URL Host","url":"http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2380","host":"apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2380","retry-interval":1,"error":"lookup apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local on 192.168.0.2:53: no such host"}
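
The warning comes from a DNS lookup on the peer host name. A quick way to see which peer names resolve from inside the restore pod is sketched below. It assumes `getent` is available in the image, which may not hold for minimal images; `peer_host` is an illustrative helper, and the URLs are the ones from the restore command.

```shell
#!/bin/sh
# Peer URLs from the restore command in this report.
PEERS="http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380
http://apisix-etcd-1.apisix-etcd-headless.apisix.svc.cluster.local:2380
http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2380"

# Strip the scheme and the port from a peer URL, leaving only the host name.
peer_host() {
  h=${1#*://}
  printf '%s\n' "${h%%:*}"
}

# Report, for each peer, whether its host name currently resolves.
for url in $PEERS; do
  host=$(peer_host "$url")
  if getent hosts "$host" >/dev/null 2>&1; then
    echo "resolves:   $host"
  else
    echo "unresolved: $host"
  fi
done
```

With the StatefulSet scaled to 0, all three names would be expected to show up as unresolved, which matches the "no such host" error in the log.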

If I restore without extra options:

etcdctl snapshot restore --skip-hash-check etcd-snapshot.db --data-dir=/opt/nfsdata/apisix-data-apisix-etcd-2-pvc-f8ef09a4-e8f2-404d-8d14-63b905e324be/data

Everything works, except that the node starts as a single-node cluster: etcdctl member list shows only the node itself 😦
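
The single-node result is consistent with the restore defaults. Assuming etcdctl 3.4 falls back to the name `default`, the peer URL `http://localhost:2380`, and the token `etcd-cluster` when those flags are omitted (an assumption worth checking with `etcdctl snapshot restore --help` on your version), the short command behaves roughly like the explicit form printed by this sketch. The function name is illustrative and `<data-dir>` is a placeholder.

```shell
#!/bin/sh
# Assumed etcdctl 3.4 defaults for the flags that were omitted.
NAME=default
PEER_URL=http://localhost:2380
TOKEN=etcd-cluster

# Print (do not run) the restore command with the assumed defaults spelled out.
# $1 is the data directory; a placeholder is used when it is not given.
explicit_restore_cmd() {
  printf 'etcdctl snapshot restore --skip-hash-check etcd-snapshot.db'
  printf ' --name %s' "$NAME"
  printf ' --initial-cluster=%s=%s' "$NAME" "$PEER_URL"
  printf ' --initial-cluster-token=%s' "$TOKEN"
  printf ' --initial-advertise-peer-urls=%s' "$PEER_URL"
  printf ' --data-dir=%s\n' "${1:-<data-dir>}"
}

explicit_restore_cmd
```

Under that assumption the restored data directory describes a cluster with exactly one member, which is why member list shows only one entry.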

So how should I restore an etcd cluster deployed in Kubernetes? Thank you in advance.

etcd Version: 3.4.16
Git SHA: d19fbe541
Go Version: go1.12.17
Go OS/Arch: linux/amd64

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 20 (15 by maintainers)

Most upvoted comments

@hasethuraman In your steps, does the restore command use only the --data-dir option? If so, won't each member start as a single node? If not, using an option such as --initial-cluster with a domain name will cause the problem described in the title.