etcd: `MemberList` doesn't work after adding a new member to a one-node cluster
What happened?
etcdadm automates bringing up an etcd cluster and has a test that brings up a cluster node by node. That test regressed when we updated the etcd client from 3.4 to 3.5; this looks like a reintroduction of #9949.
What did you expect to happen?
No regression
How can we reproduce it (as minimally and precisely as possible)?
https://github.com/kubernetes-sigs/etcdadm/pull/364 demonstrates the problem.
The failing test is relatively simple: https://github.com/kubernetes-sigs/etcdadm/blob/master/test/e2e/cluster_phases.sh
A failing run can be seen here:
+ docker exec etcdadm-1 /etcdadm/etcdadm join phase membership https://172.17.0.2:2379/ --name etcdadm-1
time="2023-02-04T16:42:58Z" level=info msg="[membership] Checking if this member was added"
time="2023-02-04T16:42:58Z" level=info msg="[membership] Member was not added"
time="2023-02-04T16:42:58Z" level=info msg="Removing existing data dir \"/var/lib/etcd\""
time="2023-02-04T16:42:58Z" level=info msg="[membership] Adding member"
time="2023-02-04T16:42:58Z" level=info msg="[membership] Checking if member was started"
time="2023-02-04T16:42:58Z" level=info msg="[membership] Member was not started"
time="2023-02-04T16:42:58Z" level=info msg="[membership] Removing existing data dir \"/var/lib/etcd\""
+ docker exec etcdadm-1 /etcdadm/etcdadm join phase install https://172.17.0.2:2379/ --name etcdadm-1
2023/02/04 16:42:58 [install] Artifact not found in cache. Trying to fetch from upstream: https://github.com/coreos/etcd/releases/download
time="2023-02-04T16:42:58Z" level=info msg="[install] Downloading & installing etcd https://github.com/coreos/etcd/releases/download from 3.5.7 to /var/cache/etcdadm/etcd/v3.5.7\n"
time="2023-02-04T16:42:58Z" level=info msg="[install] downloading etcd from https://github.com/coreos/etcd/releases/download/v3.5.7/etcd-v3.5.7-linux-amd64.tar.gz to /var/cache/etcdadm/etcd/v3.5.7/etcd-v3.5.7-linux-amd64.tar.gz\n"
...
######################################################################## 100.0%
time="2023-02-04T16:42:59Z" level=info msg="[install] extracting etcd archive /var/cache/etcdadm/etcd/v3.5.7/etcd-v3.5.7-linux-amd64.tar.gz to /tmp/etcd560302720\n"
time="2023-02-04T16:43:00Z" level=info msg="[install] verifying etcd 3.5.7 is installed in /opt/bin/\n"
+ docker exec etcdadm-1 /etcdadm/etcdadm join phase configure https://172.17.0.2:2379/ --name etcdadm-1
time="2023-02-04T16:43:00Z" level=info msg="[membership] Checking if this member was added"
{"level":"warn","ts":"2023-02-04T16:43:05.267Z","logger":"etcd-client","caller":"v3@v3.5.7/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004d0c40/172.17.0.2:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
time="2023-02-04T16:43:05Z" level=fatal msg="[membership] error listing members: context deadline exceeded"
So we have a single-node cluster with etcdadm-0. We add etcdadm-1 to the cluster but do not start it. On etcdadm-1 we then issue a MemberList query against etcdadm-0. That query times out, which looks like a regression of #9949.
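The arithmetic behind the timeout is worth spelling out. Raft needs a strict majority of voting members to make progress. Once etcdadm-1 has been added but not started, the cluster has two members and a quorum of two, while only etcdadm-0 is running. Assuming MemberList in 3.5 is served linearizably (i.e. it must be confirmed by a quorum), it cannot complete and the client eventually reports `context deadline exceeded`. The sketch below only models the majority rule; `quorum` and `hasQuorum` are hypothetical helpers, not etcd functions.

```go
package main

import "fmt"

// quorum returns the number of voting members that must be reachable for a
// Raft cluster of the given size to make progress (a strict majority).
// Hypothetical helper for illustration; not part of etcd.
func quorum(members int) int {
	return members/2 + 1
}

// hasQuorum reports whether the started members form a majority.
func hasQuorum(started, members int) bool {
	return started >= quorum(members)
}

func main() {
	// Before the join: etcdadm-0 alone, 1 of 1 members started.
	fmt.Println("1 member, 1 started:", quorum(1), hasQuorum(1, 1))
	// After MemberAdd for etcdadm-1 but before it starts: 1 of 2 started.
	fmt.Println("2 members, 1 started:", quorum(2), hasQuorum(1, 2))
}
```

This prints `1 member, 1 started: 1 true` followed by `2 members, 1 started: 2 false`: adding the second member doubles the quorum requirement before the new node can contribute to it.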
Anything else we need to know?
No response
Etcd version (please run commands below)
3.5.1
Etcd configuration (command line flags or environment variables)
No response
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
No response
Relevant log output
No response
About this issue
- State: closed
- Created a year ago
- Comments: 16 (16 by maintainers)
Commits related to this issue
- Add non-quorum MemberList operation. This is needed for etcdadm to grow the cluster when it does not have quorum. See https://github.com/etcd-io/etcd/issues/15243 for background. (Committed to justinsb/etcdadm by justinsb a year ago.)
I think we could update the signature from
to
Users could then call
c.MemberList(ctx, clientv3.WithSerializable())
in this case. It isn't a breaking change anymore. The only minor concern is that it reuses the same Op type as the KV Range operation.
Any comments? @ptabor @serathius @spzala
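The exact old and new signatures in the proposal are not preserved here. Judging from the `c.MemberList(ctx, clientv3.WithSerializable())` call, the idea appears to be adding a variadic options parameter. The sketch below illustrates that pattern with hypothetical, simplified `Op`, `OpOption`, `WithSerializable`, `memberListRequest` and `client` types; they are modeled loosely on the clientv3 functional-option style but are not the real clientv3 package. It shows why existing one-argument calls keep compiling and how a serializable option could clear a linearizable flag on the request.

```go
package main

import (
	"context"
	"fmt"
)

// Op and OpOption are hypothetical stand-ins for clientv3's option types.
type Op struct {
	serializable bool
}

type OpOption func(*Op)

// WithSerializable marks the request as serializable: answered by the local
// member without confirming with a quorum (hypothetical stand-in).
func WithSerializable() OpOption {
	return func(op *Op) { op.serializable = true }
}

// memberListRequest is a hypothetical request carrying a linearizable flag.
type memberListRequest struct {
	Linearizable bool
}

// buildMemberListRequest applies the options in order and builds the
// request that a MemberList(ctx, opts...) method would send.
func buildMemberListRequest(opts ...OpOption) memberListRequest {
	op := Op{}
	for _, opt := range opts {
		opt(&op)
	}
	return memberListRequest{Linearizable: !op.serializable}
}

type client struct{}

// MemberList accepts optional OpOptions, so calls written against a
// one-argument signature, c.MemberList(ctx), still compile unchanged.
func (c *client) MemberList(ctx context.Context, opts ...OpOption) (memberListRequest, error) {
	if err := ctx.Err(); err != nil {
		return memberListRequest{}, err
	}
	// A real client would send the request over gRPC here; this sketch
	// returns it so the effect of the options is visible.
	return buildMemberListRequest(opts...), nil
}

func main() {
	c := &client{}
	ctx := context.Background()

	// Existing call site: no options, linearizable by default.
	def, _ := c.MemberList(ctx)
	// Opt-in call that a non-quorum caller such as etcdadm could use.
	ser, _ := c.MemberList(ctx, WithSerializable())

	fmt.Println("default linearizable:", def.Linearizable)
	fmt.Println("serializable linearizable:", ser.Linearizable)
}
```

Because the new parameter is variadic, existing callers are unaffected; code that implements the interface itself (for example, a test mock) would still need its method signature updated. Reusing the KV option type is presumably the "minor concern" above: options such as key ranges or revisions would have no meaning for MemberList.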