etcd: `MemberList` doesn't work after adding a new member to one-node cluster

What happened?

etcdadm automates bringing up a cluster and has a test that brings up a cluster node by node. That test regressed when we updated the client from 3.4 to 3.5, seemingly a reintroduction of #9949.

What did you expect to happen?

No regression

How can we reproduce it (as minimally and precisely as possible)?

https://github.com/kubernetes-sigs/etcdadm/pull/364 demonstrates the problem.

The failing test is relatively simple: https://github.com/kubernetes-sigs/etcdadm/blob/master/test/e2e/cluster_phases.sh

A failing run can be seen here:

+ docker exec etcdadm-1 /etcdadm/etcdadm join phase membership https://172.17.0.2:2379/ --name etcdadm-1
time="2023-02-04T16:42:58Z" level=info msg="[membership] Checking if this member was added"
time="2023-02-04T16:42:58Z" level=info msg="[membership] Member was not added"
time="2023-02-04T16:42:58Z" level=info msg="Removing existing data dir \"/var/lib/etcd\""
time="2023-02-04T16:42:58Z" level=info msg="[membership] Adding member"
time="2023-02-04T16:42:58Z" level=info msg="[membership] Checking if member was started"
time="2023-02-04T16:42:58Z" level=info msg="[membership] Member was not started"
time="2023-02-04T16:42:58Z" level=info msg="[membership] Removing existing data dir \"/var/lib/etcd\""
+ docker exec etcdadm-1 /etcdadm/etcdadm join phase install https://172.17.0.2:2379/ --name etcdadm-1
2023/02/04 16:42:58 [install] Artifact not found in cache. Trying to fetch from upstream: https://github.com/coreos/etcd/releases/download
time="2023-02-04T16:42:58Z" level=info msg="[install] Downloading & installing etcd https://github.com/coreos/etcd/releases/download from 3.5.7 to /var/cache/etcdadm/etcd/v3.5.7\n"
time="2023-02-04T16:42:58Z" level=info msg="[install] downloading etcd from https://github.com/coreos/etcd/releases/download/v3.5.7/etcd-v3.5.7-linux-amd64.tar.gz to /var/cache/etcdadm/etcd/v3.5.7/etcd-v3.5.7-linux-amd64.tar.gz\n"
...
######################################################################## 100.0%
time="2023-02-04T16:42:59Z" level=info msg="[install] extracting etcd archive /var/cache/etcdadm/etcd/v3.5.7/etcd-v3.5.7-linux-amd64.tar.gz to /tmp/etcd560302720\n"
time="2023-02-04T16:43:00Z" level=info msg="[install] verifying etcd 3.5.7 is installed in /opt/bin/\n"
+ docker exec etcdadm-1 /etcdadm/etcdadm join phase configure https://172.17.0.2:2379/ --name etcdadm-1
time="2023-02-04T16:43:00Z" level=info msg="[membership] Checking if this member was added"
{"level":"warn","ts":"2023-02-04T16:43:05.267Z","logger":"etcd-client","caller":"v3@v3.5.7/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004d0c40/172.17.0.2:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
time="2023-02-04T16:43:05Z" level=fatal msg="[membership] error listing members: context deadline exceeded"

So we have a single-node cluster with etcdadm-0. We add etcdadm-1 to the cluster but do not start it. From etcdadm-1 we run a MemberList query against etcdadm-0. That query times out, which looks like a regression of #9949.
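
The sequence can be sketched directly against the clientv3 API (the endpoint and peer URL below are placeholders and the TLS configuration is omitted for brevity; this is not the actual etcdadm code):

package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to the single existing member (etcdadm-0).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://172.17.0.2:2379"},
		DialTimeout: 5 * time.Second,
		// TLS config omitted for brevity.
	})
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	defer cli.Close()

	// Add the new member (etcdadm-1) but do not start it, mirroring the
	// "join phase membership" step. The peer URL here is a placeholder.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	_, err = cli.MemberAdd(ctx, []string{"https://172.17.0.3:2380"})
	cancel()
	if err != nil {
		log.Fatalf("MemberAdd failed: %v", err)
	}

	// The cluster now has two members but only one is started, so a
	// linearizable request cannot get quorum. With the 3.5 client this
	// MemberList call times out with "context deadline exceeded".
	ctx, cancel = context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if _, err = cli.MemberList(ctx); err != nil {
		log.Fatalf("MemberList failed: %v", err)
	}
}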

Anything else we need to know?

No response

Etcd version (please run commands below)

3.5.1

Etcd configuration (command line flags or environment variables)

No response

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

No response

Relevant log output

No response

About this issue

  • State: closed
  • Created a year ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

I am thinking we can probably update the signature from

func (c *cluster) MemberList(ctx context.Context) (*MemberListResponse, error) 

to

func (c *cluster) MemberList(ctx context.Context, opts ...OpOption) (*MemberListResponse, error) 

Users could then call c.MemberList(ctx, clientv3.WithSerializable()) in this case.
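
Assuming that signature lands, a caller such as etcdadm's configure phase could do something like the sketch below (listMembersSerializable is a hypothetical helper, not existing etcdadm or clientv3 code):

import (
	"context"
	"time"

	pb "go.etcd.io/etcd/api/v3/etcdserverpb"
	clientv3 "go.etcd.io/etcd/client/v3"
)

// listMembersSerializable issues a member list that is served from the
// contacted member's local store, so it does not need quorum and still
// succeeds while the newly added member is down.
func listMembersSerializable(cli *clientv3.Client) ([]*pb.Member, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	resp, err := cli.MemberList(ctx, clientv3.WithSerializable())
	if err != nil {
		return nil, err
	}
	return resp.Members, nil
}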

This way it is no longer a breaking change. The only minor concern is that it reuses the same Op as the KV Range operation.

any comments? @ptabor @serathius @spzala