etcd: Duplicate names in `ETCD_INITIAL_CLUSTER` not handled correctly

What happened?

If you don’t pass a --name argument to your etcd processes, they all get the name “default”, and the cluster operates normally. However, when you add a member, the generated ETCD_INITIAL_CLUSTER variable contains multiple entries named “default”. When that variable is used to start the new member, etcd parses the entries into a map with a single key (“default”) holding multiple URLs, and creates a single member from it. See https://github.com/etcd-io/etcd/blob/63a1cc3fe40bace6898289dec35a9aad05163889/server/etcdserver/api/membership/cluster.go#L83-L86
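
To illustrate the collapse described above, here is a minimal sketch. It is not etcd's actual parsing code: `parseInitialCluster` is a hypothetical, simplified stand-in that groups peer URLs by member name, which is the behaviour this report describes.

```go
package main

import (
	"fmt"
	"strings"
)

// parseInitialCluster is a hypothetical, simplified stand-in for etcd's
// initial-cluster parsing. It groups peer URLs by member name, so entries
// that share a name end up under the same key.
func parseInitialCluster(s string) map[string][]string {
	byName := make(map[string][]string)
	for _, entry := range strings.Split(s, ",") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			continue // skip malformed entries in this sketch
		}
		byName[parts[0]] = append(byName[parts[0]], parts[1])
	}
	return byName
}

func main() {
	// The value generated by "member add" in the reproduction steps.
	const initialCluster = "default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001"

	byName := parseInitialCluster(initialCluster)
	fmt.Println("entries in variable:", strings.Count(initialCluster, ",")+1)
	fmt.Println("distinct member names:", len(byName))
	fmt.Println("URLs under \"default\":", byName["default"])
}
```

With the value from the reproduction steps, the three entries collapse into two names, so a cluster built with one member per name would have one member fewer than the running cluster.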

This leads to the confusing error message “member count is unequal”. The documentation at https://etcd.io/docs/v3.5/op-guide/runtime-configuration/ mentions this error, but for a different situation.

What did you expect to happen?

Either:

a. member add should fail, saying it cannot generate a valid ETCD_INITIAL_CLUSTER due to duplicate names, or

b. etcd should accept duplicate names in ETCD_INITIAL_CLUSTER and treat them as separate members. This can be accomplished by updating func NewClusterFromURLsMap as follows:

	c := NewCluster(lg, opts...)
	for name, urls := range urlsmap {
		for idx := range urls {
			m := NewMember(name, urls[idx:idx+1], token, nil)
			[...]

I don’t know if there’s a real need to be able to specify multiple URLs for a single member.
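
The first expected behaviour (failing early on duplicate names) could be sketched roughly as follows. This is only an illustration: `duplicateNames` and `checkInitialCluster` are hypothetical helpers, not functions in etcd or etcdctl, and the error wording is made up for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// duplicateNames returns, in sorted order, every member name that appears
// more than once in an ETCD_INITIAL_CLUSTER-style string.
// Hypothetical helper for illustration only.
func duplicateNames(initialCluster string) []string {
	counts := make(map[string]int)
	for _, entry := range strings.Split(initialCluster, ",") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			continue // ignore malformed entries in this sketch
		}
		counts[parts[0]]++
	}
	var dups []string
	for name, n := range counts {
		if n > 1 {
			dups = append(dups, name)
		}
	}
	sort.Strings(dups)
	return dups
}

// checkInitialCluster returns an error if any member name is repeated.
func checkInitialCluster(initialCluster string) error {
	if dups := duplicateNames(initialCluster); len(dups) > 0 {
		return fmt.Errorf("cannot generate a valid ETCD_INITIAL_CLUSTER: duplicate member names %q", dups)
	}
	return nil
}

func main() {
	const generated = "default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001"
	if err := checkInitialCluster(generated); err != nil {
		fmt.Println("error:", err)
	}
}
```

A check like this would surface the real cause (repeated names) instead of the later “member count is unequal” failure.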

How can we reproduce it (as minimally and precisely as possible)?

You need three terminals, x, y, and z:

x$ mkdir -p test_case/{a,b,c}/{data/member,wal}
x$ ETCD_INITIAL_CLUSTER="a=http://127.0.0.1:40000,b=http://127.0.0.1:40001" ETCD_INITIAL_CLUSTER_STATE=new etcd --name a --{initial-advertise,listen}-peer-urls=http://127.0.0.1:40000 --{advertise,listen}-client-urls=http://127.0.0.1:50000 --data-dir test_case/a/data --wal-dir test_case/a/wal
y$ ETCD_INITIAL_CLUSTER="a=http://127.0.0.1:40000,b=http://127.0.0.1:40001" ETCD_INITIAL_CLUSTER_STATE=new etcd --name b --{initial-advertise,listen}-peer-urls=http://127.0.0.1:40001 --{advertise,listen}-client-urls=http://127.0.0.1:50001 --data-dir test_case/b/data --wal-dir test_case/b/wal
[now kill both servers with Ctrl-C]
x$ etcd --listen-peer-urls=http://127.0.0.1:40000 --{advertise,listen}-client-urls=http://127.0.0.1:50000 --data-dir test_case/a/data --wal-dir test_case/a/wal
y$ etcd --listen-peer-urls=http://127.0.0.1:40001 --{advertise,listen}-client-urls=http://127.0.0.1:50001 --data-dir test_case/b/data --wal-dir test_case/b/wal
z$ ETCDCTL_ENDPOINT=http://localhost:50000 etcdctl member add c http://127.0.0.1:40002
Added member named c with ID 7b4d6e3edb76bc59 to cluster

ETCD_NAME="c"
ETCD_INITIAL_CLUSTER="default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001"
ETCD_INITIAL_CLUSTER_STATE="existing"
z$ export ETCD_NAME="c"
z$ export ETCD_INITIAL_CLUSTER="default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001"
z$ export ETCD_INITIAL_CLUSTER_STATE="existing"
z$ etcd --listen-peer-urls=http://127.0.0.1:40002 --{advertise,listen}-client-urls=http://127.0.0.1:50002 --data-dir test_case/c/data --wal-dir test_case/c/wal
[...]
member count is unequal

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
# paste output here

$ etcdctl version
# paste output here

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

etcd 3.5.2

Relevant log output

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

@Divya063 Definitely yes. Thank you!

@ahrtr Would it be okay if I work on this?