etcd: Output Inconsistency with json writer

Etcd version:

ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --endpoints=[https://127.0.0.1:2379] version
etcdctl version: 3.2.18
API version: 3.2

Output without specifying a writer. Take note of the member IDs.

ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --endpoints=[https://127.0.0.1:2379] member list
3214dd175f565349, started, i-0891498e3531cb084, https://10.97.0.110:2380, https://10.97.0.110:2379
65bf2e476daa66a4, started, i-02ce67ad669ebcab9, https://10.97.3.79:2380, https://10.97.3.79:2379
750e1bfd8a0a0b33, started, i-0bbecc2ae92d24ce3, https://10.97.5.118:2380, https://10.97.5.118:2379

Output using the JSON writer. Notice the member IDs are different from the output in the non-json output:

ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --endpoints=[https://127.0.0.1:2379] member list -w json
{"header":{"cluster_id":11891086386631399677,"member_id":3608752293884089161,"raft_term":19},"members":[{"ID":3608752293884089161,"name":"i-0891498e3531cb084","peerURLs":["https://10.97.0.110:2380"],"clientURLs":["https://10.97.0.110:2379"]},{"ID":7331629602699896484,"name":"i-02ce67ad669ebcab9","peerURLs":["https://10.97.3.79:2380"],"clientURLs":["https://10.97.3.79:2379"]},{"ID":8434709927868107571,"name":"i-0bbecc2ae92d24ce3","peerURLs":["https://10.97.5.118:2380"],"clientURLs":["https://10.97.5.118:2379"]}]}

I’m not sure if they are encoded in a way that isn’t documented or a serialization issue. FWIW this appears to be producible with v3.3

This produces an error when removing a member directly using the ID obtained from the json output since etcd does not recognize it.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 9
  • Comments: 21 (7 by maintainers)

Most upvoted comments

It’s a very annoying thing. As other mentioned parsing a JSON is safer than expecting that the simple output structure (fields or ordering or printing out a few extra lines in the beginning) won’t change in the future. Also with jq you can do other tricks which can make your life easier, like selecting elements from the output, in example:

etcdctl member list --write-out=json | jq ".members[] | select( .name == \"$AWS_INSTANCE_ID\" ) | .ID"

It’s not intuitive that this will not give back the member ID for the actual instance. Not even in a decimal form (that’s why the supposed workaround won’t work, at least in my case). Unfortunately on CoreOS only jq exists (no Perl, Python, Ruby or any other scripting language) which can’t handle the numbers correctly.

The other issue is that I wanted to wait until a new instance catches up to the cluster. For this I need to know who is the leader and what it’s actual RaftIndex is. The leader is represented as a number in the endpoint status --cluster, so I got back false value again. And as the RaftIndex is a number too then I’ll get back wrong values there too if the cluster is old enough (I didn’t tested it though and I admit it’s a corner case at the moment).

It would be great if etcdctl has an option like --string-numbers or similar which converts all the numbers to strings in the output. The default can and should be false to maintain backward compatibility. Until that - I think - it’s not safe to depend on the JSON output in shell scripts (as you probably use jq for parsing).

Different format showing different member ID is REALLY SO confusing. Backward compatibility? for what? And why was it designed this way in the first place? And it seems like such a simple issue was closed without a fix… (Nobody was willing to fix it. Disappointing.)

Anyway, fortunately, for those who are suffering this issue, there is a workaround, with version 3.5. I just tested, with option --hex=true, the json output turns to hex string instead of decimal integer.

@gyuho and @spzala mentioned to update the doc, think of it as a fix. I wonder where is the updated (fixed) doc? At least, I didn’t find any words about this issue in https://github.com/etcd-io/etcd/blob/main/etcdctl/README.md.

@akunszt Alright a workaround that DOES work is to manually convert the IDs to strings with sed. Something like:

MEMBERS=$(etcdctl member list --write-out=json' | sed 's/\("ID":\)\([0-9]\+\)\(,\)/\1"\2"\3/g')
~ # echo $MEMBERS
... "members":[{"ID":"6374980892269173642","name"mynode",...
~ # jq -c '.' <<< $MEMBERS
... "members":[{"ID":"6374980892269173642","name"mynode",...

Using printf works in this case:

~ # etcdctl member list
...
808d618cff3d18a6, started, mynode, ...
~ # jq ".members[] | select( .name == \"mynode\" ) | .ID" <<< $MEMBERS | xargs printf '%x'
808d618cff3d18a6

@mattayes Unfortunately no, that won’t help as the jq broke the actual value. Let me show you:

~ # etcdctl member list --write-out=json | jq ".members[] | select( .name == \"i-0bcaafbdff9bef2f6\" ) | .ID" | xargs printf '%x\n'
268c281fa4cb42f8
~ # etcdctl member list | grep i-0bcaafbdff9bef2f6
268c281fa4cb41a3, started, i-0bcaafbdff9bef2f6, http://ip-1-2-3-4.region.compute.internal:2380, http://ip-1-2-3-4.region.compute.internal:2379

The issue is that the ID is an integer and jq converts it to a floating-point number and then back which changes the actual value.

~ # etcdctl member list --write-out=json
... "members":[{"ID":2777639186554634659,"name":"i-0bcaafbdff9bef2f6",...
~ # etcdctl member list --write-out=json | jq -c .
... "members":[{"ID":2777639186554635000,"name":"i-0bcaafbdff9bef2f6",...

If the ID would be string, even with the same content that would workaround the jq’s rounding error. We actually don’t need it to be an integer, we don’t want to do arithmetic operations on it, just handle it as a string.

As a workaround, decimal output of id can be used when sending request to json grpc gateway: https://github.com/etcd-io/etcd/blob/master/Documentation/dev-guide/api_grpc_gateway.md

I stumbled into this today while trying to automate removing a member from the cluster. Right now, you can’t take the IDs from etcd member list -w json and use it straight for etcdctl member remove because of the uint64 <> hex mismatch. Can we:

  • output the ID as a hex string in - w json (or at least give an option)
  • Improve the documentation regarding this behaviour ?

I would be happy to work on a PR but would like some clarification regarding the direction first.