etcd: Output Inconsistency with json writer
Etcd version:
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --endpoints=[https://127.0.0.1:2379] version
etcdctl version: 3.2.18
API version: 3.2
Output without specifying a writer. Take note of the member IDs.
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --endpoints=[https://127.0.0.1:2379] member list
3214dd175f565349, started, i-0891498e3531cb084, https://10.97.0.110:2380, https://10.97.0.110:2379
65bf2e476daa66a4, started, i-02ce67ad669ebcab9, https://10.97.3.79:2380, https://10.97.3.79:2379
750e1bfd8a0a0b33, started, i-0bbecc2ae92d24ce3, https://10.97.5.118:2380, https://10.97.5.118:2379
Output using the JSON writer. Notice that the member IDs differ from those in the non-JSON output:
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --endpoints=[https://127.0.0.1:2379] member list -w json
{"header":{"cluster_id":11891086386631399677,"member_id":3608752293884089161,"raft_term":19},"members":[{"ID":3608752293884089161,"name":"i-0891498e3531cb084","peerURLs":["https://10.97.0.110:2380"],"clientURLs":["https://10.97.0.110:2379"]},{"ID":7331629602699896484,"name":"i-02ce67ad669ebcab9","peerURLs":["https://10.97.3.79:2380"],"clientURLs":["https://10.97.3.79:2379"]},{"ID":8434709927868107571,"name":"i-0bbecc2ae92d24ce3","peerURLs":["https://10.97.5.118:2380"],"clientURLs":["https://10.97.5.118:2379"]}]}
I’m not sure whether the IDs are encoded in an undocumented way or whether this is a serialization issue. FWIW, this also appears to be reproducible with v3.3.
Removing a member with an ID taken directly from the JSON output produces an error, since etcd does not recognize it.
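As later comments in this thread note, the two outputs hold the same 64-bit member IDs in different bases: the default writer prints them in hexadecimal and the JSON writer prints them as decimal integers. A minimal sketch, assuming bash (whose printf builtin handles unsigned 64-bit values), that converts the decimal IDs from the JSON output above back to the hex form shown by the default writer. The helper name dec_to_hex is hypothetical:

```shell
#!/usr/bin/env bash
# Convert a decimal member ID (as printed by `-w json`) to the
# hexadecimal form printed by the default writer.
dec_to_hex() {
  printf '%x\n' "$1"
}

# The three member IDs from the JSON output above.
for id in 3608752293884089161 7331629602699896484 8434709927868107571; do
  dec_to_hex "$id"
done
# Prints:
# 3214dd175f565349
# 65bf2e476daa66a4
# 750e1bfd8a0a0b33
```

The printed values match the first column of the plain member list output, which is the form that member remove expects.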
About this issue
- State: closed
- Created 6 years ago
- Reactions: 9
- Comments: 21 (7 by maintainers)
It’s a very annoying thing. As others mentioned, parsing JSON is safer than expecting that the simple output structure (fields, ordering, or a few extra lines printed at the beginning) won’t change in the future. Also, with jq you can do other tricks that make your life easier, like selecting elements from the output, for example:
etcdctl member list --write-out=json | jq ".members[] | select( .name == \"$AWS_INSTANCE_ID\" ) | .ID"
It’s not intuitive that this will not give back the member ID for the actual instance, not even in decimal form (that’s why the supposed workaround won’t work, at least in my case). Unfortunately, on CoreOS only jq exists (no Perl, Python, Ruby or any other scripting language), and it can’t handle the numbers correctly.
The other issue is that I wanted to wait until a new instance catches up to the cluster. For this I need to know who the leader is and what its actual RaftIndex is. The leader is represented as a number in the endpoint status --cluster output, so I got back a false value again. And as the RaftIndex is a number too, I’ll get back wrong values there as well if the cluster is old enough (I haven’t tested it, though, and I admit it’s a corner case at the moment).
It would be great if etcdctl had an option like --string-numbers or similar which converts all the numbers to strings in the output. The default can and should be false to maintain backward compatibility. Until then, I think it’s not safe to depend on the JSON output in shell scripts (as you probably use jq for parsing).

Different formats showing different member IDs is REALLY SO confusing. Backward compatibility? For what? And why was it designed this way in the first place? And it seems like such a simple issue was closed without a fix… (Nobody was willing to fix it. Disappointing.)
Anyway, fortunately, for those who are suffering from this issue, there is a workaround with version 3.5. I just tested it: with the option --hex=true, the JSON output turns into hex strings instead of decimal integers.
@gyuho and @spzala mentioned updating the doc, treating that as the fix. I wonder where the updated (fixed) doc is? At least, I didn’t find any words about this issue in https://github.com/etcd-io/etcd/blob/main/etcdctl/README.md.

@akunszt Alright, a workaround that DOES work is to manually convert the IDs to strings with sed. Something like:

Using printf works in this case:

@mattayes Unfortunately no, that won’t help, as jq broke the actual value. Let me show you:
The issue is that the ID is an integer, and jq converts it to a floating-point number and then back, which changes the actual value. If the ID were a string, even with the same content, that would work around jq’s rounding error. We actually don’t need it to be an integer; we don’t want to do arithmetic operations on it, just handle it as a string.

As a workaround, the decimal output of the ID can be used when sending requests to the JSON gRPC gateway: https://github.com/etcd-io/etcd/blob/master/Documentation/dev-guide/api_grpc_gateway.md
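
The sed and printf commands from the comments above did not survive in this copy of the thread, so the following is only a sketch of the idea, not the commenters’ original commands. The rounding happens because an IEEE 754 double has a 53-bit significand: between 2^61 and 2^62 the representable values are 512 apart, so a parser that stores numbers as doubles turns 3608752293884089161 into 3608752293884089344. Quoting the ID fields before the JSON reaches such a parser sidesteps this. The function name quote_ids is hypothetical, and the field names are the ones visible in the member list output earlier in this issue:

```shell
#!/usr/bin/env bash
# Wrap the 64-bit ID fields in quotes so that a JSON tool which stores
# numbers as doubles treats them as opaque strings instead.
# Assumption: compact JSON with no whitespace after the colon, as in the
# `member list -w json` output shown in this issue.
quote_ids() {
  sed -E 's/"(ID|member_id|cluster_id)":([0-9]+)/"\1":"\2"/g'
}

# Sample taken from the JSON output in this issue (shortened to one member).
sample='{"header":{"cluster_id":11891086386631399677,"member_id":3608752293884089161,"raft_term":19},"members":[{"ID":3608752293884089161,"name":"i-0891498e3531cb084"}]}'

# Only the three ID fields gain quotes; raft_term stays a number.
printf '%s\n' "$sample" | quote_ids
```

With the IDs quoted, a pipeline along the lines of etcdctl member list -w json | quote_ids | jq -r '.members[].ID' should return the decimal digits unchanged, and printf '%x\n' can then turn one of them into the hex form.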

I stumbled into this today while trying to automate removing a member from the cluster. Right now, you can’t take the IDs from etcdctl member list -w json and use them directly for etcdctl member remove because of the uint64 <> hex mismatch. Can we:
- w json (or at least give an option)
I would be happy to work on a PR but would like some clarification regarding the direction first.
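
Until the JSON output can be used directly, the lookup-and-convert step for this kind of automation can be sketched in plain shell without jq. This is a minimal sketch under stated assumptions: the helper name member_hex_id is hypothetical, it relies on the compact single-line JSON layout shown earlier in this issue (each ID immediately followed by its name), and it needs bash for unsigned 64-bit printf:

```shell
#!/usr/bin/env bash
# Read `etcdctl member list -w json` output on stdin and print the hex
# member ID of the member whose name is given as $1.
# Assumption: each member object starts with {"ID":<decimal>,"name":"<name>",
# as in the output shown in this issue. The name is spliced into a regex,
# so this only suits simple names such as instance IDs.
member_hex_id() {
  local name=$1 dec
  dec=$(sed -nE 's/.*"ID":([0-9]+),"name":"'"$name"'".*/\1/p')
  [ -n "$dec" ] || return 1   # no member with that name
  printf '%x\n' "$dec"        # decimal uint64 -> hex
}

# Hypothetical usage against a live cluster (not run here):
#   id=$(etcdctl member list -w json | member_hex_id "$AWS_INSTANCE_ID") &&
#     etcdctl member remove "$id"
```

Because the ID is only ever handled as a string of digits and then reformatted by printf, no floating-point rounding is involved.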