celestia-node: share/p2p/shrexeds: Error handling
A lot of new errors are appearing client-side against BSR.
- Application Error times out after 60s
2023-05-09T15:30:49.015+0200 DEBUG share/getters getters/shrex.go:171 eds: request failed {"hash": "FB78B3FEC22B77F98C4A23BB27EB557E202172AD74A018800CB9A09299A99EC8", "peer": "12D3KooWBc9nKfmKJZKqkto9UHvQce1ZZZy3MoNdZzDQpgi2SFkD", "attempt": 1, "err": "failed to read status from stream: Application error 0x0 (remote)", "finished (s)": 57.119983625}
- Failed to read status from stream: deadline exceeded after 60s
- Why does it happen at this point? If rate-limited, the err should be io.EOF. The deadline should be smaller here (see the sketch after the logs below).
2023-05-09T16:14:34.920+0200 DEBUG shrex/eds shrexeds/client.go:57 client: eds request to peer failed {"peer": "12D3KooWFC3v38kFFJYWVNjR92iX962YpSVpAbGsDwYcgVqvTWk8", "hash": "9EB61119363E061DF7525B5DDEC77D13D63B36C4D479DBF29C4A2E3926148165", "error": "failed to read status from stream: deadline exceeded"}
2023-05-09T16:14:34.920+0200 DEBUG share/getters getters/shrex.go:172 eds: request failed {"hash": "9EB61119363E061DF7525B5DDEC77D13D63B36C4D479DBF29C4A2E3926148165", "peer": "12D3KooWFC3v38kFFJYWVNjR92iX962YpSVpAbGsDwYcgVqvTWk8", "attempt": 1, "err": "context deadline exceeded", "finished (s)": 60.000769791}
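A minimal sketch of the shorter-deadline idea, assuming go-libp2p's network.Stream deadline API; readStatusTimeout, readStatus, and the single status byte are illustrative names, not the actual celestia-node protocol:

```go
package shrexeds

import (
	"errors"
	"fmt"
	"io"
	"time"

	"github.com/libp2p/go-libp2p/core/network"
)

// readStatusTimeout bounds only the status read; illustrative value.
const readStatusTimeout = 5 * time.Second

// readStatus reads a single status byte under its own short deadline, so a
// silent server surfaces as "deadline exceeded" quickly instead of consuming
// the whole 60s request budget.
func readStatus(stream network.Stream) (byte, error) {
	if err := stream.SetReadDeadline(time.Now().Add(readStatusTimeout)); err != nil {
		return 0, fmt.Errorf("setting read deadline: %w", err)
	}
	// Clear the deadline once the status is read so it does not affect the
	// rest of the response.
	defer stream.SetReadDeadline(time.Time{}) //nolint:errcheck

	var buf [1]byte
	if _, err := io.ReadFull(stream, buf[:]); err != nil {
		if errors.Is(err, io.EOF) {
			// A clean EOF here means the server closed the stream without a
			// status, e.g. because it rate-limited the request.
			return 0, fmt.Errorf("stream closed before status: %w", err)
		}
		return 0, fmt.Errorf("failed to read status from stream: %w", err)
	}
	return buf[0], nil
}
```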
- Somehow we get a peer that doesn't support the protocol (almost immediate error)
- How is this possible with our current nodes?
- We should probably just blacklist such peers (see the sketch after the logs below)
2023-05-09T15:24:15.781+0200 DEBUG shrex/eds shrexeds/client.go:57 client: eds request to peer failed {"peer": "12D3KooWCV61dtrSkykyEJ4Z8i6LTgvYhaWVuEjChs8Crk4phLXJ", "hash": "2C159F3964034F0DE630D9309735AD80E267E012C7BA271A6D93B17E610A444E", "error": "failed to open stream: protocols not supported: [/blockspacerace-0/shrex/eds/v0.0.1]"}
2023-05-09T15:24:15.781+0200 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWCV61dtrSkykyEJ4Z8i6LTgvYhaWVuEjChs8Crk4phLXJ", "hash": "2C159F3964034F0DE630D9309735AD80E267E012C7BA271A6D93B17E610A444E", "err": "failed to open stream: protocols not supported: [/blockspacerace-0/shrex/eds/v0.0.1]"}
2023-05-09T15:24:15.781+0200 DEBUG share/getters getters/shrex.go:171 eds: request failed {"hash": "2C159F3964034F0DE630D9309735AD80E267E012C7BA271A6D93B17E610A444E", "peer": "12D3KooWCV61dtrSkykyEJ4Z8i6LTgvYhaWVuEjChs8Crk4phLXJ", "attempt": 3, "err": "failed to open stream: protocols not supported: [/blockspacerace-0/shrex/eds/v0.0.1]", "finished (s)": 0.019803125}
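A hedged sketch of the blacklisting idea; the blacklister interface and openOrBlacklist are hypothetical, and the string match is used only because the concrete "protocols not supported" sentinel differs between go-multistream versions:

```go
package shrexeds

import (
	"context"
	"strings"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/protocol"
)

// blacklister is an illustrative stand-in for a peer manager that can drop
// peers permanently.
type blacklister interface {
	Blacklist(peer.ID)
}

// openOrBlacklist opens the shrex stream and, if the remote does not speak
// the protocol at all, removes the peer instead of retrying it later.
func openOrBlacklist(
	ctx context.Context,
	h host.Host,
	bl blacklister,
	p peer.ID,
	proto protocol.ID,
) (network.Stream, error) {
	stream, err := h.NewStream(ctx, p, proto)
	if err != nil {
		if strings.Contains(err.Error(), "protocols not supported") {
			// The peer is reachable but does not serve shrex/eds;
			// no point in keeping it around for future attempts.
			bl.Blacklist(p)
		}
		return nil, err
	}
	return stream, nil
}
```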
- Stateless reset with token after 22s
- Has something to do with QUIC
2023-05-09T15:02:30.163+0200 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWD4uxio9FnK2AUBoydXo5LqiiXbC7QZR658wUt6Hf5xfD", "hash": "D7A4B103159BCE6FC9AD0B3E8995F1932B2F9C76357BFD1E886298119E4D12E1", "err": "failed to read status from stream: received a stateless reset with token a7b84d1da38f64eec20c9e2b8b395729"}
2023-05-09T15:02:30.163+0200 DEBUG share/getters getters/shrex.go:170 eds: request failed {"hash": "D7A4B103159BCE6FC9AD0B3E8995F1932B2F9C76357BFD1E886298119E4D12E1", "peer": "12D3KooWD4uxio9FnK2AUBoydXo5LqiiXbC7QZR658wUt6Hf5xfD", "attempt": 1, "err": "failed to read status from stream: received a stateless reset with token a7b84d1da38f64eec20c9e2b8b395729", "finished (s)": 22.121177}
- Failed to open stream: context deadline exceeded (after 60s)
- We should not let the opening of the stream take more than maybe 5 seconds (see the sketch after the logs below).
2023-05-09T14:32:34.940+0200 DEBUG shrex/eds shrexeds/client.go:57 client: eds request to peer failed {"peer": "12D3KooWBbc9ixMAH4SzF3EH6SwSfo8gRW1bVrDUX19oxC4DaRbz", "hash": "C71DA660939D4B174129A6F5839DB73B9F353544365C5EEADF2C7F7324F4C38A", "error": "failed to open stream: context deadline exceeded"}
2023-05-09T14:32:34.940+0200 DEBUG share/getters getters/shrex.go:170 eds: request failed {"hash": "C71DA660939D4B174129A6F5839DB73B9F353544365C5EEADF2C7F7324F4C38A", "peer": "12D3KooWBbc9ixMAH4SzF3EH6SwSfo8gRW1bVrDUX19oxC4DaRbz", "attempt": 1, "err": "context deadline exceeded", "finished (s)": 60.000322042}
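A minimal sketch of capping stream opening at roughly 5 seconds, assuming a context.WithTimeout wrapper around host.NewStream; openStreamTimeout and openStream are illustrative names, not existing celestia-node parameters:

```go
package shrexeds

import (
	"context"
	"time"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/protocol"
)

const openStreamTimeout = 5 * time.Second

// openStream derives a shorter context just for opening the stream, so a
// dead peer costs at most ~5s instead of the full 60s request deadline.
func openStream(ctx context.Context, h host.Host, p peer.ID, proto protocol.ID) (network.Stream, error) {
	openCtx, cancel := context.WithTimeout(ctx, openStreamTimeout)
	defer cancel()

	return h.NewStream(openCtx, p, proto)
}
```

The overall request context still bounds the rest of the exchange; only the dial and protocol negotiation phase gets the shorter budget.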
7. RequestEDS checks the child context for a ctx error but returns the parent context's error, which can lead to returning nil, nil and cause a panic. Resolved
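A hedged illustration of the nil, nil pattern described above (not the actual RequestEDS code): if the child context is checked but the parent's Err() is returned, both return values can be nil when only the child deadline fired.

```go
package shrexeds

import (
	"context"
	"time"
)

// requestWithTimeout shows the bug pattern and its fix.
func requestWithTimeout(parent context.Context) (*string, error) {
	ctx, cancel := context.WithTimeout(parent, time.Second)
	defer cancel()

	result := doRequest(ctx) // placeholder for the actual request

	if ctx.Err() != nil {
		// BUG: parent.Err() may still be nil even though ctx has expired,
		// so the caller would get nil, nil and later dereference nil.
		// return nil, parent.Err()

		// Fix: return the error of the context that was actually checked.
		return nil, ctx.Err()
	}
	return result, nil
}

func doRequest(context.Context) *string { return nil } // stub
```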
- "failed to read eds from ods bytes: share: reading next car entry: stream reset"
2023-05-13T17:15:36.413+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "80730570DB5B2F39DCC2BE8091A3322E3B0B404ABAA2EC22F570E9615A60161C", "err": "failed to read eds from ods bytes: share: reading next car entry: stream reset"}
- Not an actual error, but a server node could always return ErrNotFound and never receive any punishment for it
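A hypothetical sketch of tracking ErrNotFound responses per peer so that a server that always answers "not found" can eventually be cooled down; notFoundTracker and its threshold are illustrative and not part of the existing peer manager.

```go
package shrexeds

import (
	"sync"

	"github.com/libp2p/go-libp2p/core/peer"
)

// notFoundThreshold is an illustrative cutoff before a peer is cooled down.
const notFoundThreshold = 10

type notFoundTracker struct {
	mu     sync.Mutex
	counts map[peer.ID]int
}

func newNotFoundTracker() *notFoundTracker {
	return &notFoundTracker{counts: make(map[peer.ID]int)}
}

// RecordNotFound bumps the counter and reports whether the peer crossed the
// threshold and should be deprioritized.
func (t *notFoundTracker) RecordNotFound(p peer.ID) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.counts[p]++
	return t.counts[p] >= notFoundThreshold
}

// RecordSuccess resets the counter once the peer serves data again.
func (t *notFoundTracker) RecordSuccess(p peer.ID) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.counts, p)
}
```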
- No recent network activity
- Happens for multiple peers at the same time, so it is probably related to client connectivity issues. It needs to be handled accordingly (no remote peer punishment).
2023-05-13T20:03:30.190+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWBASepmTQtATM196L1wWH6kVd4TBgpUN57koyfjT2vheX", "hash": "91DFD05708F4D96E2C1B13B040C07C962830D970922E8192423DF4F579B80EFC", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "91DFD05708F4D96E2C1B13B040C07C962830D970922E8192423DF4F579B80EFC", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "3F2C7DE3EB02FE219BF7CE931030E6E79C0E687000187B5FF456DEE2D9FB010D", "err": "failed to read eds from ods bytes: share: reading next car entry: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "A5723C5C033FB1F6A54BE906639C901FBC3C01278F5EDD8C26CB651BB6D96997", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "76044121A20F6AB9FAE153F52CE671C4363385478FC1A12BF8717113F1A96F6A", "err": "failed to read status from stream: timeout: no recent network activity"}
- Failed to open stream: failed to dial
- Happens right after 10, so it must be related to a connectivity issue on the client side.
- Needs to be handled too (a classification sketch covering this and the previous error follows the log below).
2023-05-13T21:03:59.627+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "83676B211F75ECC843F411D3E2A57B8BD368DD1B8786CB8A73AFDFB08FA1E45B", "err": "failed to open stream: failed to dial 12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng:\n * [/ip4/68.183.28.164/tcp/2121] dial tcp4 0.0.0.0:2121->68.183.28.164:2121: i/o timeout\n * [/ip4/68.183.28.164/udp/2121/quic-v1] timeout: no recent network activity"}
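A minimal classification sketch for these client-side connectivity failures ("no recent network activity" and "failed to dial"), so they can be retried with another peer instead of punishing the remote; isLocalConnectivityErr is an illustrative helper and the string matching is an assumption, not an existing API.

```go
package shrexeds

import "strings"

// isLocalConnectivityErr reports whether the error most likely originates on
// the client side (idle QUIC connection, failed dial) rather than from a
// misbehaving server. The matched substrings are taken from the logs above.
func isLocalConnectivityErr(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "timeout: no recent network activity") ||
		strings.Contains(msg, "failed to dial")
}
```

A caller could check this before reporting the peer and simply move on to the next peer from the pool when it returns true.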
- Stream reset on reading response status.
- The only reason for a stream reset at this stage is an invalid request. But since the running client works fine with the majority of peers, it seems to be a server-side issue.
2023-05-13T21:03:02.747+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWBTFMz9doATz6gxb9VUZ5d2iJ1udBfJCkho6ixgeu6nhE", "hash": "4EA6E0541DAEF1C7206EC5B5CA9C00F6546F6E32E12D6711F6B4C60E69BD3C8D", "err": "failed to read status from stream: stream reset"}
- Routing not found
- It is extremely unlikely that the peer disconnected at the exact point of the shrexeds request. The fact that three peers got disconnected in a short sequence requires investigation (see the sketch after the logs below).
2023-05-13T21:03:08.633+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "9F78CE8A2E753B78A4380C530F4AA8B6F8125CC38F38EE9BDDAD8932E083672C", "err": "failed to open stream: failed to find peers: routing: not found"}
2023-05-13T21:03:20.434+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "4657B89D6D7618A6928B1612A26DB9D03BA3D6182A4388D4A0165DDD759C0FB1", "err": "failed to open stream: failed to find peers: routing: not found"}
2023-05-13T21:03:42.354+0300 WARN shrex/eds shrexeds/client.go:72 client: eds request to peer failed {"peer": "12D3KooWBCfAdTYhtpuzbKqnc5WM3qh7Zkp3DmjrbAwgTsgiKQPW", "hash": "B366AFE879E7979DF8A132A3D3E5833F48FCF7F9434F6C4E4AE81CC0B1F7277C", "err": "failed to open stream: failed to find peers: routing: not found"}
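A hedged sketch of attributing "routing: not found" to a dropped connection: check connectedness before opening the stream, assuming go-libp2p's Connectedness API; logDisconnected is a placeholder for the module's logging or metrics.

```go
package shrexeds

import (
	"context"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/protocol"
)

// openCheckingConnectedness records when the peer from the pool is already
// disconnected. In that case NewStream has to re-discover its addresses via
// routing, which is where "failed to find peers: routing: not found" comes from.
func openCheckingConnectedness(
	ctx context.Context,
	h host.Host,
	p peer.ID,
	proto protocol.ID,
) (network.Stream, error) {
	if h.Network().Connectedness(p) != network.Connected {
		logDisconnected(p)
	}
	return h.NewStream(ctx, p, proto)
}

func logDisconnected(peer.ID) {} // placeholder for real logging/metrics
```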
About this issue
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 15 (14 by maintainers)
To summarize: most of the errors above are unrelated to shrex, and shrex should not be responsible for handling them. They are various manifestations of issues on the remote side, which shrex can do nothing about. It should simply ignore such a peer, let it go, and move on.
However, I really appreciate the effort here to document each.
Happens for multiple peers at the same time. Probably related to closing connections over the limit at the libp2p level.