go-libp2p: bug(transport): `sendmsg: invalid argument` when using QUIC on latest version

Since upgrading to v0.31.0, we have encountered a new error: `sendmsg: invalid argument`.

I am not yet sure how the entire network (15 nodes) got into this state, but all nodes logged similar errors, and the problem persisted until the network degraded to the point where the protocol was no longer being used. After the nodes were restarted, the network recovered and has been stable since. My theory is that it started when a single node, acting as the original source of the information shared throughout the network, became overloaded. I will gladly try to recreate the scenario with whatever debugging flags are needed to gather more information.

I saw this issue in quic-go, which appears to be resolved but may be related: https://github.com/quic-go/quic-go/issues/3911. In our network, both clients and servers run on x86_64.

Clients started logging:

failed to read status from stream: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.13.187:2121: sendmsg: invalid argument
failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument
failed to read status from stream: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.13.187:2121: sendmsg: invalid argument
failed to write request to stream: received a stateless reset with token 03cb6bd662b7f8be9172d39336b039f3
failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument
failed to open stream: connection failed
failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument
failed to read eds from ods bytes: share: reading car file: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument
failed to read status from stream: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.12.233:2121: sendmsg: invalid argument
failed to read eds from ods bytes: share: reading next car entry: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument
failed to read status from stream: INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.0.86:2121: sendmsg: invalid argument
failed to read status from stream: received a stateless reset with token de28b1cd1aae453820fab0efe4de9963

Servers of this protocol logged the same error, along with stateless resets on concurrent streams:

2023-10-02T10:41:14.672Z	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "hash": "360F454ACFD00261988829E4504F751AE7C8C8C282E49AF34219AF740176A488", "err": "writing ODS bytes: 1016877 bytes written, INTERNAL_ERROR (local): write udp4 0.0.0.0:2121->100.64.13.187:2121: sendmsg: invalid argument"}
2023-10-02T10:41:15.667Z	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "hash": "360F454ACFD00261988829E4504F751AE7C8C8C282E49AF34219AF740176A488", "err": "writing ODS bytes: 1206038 bytes written, received a stateless reset with token 2f9fb982cbcf677f84a5a04284723fbd"}
2023-10-02T10:41:16.667Z	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "hash": "360F454ACFD00261988829E4504F751AE7C8C8C282E49AF34219AF740176A488", "err": "writing ODS bytes: 1190481 bytes written, received a stateless reset with token 191b2b55a484ccffaebfb6bf29879cd5"}
2023-10-02T10:41:18.667Z	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "hash": "360F454ACFD00261988829E4504F751AE7C8C8C282E49AF34219AF740176A488", "err": "writing ODS bytes: 972566 bytes written, received a stateless reset with token 66b89f515f584e1762d8836dd2ba3d5e"}
2023-10-02T10:41:20.678Z	WARN	shrex/eds	shrexeds/server.go:140	server: writing ods to stream	{"peer": "12D3KooWDr92PrFG1kKRJJn2VPK7uZB9TAgBv76oB3gti1sw3yEp", "hash": "360F454ACFD00261988829E4504F751AE7C8C8C282E49AF34219AF740176A488", "err": "writing ODS bytes: 39633 bytes written, received a stateless reset with token c72aa0e326e0dd06d99628199d5d41a3"}

go.mod: https://github.com/celestiaorg/celestia-node/blob/main/go.mod

About this issue

  • State: closed
  • Created 9 months ago
  • Reactions: 1
  • Comments: 22 (12 by maintainers)

Most upvoted comments

Yes, I will try to recreate the scenario today. Once I can reliably reproduce these logs again, I will experiment with the flags and post the results here.