rust-libp2p: Failed to write message to substream when using QUIC with Rendezvous
Summary
I’m trying to write a P2P network that uses rendezvous for peer discovery. However, whenever I try to register a peer with my rendezvous server, I get the following warning:
2022-12-24T04:15:27.496989Z WARN Connection with peer 12D3KooWGVBwNbD2pmxUTCbw1h5GX1XGBkLwkmX94EsdNe8FzFwr failed: Failed to write message to substream
at /Users/firaenix/.cargo/registry/src/github.com-1ecc6299db9ec823/libp2p-rendezvous-0.11.0/src/client.rs:203 on ThreadId(1)
I get no error or warning messages on the rendezvous server, only on the client. I do receive a response to a discover request, but of course the array of registrations is always empty.
Expected behaviour
I would expect to be able to register my peer and discover that peer on another instance of my application.
Version
Libp2p 0.5
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (8 by maintainers)
Sorry for the late reply. The issue here is that closing the substream between server and client is racy. The following sequence happens:

When the server finishes writing and closing the stream, it drops that stream. If this happens before it receives the client’s `FIN`, it sends a `STOP_SENDING` to signal to the client that it won’t read any more data. This then causes the above error in the client’s `close` call.

We can’t simply do the above patch, since there may be situations where we still have unacknowledged write data. In that case we won’t know whether the remote read all of our data, so we need to return `FinishError::Stopped` to the user.

Now, for the rendezvous example we can simply fix this by having the client call `close` directly after the `write` call, before it starts reading. That would guarantee that the server receives the `FIN`. That being said, the calling order `write` -> `read` -> `close` should generally be possible without causing racy bugs. I am currently looking into how to handle this case better.

I hope that helps. If you need a fix for rendezvous right now, you could open a PR with the patch I described above, but I hope I’ll get a proper fix done this week. I am also looking into whether this is related to the bugs described in #3298 and #3308.

Last time I checked, the rendezvous examples were functional. It is best to compare your code to those and maybe try incorporating QUIC there first. I’d be surprised if there were an interaction between QUIC and rendezvous, though.
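The close-ordering race can be illustrated with a toy, deterministic model in plain Rust. This is a hypothetical sketch, not the quinn or libp2p-quic API. `Op`, `run_exchange`, and the local `FinishError` enum (named after the error mentioned above, but only a stand-in) are invented for illustration. The model assumes worst-case scheduling, in which the server drops its end of the stream before the client's `read` returns. It shows why `write` -> `read` -> `close` can fail while `write` -> `close` -> `read` cannot.

```rust
/// Client-side calls on one bidirectional stream (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Write,
    Read,
    Close,
}

/// Stand-in for the "stopped" error the real close/finish call can return.
#[derive(Debug, PartialEq, Eq)]
pub enum FinishError {
    Stopped,
}

/// Runs the client's calls in the given order against a server that
/// answers one request, closes, and immediately drops the stream.
pub fn run_exchange(client_ops: &[Op]) -> Result<(), FinishError> {
    // Has the server received the client's FIN?
    let mut server_saw_fin = false;
    // Has the server sent STOP_SENDING to the client?
    let mut stop_sending = false;
    let mut wrote = false;

    for &op in client_ops {
        match op {
            Op::Write => wrote = true,
            Op::Close => {
                // If STOP_SENDING already arrived, the client cannot tell
                // whether its data was read, so close reports Stopped.
                if stop_sending {
                    return Err(FinishError::Stopped);
                }
                // Model assumption: a FIN sent before the response is read
                // reaches the server before it drops the stream.
                server_saw_fin = true;
            }
            Op::Read => {
                assert!(wrote, "client must send a request before reading");
                // The response is only available after the server has written,
                // closed and dropped its end (worst case). Dropping without
                // having seen the client's FIN makes the server send STOP_SENDING.
                if !server_saw_fin {
                    stop_sending = true;
                }
            }
        }
    }
    Ok(())
}
```

In this model, `run_exchange(&[Op::Write, Op::Read, Op::Close])` returns `Err(FinishError::Stopped)`, while `run_exchange(&[Op::Write, Op::Close, Op::Read])` returns `Ok(())`. This mirrors the suggested workaround of closing the write side before reading the response.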