rust-libp2p: Failed to write message to substream when using QUIC with Rendezvous

Summary

I’m trying to write a P2P network that uses rendezvous for peer discovery. However, whenever I try to register a peer with my rendezvous server, I get the following warning:

2022-12-24T04:15:27.496989Z  WARN  Connection with peer 12D3KooWGVBwNbD2pmxUTCbw1h5GX1XGBkLwkmX94EsdNe8FzFwr failed: Failed to write message to substream
    at /Users/firaenix/.cargo/registry/src/github.com-1ecc6299db9ec823/libp2p-rendezvous-0.11.0/src/client.rs:203 on ThreadId(1)

I get no error or warning messages on the rendezvous server, only on the client. I do receive a response to a discover request, but of course the array of registrations is always empty.

Expected behaviour

I would expect to be able to register my peer and discover that peer on another instance of my application.

Version

Libp2p 0.5

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

Sorry for the late reply. The issue here is that closing the substream between server and client is racy. The following sequence happens:

sequenceDiagram
    participant Client
    participant Server
    Note left of Client: write
    Client->>Server: <data>
    Note right of Server: read
    Note right of Server: write
    Server->>Client: <data>
    Note right of Server: close
    Server->>Client: FIN
    Note left of Client: read
    Client->>Server: ack
    Note left of Client: close
    Note right of Server: drop
    Client->>Server: FIN
When the server finishes writing and closes the stream, it drops that stream. If this happens before it receives the client’s FIN, it sends a STOP_SENDING to signal to the client that it won’t read any more data. This then causes the above error in the client’s close call.

We can’t simply apply the above patch, since there may be situations where we still have unacknowledged write data. In that case we don’t know whether the remote read all of our data, so we need to return FinishError::Stopped to the user.

For the rendezvous example we can fix this simply by having the client call close directly after the write call, before it starts reading. That guarantees that the server receives the FIN. That said, the calling order write -> read -> close should generally be possible without causing racy bugs, and I am currently looking into how to handle this case better.

I hope that helps. If you need a fix for rendezvous right now, you could open a PR with the patch I described above, but I hope to get a proper fix done this week. I am also looking into whether this is related to the bugs described in #3298 and #3308.
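
To make the ordering argument concrete, here is a small, deterministic toy model of the race. It is not libp2p or quinn code: `Step`, `Server`, `run_client` and this `CloseError` are hypothetical names made up for the illustration, and the model assumes the worst-case timing in which the server drops the stream as soon as it has replied. Under that assumption, write -> read -> close ends with the client's close seeing a stop, while write -> close -> read does not.

```rust
/// Steps the client can take on the substream (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Step {
    Write,
    Read,
    Close,
}

/// Error surfaced by the client's close call in this toy model.
#[derive(Debug, PartialEq)]
enum CloseError {
    Stopped,
}

/// Toy model of the server's view of a single substream.
#[derive(Default)]
struct Server {
    got_fin: bool,
    sent_stop_sending: bool,
}

impl Server {
    /// Assumed worst case: the server reads the request, writes the
    /// response, closes and drops the stream in one go.
    fn respond_and_drop(&mut self) {
        // Dropping before the client's FIN has arrived triggers STOP_SENDING.
        if !self.got_fin {
            self.sent_stop_sending = true;
        }
    }
}

/// Runs the client's steps in order against the worst-case server.
fn run_client(steps: &[Step]) -> Result<(), CloseError> {
    let mut server = Server::default();
    for step in steps {
        match step {
            // The request reaches the server; nothing else happens yet.
            Step::Write => {}
            // The client waits for the response, so the server replies and drops.
            Step::Read => server.respond_and_drop(),
            // Closing sends the FIN, unless the server already stopped us.
            Step::Close => {
                if server.sent_stop_sending {
                    return Err(CloseError::Stopped);
                }
                server.got_fin = true;
            }
        }
    }
    Ok(())
}

fn main() {
    let racy = run_client(&[Step::Write, Step::Read, Step::Close]);
    let reordered = run_client(&[Step::Write, Step::Close, Step::Read]);
    println!("write -> read -> close: {:?}", racy);
    println!("write -> close -> read: {:?}", reordered);
}
```

In a real connection the first ordering only fails some of the time, depending on when the server's drop happens relative to the client's FIN. The model pins the timing to the bad case so the difference between the two orderings is visible.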

Last time I checked, the rendezvous examples were functional. Best to compare your code to those and maybe try incorporating QUIC there first. I’d be surprised if there is an interaction between QUIC and rendezvous though.