fedimint: Weird communication errors while setting up federation
If I setup a federation using two or more machines in the same datacenter, everything works fine. But if one of the machines is a home computer on another continent, then weird stuff starts to happen. In this example, DKG actually failed.
Note 1: all ports are open and properly forwarded. Communication seems to be okay in both directions. Home internet is 1Gbit/s fiber, so no bandwidth issues besides international link constraints
Note 2: logs abbreviate for repeated or sequential messages by replacing them with (...)
home computer logs:
fedimintd-mutinynet-fedimintd-1 | 2023-07-20T19:02:10.714149Z INFO connection{remote_addr=<redacted>:47346 conn_id=17}: net::peer::dkg: Peer 0 running distributed key generation...
fedimintd-mutinynet-fedimintd-1 | 2023-07-20T19:02:12.543892Z INFO run_io_thread: net::peer: Replacing existing connection
(...)
fedimintd-mutinynet-fedimintd-1 | 2023-07-20T19:02:21.222195Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd-mutinynet-fedimintd-1 | 2023-07-20T19:02:21.239828Z INFO run_io_thread: net::peer: Received old message expected=MessageId(90) received=MessageId(1)
(...)
fedimintd-mutinynet-fedimintd-1 | 2023-07-20T19:02:21.245097Z INFO run_io_thread: net::peer: Received old message expected=MessageId(90) received=MessageId(89)
fedimintd-mutinynet-fedimintd-1 | 2023-07-20T19:02:22.010248Z INFO run_io_thread: net::peer: Replacing existing connection
(...)
fedimintd-mutinynet-fedimintd-1 | 2023-07-20T19:02:26.160320Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd-mutinynet-fedimintd-1 | 2023-07-20T19:02:27.275235Z INFO run_io_thread: net::peer: Received old message expected=MessageId(530) received=MessageId(136)
fedimintd-mutinynet-fedimintd-1 | 2023-07-20T19:02:27.275263Z INFO run_io_thread: net::peer: Replacing existing connection
(...)
fedimintd-mutinynet-fedimintd-1 | 2023-07-20T19:02:58.968762Z INFO run_io_thread: net::peer: Replacing existing connection
Remote datacenter computer:
fedimintd_1 | 2023-07-20T19:02:11.899547Z INFO connection{remote_addr=<redacted>:34162 conn_id=2}: net::peer::dkg: Peer 1 running distributed key generation...
fedimintd_1 | 2023-07-20T19:02:12.643151Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:13.667591Z INFO connection{remote_addr=<redacted>:53844 conn_id=14}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:13.676290Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:14.065690Z ERROR connection{remote_addr=<redacted>:53844 conn_id=14}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 14
fedimintd_1 | 2023-07-20T19:02:14.293186Z INFO run_io_thread: net::peer: Replacing existing connection
(...)
fedimintd_1 | 2023-07-20T19:02:16.813292Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:17.242244Z INFO connection{remote_addr=<redacted>:53854 conn_id=15}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:17.641922Z ERROR connection{remote_addr=<redacted>:53854 conn_id=15}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 15
fedimintd_1 | 2023-07-20T19:02:18.241657Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:18.251738Z INFO run_io_thread: net::peer: Received old message expected=MessageId(113) received=MessageId(1)
(...)
fedimintd_1 | 2023-07-20T19:02:18.267630Z INFO run_io_thread: net::peer: Received old message expected=MessageId(113) received=MessageId(92)
fedimintd_1 | 2023-07-20T19:02:18.869571Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:19.693103Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:20.072910Z INFO connection{remote_addr=<redacted>:53858 conn_id=16}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:20.483714Z ERROR connection{remote_addr=<redacted>:53858 conn_id=16}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 16
fedimintd_1 | 2023-07-20T19:02:20.501468Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:20.515497Z INFO run_io_thread: net::peer: Received old message expected=MessageId(186) received=MessageId(113)
(...)
fedimintd_1 | 2023-07-20T19:02:20.557392Z INFO run_io_thread: net::peer: Received old message expected=MessageId(186) received=MessageId(158)
fedimintd_1 | 2023-07-20T19:02:21.289447Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:21.940945Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:23.274142Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:23.467870Z INFO connection{remote_addr=<redacted>:47002 conn_id=17}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:23.866845Z ERROR connection{remote_addr=<redacted>:47002 conn_id=17}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 17
fedimintd_1 | 2023-07-20T19:02:24.490124Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:24.692470Z INFO run_io_thread: net::peer: Received old message expected=MessageId(238) received=MessageId(186)
(...)
fedimintd_1 | 2023-07-20T19:02:24.730179Z INFO run_io_thread: net::peer: Received old message expected=MessageId(238) received=MessageId(237)
fedimintd_1 | 2023-07-20T19:02:26.302726Z INFO connection{remote_addr=<redacted>:47018 conn_id=18}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:26.312642Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:26.723068Z ERROR connection{remote_addr=<redacted>:47018 conn_id=18}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 18
fedimintd_1 | 2023-07-20T19:02:27.164129Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:28.351101Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:28.362582Z INFO run_io_thread: net::peer: Received old message expected=MessageId(561) received=MessageId(368)
(...)
fedimintd_1 | 2023-07-20T19:02:28.474514Z INFO run_io_thread: net::peer: Received old message expected=MessageId(561) received=MessageId(560)
fedimintd_1 | 2023-07-20T19:02:29.062812Z INFO connection{remote_addr=<redacted>:34162 conn_id=2}: net::peer::dkg: Sending confirmations to other peers.
fedimintd_1 | 2023-07-20T19:02:29.063067Z INFO connection{remote_addr=<redacted>:34162 conn_id=2}: net::peer::dkg: Waiting for confirmations from other peers.
fedimintd_1 | 2023-07-20T19:02:29.723137Z INFO connection{remote_addr=<redacted>:47034 conn_id=19}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:30.121731Z ERROR connection{remote_addr=<redacted>:47034 conn_id=19}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 19
fedimintd_1 | 2023-07-20T19:02:30.151219Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:31.130850Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:31.998347Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:32.553553Z INFO connection{remote_addr=<redacted>:46852 conn_id=20}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:32.949790Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:33.059406Z ERROR connection{remote_addr=<redacted>:46852 conn_id=20}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 20
fedimintd_1 | 2023-07-20T19:02:34.184564Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:35.255629Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:35.461529Z INFO connection{remote_addr=<redacted>:46868 conn_id=21}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:36.038083Z ERROR connection{remote_addr=<redacted>:46868 conn_id=21}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 21
fedimintd_1 | 2023-07-20T19:02:36.479843Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:37.448694Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:38.830992Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:39.045601Z INFO connection{remote_addr=<redacted>:46878 conn_id=22}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:39.587477Z ERROR connection{remote_addr=<redacted>:46878 conn_id=22}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 22
fedimintd_1 | 2023-07-20T19:02:39.808029Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:40.746505Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:41.981746Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:42.034322Z INFO connection{remote_addr=<redacted>:38974 conn_id=23}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:42.525462Z ERROR connection{remote_addr=<redacted>:38974 conn_id=23}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 23
fedimintd_1 | 2023-07-20T19:02:42.945827Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:43.838556Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:45.226698Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:45.443084Z INFO connection{remote_addr=<redacted>:38976 conn_id=24}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:45.979793Z ERROR connection{remote_addr=<redacted>:38976 conn_id=24}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 24
fedimintd_1 | 2023-07-20T19:02:46.198558Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:47.126712Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:48.402344Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:48.420062Z INFO connection{remote_addr=<redacted>:38982 conn_id=25}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:48.993393Z ERROR connection{remote_addr=<redacted>:38982 conn_id=25}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 25
fedimintd_1 | 2023-07-20T19:02:49.413231Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:50.268860Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:51.638846Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:51.881008Z INFO connection{remote_addr=<redacted>:60856 conn_id=26}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:52.405201Z ERROR connection{remote_addr=<redacted>:60856 conn_id=26}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 26
fedimintd_1 | 2023-07-20T19:02:52.621806Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:53.557667Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:54.793772Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:54.843049Z INFO connection{remote_addr=<redacted>:60858 conn_id=27}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:55.191758Z WARN net::peer: Peer 0 has 10001 pending messages. Dropping new message.
(...)
fedimintd_1 | 2023-07-20T19:02:55.391306Z WARN net::peer: Peer 0 has 10001 pending messages. Dropping new message.
fedimintd_1 | 2023-07-20T19:02:55.391598Z ERROR connection{remote_addr=<redacted>:60858 conn_id=27}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 27
fedimintd_1 | 2023-07-20T19:02:55.811309Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:55.864001Z WARN net::peer: Peer 0 has 10001 pending messages. Dropping new message.
(...)
fedimintd_1 | 2023-07-20T19:02:56.276473Z WARN net::peer: Peer 0 has 10001 pending messages. Dropping new message.
fedimintd_1 | 2023-07-20T19:02:56.695335Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:56.742026Z WARN net::peer: Peer 0 has 10001 pending messages. Dropping new message.
(...)
fedimintd_1 | 2023-07-20T19:02:58.448011Z WARN net::peer: Peer 0 has 10001 pending messages. Dropping new message.
fedimintd_1 | 2023-07-20T19:02:58.448121Z INFO connection{remote_addr=<redacted>:60862 conn_id=28}: jsonrpsee_server::server: Accepting new connection 6/10
fedimintd_1 | 2023-07-20T19:02:58.846899Z ERROR connection{remote_addr=<redacted>:60862 conn_id=28}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 28
fedimintd_1 | 2023-07-20T19:02:58.869064Z INFO run_io_thread: net::peer: Replacing existing connection
fedimintd_1 | 2023-07-20T19:02:58.922201Z WARN net::peer: Peer 0 has 10001 pending messages. Dropping new message.
(...)
fedimintd_1 | 2023-07-20T19:02:59.075063Z WARN net::peer: Peer 0 has 10001 pending messages. Dropping new message.
fedimintd_1 | 2023-07-20T19:02:59.075453Z ERROR connection{remote_addr=<redacted>:34162 conn_id=2}: net::peer::dkg: Timeout waiting for dkg completion confirmation from other peers
fedimintd_1 | 2023-07-20T19:02:59.075876Z INFO connection{remote_addr=<redacted>:34162 conn_id=2}: net::peer: Distributed key generation has completed successfully!
fedimintd_1 | 2023-07-20T19:02:59.076230Z INFO connection{remote_addr=<redacted>:34162 conn_id=2}: timing: Operation timing name=distributed-gen duration_ms=47177
fedimintd_1 | 2023-07-20T19:02:59.079018Z INFO connection{remote_addr=<redacted>:34162 conn_id=2}: task: Waiting for task to finish task=io-thread-peer-0
fedimintd_1 | 2023-07-20T19:02:59.079722Z INFO run_io_thread: net::peer: Shutting down peer connection state machine peer=PeerId(0)
fedimintd_1 | 2023-07-20T19:02:59.081718Z INFO connection{remote_addr=<redacted>:34162 conn_id=2}: task: Task finished task=io-thread-peer-0
fedimintd_1 | 2023-07-20T19:02:59.082166Z INFO connection{remote_addr=<redacted>:34162 conn_id=2}: task: Waiting for task to finish task=listen task
fedimintd_1 | 2023-07-20T19:02:59.082599Z INFO connection{remote_addr=<redacted>:34162 conn_id=2}: task: Task finished task=listen task
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 15 (15 by maintainers)
Commits related to this issue
- fix(peer): reconnection ping-pongs The root of all evil is state. Especially mutable and shared. Seems to me like peer communication is using a connection where both sides think they are in control o... — committed to dpc/fedimint by dpc a year ago
- fix(peer): reconnection ping-pongs The root of all evil is state. Especially mutable and shared. Seems to me like peer communication is using a connection where both sides think they are in control o... — committed to dpc/fedimint by dpc a year ago
Works for me!
Anyone with the affected setup would like to try https://github.com/fedimint/fedimint/pull/2841 ?
This got to be some logical error somewhere, not a performance issue. And if it’s performance it would rather be latency (some timeout somewhere).