hermes: Packets stuck due to overflowing Timeout
Crate
hermes relayer
Summary of Bug
Pending packets are stucked, can’t be relayed
[bitcanna-1:transfer/channel-0 -> osmosis-1] packets that still have commitments on bitcanna-1: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 (first 10 shown here; total=53)
Oct 08 13:56:25.293 DEBUG ThreadId(01) [bitcanna-1:transfer/channel-0 -> osmosis-1] recv packets to send out to osmosis-1 of the ones with commitments on source bitcanna-1: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 (first 10 shown here; total=52)
The application panicked (crashed).
Message: called `Result::unwrap()` on an `Err` value:
0: error converting from u64 to i64
1: out of range integral type conversion attempted
Location:
/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/flex-error-0.4.3/src/tracer_impl/eyre.rs:31
Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Location: modules/src/ics04_channel/events.rs:139
Another logs:
ct 08 14:03:04 BCNA-SentryNode2 hermes[533309]: Oct 08 14:03:04.625 INFO ThreadId(46) [osmosis-1:transfer/channel-41 -> bitcanna-1] clearing pending packets height=Some(Height { revision: 1, height: 1500899 })
Oct 08 14:03:04 BCNA-SentryNode2 hermes[533309]: Oct 08 14:03:04.638 ERROR ThreadId(46) [osmosis-1:transfer/channel-41 -> bitcanna-1] worker: handling command encountered error:
Oct 08 14:03:04 BCNA-SentryNode2 hermes[533309]: 0: clearing of old packets failed
This error is in bucle in the relayer log:
Oct 08 13:09:22 BCNA-SentryNode2 hermes[533309]: Oct 08 13:09:22.894 ERROR ThreadId(46) [osmosis-1:transfer/channel-41 -> bitcanna-1] worker: handling command encountered error:
Oct 08 13:09:22 BCNA-SentryNode2 hermes[533309]: 0: clearing of old packets failed
Oct 08 13:09:22 BCNA-SentryNode2 hermes[533309]: Location:
Oct 08 13:09:22 BCNA-SentryNode2 hermes[533309]: /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/flex-error-0.4.3/src/tracer_impl/eyre.rs:10
Oct 08 13:09:22 BCNA-SentryNode2 hermes[533309]: Backtrace omitted.
hermes 0.7.3+5d53e52 (binary download). chain SDK v.0.44.1 / tendermint v.0.34.13
Steps to Reproduce
The stacked packets are there… stacked in the Osmosis chain. You can see at logs. To get the panic error:
hermes tx raw packet-recv osmosis-1 bitcanna-1 transfer channel-0
Oct 08 21:56:52.931 INFO ThreadId(01) using default configuration from '/home/raul/.hermes/config.toml'
Oct 08 21:56:52.934 TRACE ThreadId(01) [bitcanna-1] subscribing to query: tm.event = 'Tx'
Oct 08 21:56:52.935 TRACE ThreadId(01) [bitcanna-1] subscribing to query: tm.event = 'NewBlock'
Oct 08 21:56:52.935 TRACE ThreadId(01) [bitcanna-1] subscribed to all queries
Oct 08 21:56:52.936 DEBUG ThreadId(07) [bitcanna-1] starting event monitor
Oct 08 21:56:52.938 TRACE ThreadId(01) [osmosis-1] subscribing to query: tm.event = 'Tx'
Oct 08 21:56:52.939 TRACE ThreadId(01) [osmosis-1] subscribing to query: tm.event = 'NewBlock'
Oct 08 21:56:52.940 TRACE ThreadId(01) [osmosis-1] subscribed to all queries
Oct 08 21:56:52.940 DEBUG ThreadId(14) [osmosis-1] starting event monitor
Oct 08 21:56:53.019 DEBUG ThreadId(01) [bitcanna-1:transfer/channel-0 -> osmosis-1] packets that still have commitments on bitcanna-1: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 (first 10 shown here; total=54)
Oct 08 21:56:53.019 DEBUG ThreadId(01) [bitcanna-1:transfer/channel-0 -> osmosis-1] recv packets to send out to osmosis-1 of the ones with commitments on source bitcanna-1: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 (first 10 shown here; total=54)
The application panicked (crashed).
Message: called `Result::unwrap()` on an `Err` value:
0: error converting from u64 to i64
1: out of range integral type conversion attempted
Location:
/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/flex-error-0.4.3/src/tracer_impl/eyre.rs:31
Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Location: modules/src/ics04_channel/events.rs:139
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Error:
0: link error
1: failed with underlying error
2: failed to receive through channel
3: receiving on an empty and disconnected channel
Location:
/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/flex-error-0.4.3/src/tracer_impl/eyre.rs:31
Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Info of the client/connection/channel/path:
CHAIN CLIENT CONNECTION CHANNEL
bitcanna-1 07-tendermint-0 connection-0 channel-0
osmosis-1 07-tendermint-1456 connection-1141 channel-41
The healthy check:
hermes health-check
Oct 08 22:02:56.931 INFO ThreadId(01) using default configuration from '/home/raul/.hermes/config.toml'
Oct 08 22:02:56.933 INFO ThreadId(01) [bitcanna-1] performing health check...
Oct 08 22:02:56.958 WARN ThreadId(01) Health checkup for chain 'bitcanna-1' failed
Oct 08 22:02:56.958 WARN ThreadId(01) Reason: Hermes health check failed while verifying the application compatibility for chain bitcanna-1:http://localhost:9093/; caused by: SDK module at version '0.44.1' does not meet compatibility requirements >=0.41.3, <=0.44.0 for application bcnad:1.1-0740c191bf290b9a6a69976c4a90520e6d4fb7ce
Oct 08 22:02:56.958 WARN ThreadId(01) Some Hermes features may not work in this mode!
Oct 08 22:02:56.958 ERROR ThreadId(01) [bitcanna-1] chain is unhealthy
Oct 08 22:02:56.960 INFO ThreadId(01) [osmosis-1] performing health check...
Oct 08 22:02:56.982 INFO ThreadId(01) [osmosis-1] chain is healthy
Connection:
bcnad query ibc connection connections
connections:
- client_id: 07-tendermint-0
counterparty:
client_id: 07-tendermint-1456
connection_id: connection-1141
prefix:
key_prefix: aWJj
delay_period: "0"
id: connection-0
state: STATE_OPEN
versions:
- features:
- ORDER_ORDERED
- ORDER_UNORDERED
identifier: "1"
height:
revision_height: "128344"
revision_number: "1"
pagination:
next_key: null
total: "0"
unreceived-acks
bcnad query ibc channel unreceived-acks transfer channel-0 --sequences=11,12,13,30,54,65
height:
revision_height: "128392"
revision_number: "1"
sequences:
- "11"
- "12"
- "13"
- "30"
- "54"
- "65"
Acceptance Criteria
The relaying of the packets
For Admin Use
- Not duplicate issue
- Appropriate labels applied
- Appropriate milestone (priority) applied
- Appropriate contributors tagged
- Contributor assigned/self-assigned
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 21 (9 by maintainers)
Last test from Testnets with a new clean channel seems good: Also mixing relayers tech.
I think you can put this on production @romac & @adizere
Great to hear, thanks for the patience and perseverence with testing this Raul!
The channel in question in this error message is a Rust channel for cross-thread communication not an IBC channel. The Rust channel gets closed/disconnected after the overflow happens, because the latter crashes the underlying thread associated with that chain, so the relayer cannot send requests to that thread anymore, hence the error.
Hi Raul,
We have a candidate solution in PR #1458 – namely this dev branch. If you could run Hermes from that branch and provide us feedback that would be outstanding! Let us know if you encounter difficulties running it.
Some updates
bitcanna-1
network, thank you!bitcanna-1
from hardcodedchannel-0
I think we’ll just go ahead and reproduce using Anca’s solution, though it would be ideal if we still find the offending blockchain packet and confirm that the overflow is from there (we’re 99% sure it is).
I can reproduce this locally:
export RUST_BACKTRACE=full ; hermes tx raw ft-transfer ibc-1 ibc-0 transfer channel-0 9999 -t 18
(note the18
sec timeout will be overwritten tou64::MAX
)hermes start
and see the same when trying to extract the packet events.Thanks for ping me, the 2 StateSync Servers are restarted. I can’t config them as SEED servers so, when their tables are full, I need to restart them manually, is a limitation of StateSync… Ideally this servers should be Seed servers too.
Anyway you can bootstrap now:
Thank you Raul, Jacob for the thorough details on this issue!
It seems that one of the packets has a timeout set to a very large value. I did some preliminary debugging and found the largest value that Hermes can accept without croaking is
2262-04-11T23:47:16.854775807Z!
. Is it possible that the user specified a timeout larger than this? In any case, it is a problem that the chain accepts this value as valid, while Hermes does not accept it.We’ll need access to a bitcanna-1 full node to be able to recreate the problem and fix it. @RaulBernal would you mind sharing with us the Hermes config.toml, so we can access the same full node as in your setup? You can reach me at
adi@informal.systems
or on DiscordAdi Seredinschi (Informal)#9927
.