ockam: Diagnose `Cannot drop a runtime ...` error in rust nodes
This error shows up often:
thread 'tokio-runtime-worker' panicked at 'Cannot drop a runtime in a context where blocking is not allowed. This happens when a runtime is dropped from within an asynchronous context.', /Users/mrinal/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/blocking/shutdown.rs:51:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
We love helping new contributors! If you have questions or need help as you work on your first Ockam contribution, please leave a comment on this discussion. _If you’re looking for other issues to contribute to, check out issues labeled https://github.com/build-trust/ockam/labels/good first issue or https://github.com/build-trust/ockam/labels/help wanted_
About this issue
- State: closed
- Created 2 years ago
- Comments: 43 (43 by maintainers)
@antoinevg and @mrinalwadhwa thank you so much! especially many thanks to @SanjoDeundiak who helped me in the internal details of how ockam works, and @twittner who actually fixed the issue!
Awesome work @pro465! 🏅
Thank you @pro465 👏
@pro465 `Context` is a pretty basic building block in our Worker system. So, `Worker`s and `Processor`s are some entities that run in the background. Each has a corresponding `WorkerRelay` or `ProcessorRelay` to actually run it. And `Context` is an entity responsible for interacting with a runtime/sending messages. Each `Context` instance has its own reference to the runtime and an address. Each `Worker` and `Processor` has their own instance of `Context` that they own, also you can create `detached` `Context` instances, that are not attached to any `Worker` or `Processor`. That being said, because `Context` is such a basic block, it’s probably created many times during even the simplest scenario (like `ockam node list`), and many of those instances are dropped during node shutdown, which is probably where the issue is. Is that helpful?
Hey @pro465, I think the bug is in `ockam_node` and happens sometimes during node shutdown, it doesn’t depend on which command you run, because almost all commands:
Also, I think it’s connected to how we implement `Drop` for the `Context` type, it has some tricks because `Drop` is sync, and we needed to do async stuff there. Could you please take a look there?
im not sure if it will work, but i’ll share a potential solution:
(and removing the `send_stop_ack` call in the `run` fn, of course)
@mrinalwadhwa i think i’ve found the problem
@SanjoDeundiak thank you for the detailed documentation! and yes its helpful
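The remark above about `Drop` for `Context` being sync while the cleanup is async can be illustrated with a small standard-library sketch. This is not Ockam's actual implementation: `Ctx`, `stop_tx` and `run_cleanup` are hypothetical names, and a plain thread with a channel stands in for the async runtime. The idea shown is only that `drop` queues a stop request and returns immediately, while something else does the slow work.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical stand-in for a context that owns an address.
struct Ctx {
    address: String,
    stop_tx: Sender<String>,
}

impl Drop for Ctx {
    fn drop(&mut self) {
        // `drop` is synchronous and cannot wait on async work,
        // so it only queues a stop request and returns.
        // The send error is ignored: during shutdown the receiver
        // may already be gone.
        let _ = self.stop_tx.send(self.address.clone());
    }
}

/// Creates and drops one `Ctx` per address, then returns the addresses
/// that the cleanup thread processed, in order.
fn run_cleanup(addresses: &[&str]) -> Vec<String> {
    let (stop_tx, stop_rx) = mpsc::channel::<String>();

    // Background thread standing in for the runtime that does the real cleanup.
    let cleaner = thread::spawn(move || {
        let mut stopped = Vec::new();
        for address in stop_rx {
            stopped.push(address);
        }
        stopped
    });

    for address in addresses {
        let ctx = Ctx {
            address: address.to_string(),
            stop_tx: stop_tx.clone(),
        };
        drop(ctx); // triggers the queued stop request
    }

    // Close the channel so the cleanup thread's loop ends.
    drop(stop_tx);
    cleaner.join().expect("cleanup thread panicked")
}

fn main() {
    let stopped = run_cleanup(&["app", "worker-1"]);
    println!("stopped: {:?}", stopped);
}
```

The weak point of this pattern is the one discussed in this thread: if whatever processes the queue is already shut down when `drop` runs, the request is lost or, with a real runtime, can panic.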
`WorkerRelay::run` is also sometimes involved: https://github.com/pro465/ockam/runs/7539189372?check_suite_focus=true#step:6:527 (note: both are from the same GA log)
https://github.com/pro465/ockam/runs/7539189372?check_suite_focus=true#step:6:558
`ProcessorRelay::run` is (sometimes?) involved, altho i could not find where it is called when following the functions called by `ockam node list`. perhaps it is the issue in the background-running `ockam node create`?
@mrinalwadhwa @SanjoDeundiak i think i have found the issue:
`Executor::execute`’s `block_future` call to “join” user code does not seem to be working, and so the user code can continue executing even after the executor is dropped. when the `Context` is dropped in that user code, since it is in an async context AND the executor is already dropped (i.e, the `Context` is the last one to have the reference to `Arc<Runtime>`), it panics.
evidence: https://github.com/pro465/ockam/runs/7518405635?check_suite_focus=true#step:6:359
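The "last holder of the `Arc` drops the runtime" theory can be modelled with the standard library alone. This is a sketch under assumptions, not the Ockam or tokio code: `FakeRuntime` and `runtime_dropped_on_worker` are made-up names, and a named thread stands in for a `tokio-runtime-worker`. It shows that if the executor releases its reference before the user code has finished, the runtime's `drop` runs on the worker, which is the situation tokio refuses with the panic above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a runtime: remembers whether it was
/// dropped on the thread named "worker".
struct FakeRuntime {
    dropped_on_worker: Arc<AtomicBool>,
}

impl Drop for FakeRuntime {
    fn drop(&mut self) {
        let on_worker = thread::current().name() == Some("worker");
        self.dropped_on_worker.store(on_worker, Ordering::SeqCst);
    }
}

/// Returns true if the runtime ended up being dropped on the worker thread.
/// `join_before_drop` models whether the executor really waits for user code.
fn runtime_dropped_on_worker(join_before_drop: bool) -> bool {
    let dropped_on_worker = Arc::new(AtomicBool::new(false));
    let runtime = Arc::new(FakeRuntime {
        dropped_on_worker: Arc::clone(&dropped_on_worker),
    });

    // "User code" keeps its own reference, like a `Context` would.
    let user_ref = Arc::clone(&runtime);
    let (release_tx, release_rx) = mpsc::channel::<()>();
    let worker = thread::Builder::new()
        .name("worker".to_string())
        .spawn(move || {
            let _ = release_rx.recv(); // user code is still running until released
            drop(user_ref);
        })
        .expect("failed to spawn worker thread");

    if join_before_drop {
        // Working join: user code finishes first, the executor is the last owner.
        release_tx.send(()).expect("worker exited early");
        worker.join().expect("worker panicked");
        drop(runtime);
    } else {
        // Broken join: the executor lets go while user code still holds a reference.
        drop(runtime);
        release_tx.send(()).expect("worker exited early");
        worker.join().expect("worker panicked");
    }

    dropped_on_worker.load(Ordering::SeqCst)
}

fn main() {
    println!("joined first, dropped on worker: {}", runtime_dropped_on_worker(true));
    println!("not joined, dropped on worker:   {}", runtime_dropped_on_worker(false));
}
```

In this model the channel makes the ordering deterministic. In the real failure the ordering depends on timing, which would fit the observation that the panic only happens sometimes.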
hi I’m back! you probably don’t remember me but i contributed before at least 1 time 😃
anyways, can i get a backtrace? (cuz compiling this will take a looong time)