tokio: Stop runtime on task panic

Version

tokio 0.2.6

Description

I’m indirectly using the tokio runtime with the basic scheduler (via actix 0.9.0). It seems that tokio 0.1 would stop if any task panicked, but 0.2.6 catches every panic in task::harness::Harness::poll and the runtime keeps going.

Is there any way to get the old behavior of stopping the runtime?

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 8
  • Comments: 16 (10 by maintainers)

Most upvoted comments

It’s not about tests. It’s about panics being silently caught everywhere. In production too.

Anywhere else in Rust, if a panic occurs in code that is not explicitly wrapped in catch_unwind, the whole program terminates with a diagnostic message. This is in line with Rust’s emphasis on correctness. A panic usually indicates a bug in the code, and I don’t want bugs to be silently ignored. I want bugs to be reported and fixed.

It is true that sometimes we need to catch panics to ensure robustness. For example, perhaps we don’t want a panic in a request handler to terminate the whole web server program. But that’s none of tokio’s business! It’s the web framework’s or even the web application’s business! It is possible to use tokio for something besides web applications, and in those use cases panics definitely shouldn’t be silently ignored.
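To make the "application's business" point concrete, here is a minimal sketch of catching a panic at a boundary the application chooses, using only the standard library. The names handle_request and serve_one are hypothetical and exist only for illustration; this is not how tokio or any web framework does it.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical request handler that has a bug for one particular input.
fn handle_request(path: &str) -> String {
    if path == "/bug" {
        panic!("unexpected state while handling {}", path);
    }
    format!("200 OK: {}", path)
}

/// Application-level boundary: a panicking handler becomes an error value
/// for this one request instead of unwinding any further.
pub fn serve_one(path: &str) -> Result<String, String> {
    panic::catch_unwind(AssertUnwindSafe(|| handle_request(path)))
        .map_err(|_| format!("500 Internal Server Error: {}", path))
}
```

Everything outside serve_one keeps the default behavior, so a panic anywhere else still terminates the thread it occurs on. Note that catch_unwind only works when the program is built with the unwinding panic strategy; with panic = "abort" the process ends before the error branch is reached.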

Consider reopening.

Having a way to tell Tokio to “not catch” panics that occur in its threads seems like a useful feature to me.

My use case: I have my Rust program deployed in Kubernetes. When a panic occurs, I want my program to crash and exit completely, so that Kubernetes can notice the crash and perform its regular handling (e.g. restarting the pod, unless it keeps crashing immediately, in which case it backs off for a while).

I looked through the Tokio source code and could not find a way to directly achieve what I wanted. That said, here are some workarounds I have found.

Workaround 1

Enable Rust’s “abort on panic” setting.

You can do this in either of two ways. A) Add the following to your root Cargo.toml file, replacing XXX with the profile name (for example, release or dev):

[profile.XXX]
panic = "abort"

B) Or add -C panic=abort to the rustflags.

You can control how detailed the backtraces printed to the console are by setting the RUST_BACKTRACE environment variable:

RUST_BACKTRACE=0 # no backtraces
RUST_BACKTRACE=1 # partial backtraces
RUST_BACKTRACE=full # full backtraces
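Since the profile or rustflags setting is easy to get wrong (for instance, by putting it under a profile that is not the one being built), it can help to check which panic strategy the binary was actually compiled with. A small sketch, assuming a toolchain where cfg(panic = "...") is available (Rust 1.60 or later); the function name panic_strategy is made up for this example:

```rust
/// Reports the panic strategy this crate was compiled with.
/// `cfg!(panic = "abort")` is evaluated at compile time.
pub fn panic_strategy() -> &'static str {
    if cfg!(panic = "abort") {
        "abort"
    } else {
        "unwind"
    }
}
```

A program that relies on Workaround 1 could log this value at startup, or refuse to start when it is not "abort".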

Workaround 2

Add a custom panic handler, which receives the error, prints a backtrace (optionally), and then manually aborts your program (optionally):

#![feature(backtrace)]

use std::backtrace::Backtrace;
use std::panic;

#[tokio::main]
async fn main() {
    //panic::always_abort();
    panic::set_hook(Box::new(|info| {
        //let stacktrace = Backtrace::capture();
        let stacktrace = Backtrace::force_capture();
        println!("Got panic. @info:{}\n@stackTrace:{}", info, stacktrace);
        std::process::abort();
    }));

    [...]
}

I like this approach better because it gives me control of how much of the stacktrace to print (they can be quite long!), as well as whether the panic is of a type that is worth calling abort() for.
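As a sketch of the "whether the panic is of a type that is worth calling abort() for" part, the hook can inspect the panic payload before deciding. The Recoverable marker type and the policy in should_abort are assumptions made up for this example, not anything Tokio or the standard library defines:

```rust
use std::any::Any;
use std::panic;

/// Hypothetical marker payload: panics raised with it (via
/// `std::panic::panic_any`) are treated as recoverable.
pub struct Recoverable(pub &'static str);

/// Example policy: abort on every panic except those carrying `Recoverable`.
pub fn should_abort(payload: &(dyn Any + Send)) -> bool {
    !payload.is::<Recoverable>()
}

/// Installs a hook that prints the panic info and aborts the process
/// only when the policy above says so.
pub fn install_selective_abort_hook() {
    panic::set_hook(Box::new(|info| {
        eprintln!("Got panic: {}", info);
        if should_abort(info.payload()) {
            std::process::abort();
        }
    }));
}
```

Ordinary panic!("...") calls carry a string payload, so under this policy they abort the process; only code that deliberately panics with Recoverable is left for the runtime (or a catch_unwind) to deal with.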

The one main drawback is that the backtrace-generation code (Backtrace::capture()) was only available on Rust nightly when this was written. std::backtrace::Backtrace has since been stabilized (in Rust 1.65), so newer toolchains no longer need the #![feature(backtrace)] attribute.

If you want to use the backtrace generation on a stable toolchain where it is still unstable, you can, but it requires a hack: set the environment variable RUSTC_BOOTSTRAP=1.

You can set that as a global environment variable, or have it set specifically for your cargo-build command.

For Docker: Just add an ENV RUSTC_BOOTSTRAP=1 line before your build commands (or use RUN RUSTC_BOOTSTRAP=1 <rest of command> for each command).

For rust-analyzer (in VSCode): Add this to your project’s .vscode/settings.json file:

    "rust-analyzer.server.extraEnv": {"RUSTC_BOOTSTRAP": "1"}

Dealing with it in the panic handler is not the best option, because I may still want to explicitly catch panics in specific scopes, while unexpected panics elsewhere should terminate the whole thing. By default. It’s an unpleasant surprise when they don’t (see fail-fast).
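One way to get fail-fast behavior without touching the global hook is to re-raise a contained panic at the point where the work is joined. The sketch below uses std::thread as a stand-in for runtime tasks, since the standard library also stops a panic at the thread boundary and hands it back through join(). The helper name run_fail_fast is hypothetical, and this is not Tokio's API:

```rust
use std::panic;
use std::thread;

/// Runs `work` on a separate thread and, if it panicked, re-raises the
/// same panic payload on the calling thread instead of swallowing it.
pub fn run_fail_fast<T, F>(work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    match thread::spawn(work).join() {
        Ok(value) => value,
        // `join` returns the panic payload; continue unwinding with it.
        Err(payload) => panic::resume_unwind(payload),
    }
}
```

Scopes that really want to catch panics can still wrap their own calls in catch_unwind, while everything routed through a helper like this propagates the panic to the caller. This only helps for work that something actually joins; it does nothing for detached tasks.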

No, there’s currently no way to do this.