actix-web: Panic within a handler should return an HTTP 500

(Apologies if this is already covered anywhere. I did look and couldn’t find anything, but I may very well have missed it!)

Expected Behavior

If anything goes catastrophically wrong within a handler, it would be good to return an HTTP 500 response to the client.

Even better would be the option to customize what the error response is.

Current Behavior

Currently, the server just hangs up the network connection. It does not seem to break the threading - I can make many more panicking requests to the server than it is running threads and they all connect just fine. I just don’t get a decent error.

Steps to Reproduce (for bugs)

Write a handler
Call - e.g. unwrap() on an Err value from within the handler

Context

There are certain classes of errors that are catastrophic in nature. For example, the database not being present. Adding deliberate error propagation for these scenarios only serves to bloat the codebase - all of a sudden I’ve got to propagate the errors all the way from the bottom layer up to the handlers in order to return an error.

Alternatively, I can just not care about these. Just call pool.get().await.unwrap() to get a connection from the connection pool, for example. If I get a connection then that’s great. If I don’t then I have a catastrophic failure on my hands anyway, so panicking isn’t unreasonable.

I could wrap every single handler in panic::catch_unwind() and handle it myself, but this is then a lot of duplication of effort, and has the need to return appropriate catastrophic errors from every handler. (I’m also not sure how to use catch_unwind in an async context, but that’s my problem 😃 )

Alternatively, if Actix-web can do this then it only needs to be done once, and can have guaranteed consistent responses no matter which handler had the problem.

Obviously, this should only apply to actual catastrophic failures. Business errors, validation errors, etc are the domain of the application and not of Actix-web and should remain as such.

Your Environment

Rust Version (I.e, output of rustc -V): rustc 1.45.0-nightly (a08c47310 2020-05-07)
Actix Web Version:

-> % cargo tree | grep actix
├── actix-cors v0.2.0
├── actix-http v1.0.1 (*)
├── actix-rt v1.1.1 (*)
├── actix-web v2.0.0 (*)

About this issue

Original URL
State: closed
Created 4 years ago
Reactions: 2
Comments: 26 (15 by maintainers)

Most upvoted comments

I’m sorry but I don’t suppose this proposal. Rust offers first-class error handling to manage this situation. Impress upon the authors of 3rd party libraries to manage errors rather than panic.

Dowwie on May 11, 2020

Did you check definition?

fafhrd91 on May 12, 2020

@fafhrd91,@onelson

Sure, but when the problem happen, the user get a “connection reset”, so the tech support blame it on the internet or something (totally believable on my case, my users roam the country) and not report it to me, instead of a “error 500: Internal server error” where now I sure the problem is in the server.

Again, this behavior is not what is expected. So look like I need to workaround to put things behind nginx (that look like a non issue for regular websites but are another complexity for deploy intranet things).

mamcx on May 12, 2020

I would argue that terminating a connection is a suitable response to a handler panic. You should endevor to employ proper error handling at all levels. Things that return a Result do so because their errors are recoverable and should be either handled or bubbled.

Using pool.get().await is certainly common in handler code. Taking advantage of the ? operator is the best and cleanest way to handle this pool.get().await? where your handler’s Error type impls From<PoolError> and can be converted to an error response (eg impl actix_web::ResponseError). There a plethora of error helper crates to make this pattern clean and easy to implement but using a plain global error enum with manual impls is also reasonably simple.

On catch_unwind, trying to employ a system that relies on catch_unwind has all sorts of gotchas including considerations of things that are or are not unwind-safe; it is not equivalent to try-catch in other languages that are built to handle clean up.

robjtede on May 11, 2020

Then you need to fix dB connector or api access code. Panic is not exception, you can have a lot of bad consequences after panic. For example leaked memory.

It is better to use “panic = abort” in your cargo.toml

fafhrd91 on May 12, 2020

I was about to write a thoughtful response to this, but https://github.com/actix/actix-web/issues/1501#issuecomment-626721902 pretty much sums up my point of view.

Signaling failure with Result is a win-win and is well supported. 90% of my error handling code comes in the form of an impl actix_web::ResponseError. At that point, my handler code uses ? to focus on the “happy path” and the rest is automatic.

Using things like catch_unwind in my mind is a last resort, probably most useful in stuff like test runners or what have you.

onelson on May 12, 2020

If you put rust code with panics into production then your have to use nginx, no options.

fafhrd91 on May 12, 2020

Such cases are exceeding rare in HTTP request handling, therefore Result should be used where possible.

Rare? I wish. Loss database connections, the database schema change, the upstream server/apis break (my world, all the time), etc is not rare at ALL. It happens in production and is handled corrected by ANY decent web server or router or whatever in the stack.

As I say, this lack of feature is VERY surprising to me and I think anyone that come from any other web framework below the sun. Where else this is like this?

I get if this in rust is harder. So what else can be done? Is possible to wrap the code, fork it then handle the crash in the fork or something like that? Actix MUST be put behind another web server like nginx for production? If is the case, put the idea in the docs is important (not assume everyone put nginx or similar in front of their APIs).

If rust is mean to be better at handle problems, then we can’t just say “let it crash” and not respond with the proper error 500 just because this. As I say, if a workaround must be implemented, is ok. But “blind” crash is not good.

mamcx on May 12, 2020

From std::panic::catch_unwind docs:

It is not recommended to use this function for a general try/catch mechanism.

From futures::future::FutureExt::catch_unwind docs:

It’s not recommended to use this for error handling.

From the Rust book ch 9.3:

When code panics, there’s no way to recover.

From the edition guide:

Panics thus abort execution up to some “isolation boundary”, with code on the other side of the boundary still able to run, and perhaps to “recover” from the panic in some very coarse-grained way.

Fundamentally, panicking is Rust’s solution to prevent memory errors/corruption and generally prevent invalid states in situations where recovery is not possible. Such cases are exceeding rare in HTTP request handling, therefore Result should be used where possible.

The “isolation boundary” should be well defined and not let states become invalid. In the case of actix-web, since workers are naturally isolated by their arbiter/thread, it is sensible for that boundary to be the thread and nothing more granular.

robjtede on May 12, 2020

It is simpler, Rc and Arc are not catch unwind

fafhrd91 on May 12, 2020

I appreciate that catch_unwind is not a trivial thing to go for, and that it’s not an equivalent to try/catch in other languages.

However, I’m also not suggesting that this is something to do lightly. You’re absolutely right that returning a Result and using the ? operator is correct behaviour most of the time. But I’m not convinced it’s correct behaviour all of the time.

Specifically I am talking about things that really are catastrophic failures. Things like talking to a database that just isn’t there. Not reading a database record that is not found, but the entire database being inaccessible.

And yes, this can be done using Result and propagating an error value. That works, but it gets very tedious and repetitive when you have a multi-layered architecture. For example, in my app right now I’m following the Clean Architecture pattern (Which works remarkably well in Rust), which means I have:

Handler function lookup_username() -> impl Responder
Calls UserService.lookup_username() -> bool
Calls UserRepository.find_user_by_username() -> Option<UserModel>
Calls Database.checkout() -> PooledConnection (Where Database is a wrapper around bb8)
Calls Pool.get() -> Result<PooledConnection, Error>

Right now, this works great. And the only way this can at all fail is in a catastrophic sense. If the database is missing, or the schema is corrupt. Anything else will cause a graceful result to be returned and all is good. (This design was based on reading around in blog posts, the Rust Book and The Rustonomicon.)

Now, I could return a Result here. That would mean that every step of the way I’ve got to have a Result type, with an appropriate Error type for that level - bearing in mind that other calls through could have business errors involved as well, so you’d now be mixing these catastrophic failures with normal business errors - and then have to handle it in every single handler function to correctly return an HTTP 500 indicating that something was seriously wrong.

As such, it seemed reasonable to have this level of handling be something that was more common and shared so that everyone can benefit automatically. 😃

Since these are the level of errors that means that everything is broken anyway, I’m not totally invested in whether it is decided to go ahead with this or not. If you still deem it better to leave it alone then you are significantly more experienced than I am and I bow to that wisdom 😃

Cheers

sazzer on May 11, 2020

~~I want to mention that the (at least with tokio::main), actix already seems to catch panics and continue on uninterrupted:~~ (I guess this is what the OP is talking about with “threading continues uninterrupted”)

actix seems barely affected by panics:

...
thread 'actix-server worker 0' panicked at 'called `Option::unwrap()` on a `None` value', /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/markdown-it-0.5.0/src/plugins/extra/smartquotes.rs:518:51
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
... (it just goes on here)

So far so good, but the issue with this is that if actix is behind nginx, the current behaviour of actix causes nginx to log these errors:

2023/07/05 11:46:30 [error] 2158443#2158443: *721527 upstream prematurely closed connection while reading response header from upstream, client: 47.149.14.66, server: lemmy.world, request: "GET /c/xxx HTTP/1.1", upstream: "http://0.0.0.0:8536/c/xxx", host: "lemmy.world"

which (by default) causes nginx to mark the upstream down for ten seconds, the same as it would if the upstream was actually down. This causes nginx to not serve requests to the server anymore even though it is still up. This causes a DOS because one panicking request handler effectively takes the server down for 10s . Sadly this behaviour isn’t even cleanly configurable in nginx.

So I’m not sure about the exact implications, but it would be great if actix would not instantly close the connection if one of my dependencies does something dumb. I don’t want a configurable error message, just a 500 response.

phiresky on Jul 5, 2023

I still think it’s better UX for something like this to return an Error 500 to the user if the invariant is broken:

        thumb_path.push(format!("{:x}.png", md5::compute(
            Url::from_file_path(path).expect("path should be absolute by this point").as_str())));

(It’s MD5 because it’s from an in-development implementation of the XDG Thumbnail Managing Standard for a miniserve-like tool for quickly sharing image sets with friends… a tool where the whole point is that you don’t need to set up something like nginx.)

ssokolow on Dec 31, 2020

@fafhrd91 Could you explain us why catch_unwind don’t work for actix so we all can learn something? Just curious.

Lesiuk on May 12, 2020

This is not matter of new feature. This is just not possible

fafhrd91 on May 12, 2020

I am writing my code without using any panics but some external libraries are using panics in some cases so returning 500 if it happens should be better idea than just closing connection. I will research this possibility (https://docs.rs/futures/0.3.5/futures/future/trait.FutureExt.html#method.catch_unwind) this weekend and maybe create my first PR.

Lesiuk on May 11, 2020