actix-web: Panic within a handler should return an HTTP 500
(Apologies if this is already covered anywhere. I did look and couldn’t find anything, but I may very well have missed it!)
Expected Behavior
If anything goes catastrophically wrong within a handler, it would be good to return an HTTP 500 response to the client.
Even better would be the option to customize what the error response is.
Current Behavior
Currently, the server just hangs up the network connection. It does not seem to break the threading - I can make many more panicking requests to the server than it is running threads and they all connect just fine. I just don’t get a decent error.
Steps to Reproduce (for bugs)
- Write a handler
- Call - e.g.
unwrap()
on an Err value from within the handler
Context
There are certain classes of errors that are catastrophic in nature. For example, the database not being present. Adding deliberate error propagation for these scenarios only serves to bloat the codebase - all of a sudden I’ve got to propagate the errors all the way from the bottom layer up to the handlers in order to return an error.
Alternatively, I can just not care about these. Just call pool.get().await.unwrap()
to get a connection from the connection pool, for example. If I get a connection then that’s great. If I don’t then I have a catastrophic failure on my hands anyway, so panicking isn’t unreasonable.
I could wrap every single handler in panic::catch_unwind()
and handle it myself, but this is then a lot of duplication of effort, and has the need to return appropriate catastrophic errors from every handler. (I’m also not sure how to use catch_unwind
in an async context, but that’s my problem 😃 )
Alternatively, if Actix-web can do this then it only needs to be done once, and can have guaranteed consistent responses no matter which handler had the problem.
Obviously, this should only apply to actual catastrophic failures. Business errors, validation errors, etc are the domain of the application and not of Actix-web and should remain as such.
Your Environment
- Rust Version (I.e, output of
rustc -V
): rustc 1.45.0-nightly (a08c47310 2020-05-07) - Actix Web Version:
-> % cargo tree | grep actix
├── actix-cors v0.2.0
├── actix-http v1.0.1 (*)
├── actix-rt v1.1.1 (*)
├── actix-web v2.0.0 (*)
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 26 (15 by maintainers)
I’m sorry but I don’t suppose this proposal. Rust offers first-class error handling to manage this situation. Impress upon the authors of 3rd party libraries to manage errors rather than panic.
Did you check definition?
@fafhrd91,@onelson
Sure, but when the problem happen, the user get a “connection reset”, so the tech support blame it on the internet or something (totally believable on my case, my users roam the country) and not report it to me, instead of a “error 500: Internal server error” where now I sure the problem is in the server.
Again, this behavior is not what is expected. So look like I need to workaround to put things behind nginx (that look like a non issue for regular websites but are another complexity for deploy intranet things).
I would argue that terminating a connection is a suitable response to a handler panic. You should endevor to employ proper error handling at all levels. Things that return a Result do so because their errors are recoverable and should be either handled or bubbled.
Using
pool.get().await
is certainly common in handler code. Taking advantage of the ? operator is the best and cleanest way to handle thispool.get().await?
where your handler’sError
type implsFrom<PoolError>
and can be converted to an error response (egimpl actix_web::ResponseError
). There a plethora of error helper crates to make this pattern clean and easy to implement but using a plain global error enum with manual impls is also reasonably simple.On catch_unwind, trying to employ a system that relies on catch_unwind has all sorts of gotchas including considerations of things that are or are not unwind-safe; it is not equivalent to try-catch in other languages that are built to handle clean up.
Then you need to fix dB connector or api access code. Panic is not exception, you can have a lot of bad consequences after panic. For example leaked memory.
It is better to use “panic = abort” in your cargo.toml
I was about to write a thoughtful response to this, but https://github.com/actix/actix-web/issues/1501#issuecomment-626721902 pretty much sums up my point of view.
Signaling failure with Result is a win-win and is well supported. 90% of my error handling code comes in the form of an
impl actix_web::ResponseError
. At that point, my handler code uses?
to focus on the “happy path” and the rest is automatic.Using things like
catch_unwind
in my mind is a last resort, probably most useful in stuff like test runners or what have you.If you put rust code with panics into production then your have to use nginx, no options.
Rare? I wish. Loss database connections, the database schema change, the upstream server/apis break (my world, all the time), etc is not rare at ALL. It happens in production and is handled corrected by ANY decent web server or router or whatever in the stack.
As I say, this lack of feature is VERY surprising to me and I think anyone that come from any other web framework below the sun. Where else this is like this?
I get if this in rust is harder. So what else can be done? Is possible to wrap the code, fork it then handle the crash in the fork or something like that? Actix MUST be put behind another web server like nginx for production? If is the case, put the idea in the docs is important (not assume everyone put nginx or similar in front of their APIs).
If rust is mean to be better at handle problems, then we can’t just say “let it crash” and not respond with the proper error 500 just because this. As I say, if a workaround must be implemented, is ok. But “blind” crash is not good.
From
std::panic::catch_unwind
docs:From
futures::future::FutureExt::catch_unwind
docs:From the Rust book ch 9.3:
From the edition guide:
Fundamentally, panicking is Rust’s solution to prevent memory errors/corruption and generally prevent invalid states in situations where recovery is not possible. Such cases are exceeding rare in HTTP request handling, therefore Result should be used where possible.
The “isolation boundary” should be well defined and not let states become invalid. In the case of actix-web, since workers are naturally isolated by their arbiter/thread, it is sensible for that boundary to be the thread and nothing more granular.
It is simpler, Rc and Arc are not catch unwind
I appreciate that
catch_unwind
is not a trivial thing to go for, and that it’s not an equivalent to try/catch in other languages.However, I’m also not suggesting that this is something to do lightly. You’re absolutely right that returning a
Result
and using the?
operator is correct behaviour most of the time. But I’m not convinced it’s correct behaviour all of the time.Specifically I am talking about things that really are catastrophic failures. Things like talking to a database that just isn’t there. Not reading a database record that is not found, but the entire database being inaccessible.
And yes, this can be done using
Result
and propagating an error value. That works, but it gets very tedious and repetitive when you have a multi-layered architecture. For example, in my app right now I’m following the Clean Architecture pattern (Which works remarkably well in Rust), which means I have:lookup_username() -> impl Responder
UserService.lookup_username() -> bool
UserRepository.find_user_by_username() -> Option<UserModel>
Database.checkout() -> PooledConnection
(WhereDatabase
is a wrapper around bb8)Pool.get() -> Result<PooledConnection, Error>
Right now, this works great. And the only way this can at all fail is in a catastrophic sense. If the database is missing, or the schema is corrupt. Anything else will cause a graceful result to be returned and all is good. (This design was based on reading around in blog posts, the Rust Book and The Rustonomicon.)
Now, I could return a
Result
here. That would mean that every step of the way I’ve got to have a Result type, with an appropriate Error type for that level - bearing in mind that other calls through could have business errors involved as well, so you’d now be mixing these catastrophic failures with normal business errors - and then have to handle it in every single handler function to correctly return an HTTP 500 indicating that something was seriously wrong.As such, it seemed reasonable to have this level of handling be something that was more common and shared so that everyone can benefit automatically. 😃
Since these are the level of errors that means that everything is broken anyway, I’m not totally invested in whether it is decided to go ahead with this or not. If you still deem it better to leave it alone then you are significantly more experienced than I am and I bow to that wisdom 😃
Cheers
I want to mention that the (at least with tokio::main), actix already seems to catch panics and continue on uninterrupted:(I guess this is what the OP is talking about with “threading continues uninterrupted”)actix seems barely affected by panics:
So far so good, but the issue with this is that if actix is behind nginx, the current behaviour of actix causes nginx to log these errors:
which (by default) causes nginx to mark the upstream down for ten seconds, the same as it would if the upstream was actually down. This causes nginx to not serve requests to the server anymore even though it is still up. This causes a DOS because one panicking request handler effectively takes the server down for 10s . Sadly this behaviour isn’t even cleanly configurable in nginx.
So I’m not sure about the exact implications, but it would be great if actix would not instantly close the connection if one of my dependencies does something dumb. I don’t want a configurable error message, just a 500 response.
I still think it’s better UX for something like this to return an Error 500 to the user if the invariant is broken:
(It’s MD5 because it’s from an in-development implementation of the XDG Thumbnail Managing Standard for a miniserve-like tool for quickly sharing image sets with friends… a tool where the whole point is that you don’t need to set up something like nginx.)
@fafhrd91 Could you explain us why catch_unwind don’t work for actix so we all can learn something? Just curious.
This is not matter of new feature. This is just not possible
I am writing my code without using any panics but some external libraries are using panics in some cases so returning 500 if it happens should be better idea than just closing connection. I will research this possibility (https://docs.rs/futures/0.3.5/futures/future/trait.FutureExt.html#method.catch_unwind) this weekend and maybe create my first PR.