crossbeam: Panic on `send` in channels by default?
Hi! This is somewhat of a continuation of the “should dropping receiver close the sender” issue. I’ve migrated rust-analyzer to crossbeam-channel 0.3, and the thing I’ve noticed is that every `.send` is followed by `.unwrap`. Perhaps we should make this unwrapping behavior the default, and introduce a separate `checked_send` which returns a `Result`?
Sending to a channel without receivers seems to indicate a programmer's error in the majority of cases: there's either a bug in the communication structure such that receivers exit too early (a direct bug), or a receiver is closed due to a panic (an indirect bug). In Rust it is OK to panic on bugs by default, while providing checked alternatives:

- indexing a vector with `[]` panics on out-of-bounds access, and there is `.get` for checked access
- in debug builds, integer overflow panics, and there is `checked_add`
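To make the parallel concrete (the `checked_send` method here is the proposal in this issue, not an existing API):

```rust
fn main() {
    let v = vec![1, 2, 3];

    // Indexing panics on out-of-bounds access; `get` is the checked alternative.
    let _first = v[0];      // panics if the index is out of bounds
    let _maybe = v.get(10); // returns Option<&i32>, here None

    // Overflow panics in debug builds; `checked_add` is the checked alternative.
    let _sum = 200_u8.checked_add(100); // None instead of a panic

    // The analogous (proposed) channel API:
    //   sender.send(msg);          // would panic if all receivers are gone
    //   sender.checked_send(msg);  // would return Result<(), SendError<T>>
}
```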
Returning a result by default seems to be bad:

- it is confusing and is not a “pit of success”: with `Result<T, SendError>` the user has choices:
  - forward the error (boilerplate which just moves the problem to the wrong place)
  - handle the error, but correct handling is impossible in most cases
  - ignore the error, which is OK (0.2 semantics; EDIT: not exactly, 0.2 blocked), but not optimal: it's better to see a panic than a deadlock
  - unwrap: this I think is correct (it propagates the bug), but it is a hard choice to make, especially for a novice user, because “unwraps are bad”
- it significantly devalues unwrap: unwrap is indeed bad in a lot of places, and it's much easier to spot one during code review if there are fewer of them.
- unwrap happens in the wrong place. If we panic inside crossbeam, we can set a custom message of “sending to a closed channel” instead of a generic “unwrapped Err”.
As a counterpoint, a similar API is exposed by std's mutexes. They return `Result`s by default, and almost all uses of `.lock()` are followed by `.unwrap`/`.expect`. I would argue that this design is also less than ideal, and that std should have panicked by default, providing `checked_lock` methods for the rare cases when you do want to handle this somehow.
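For reference, the mutex pattern being compared against (the `checked_lock` name is hypothetical, not part of std):

```rust
use std::sync::Mutex;

fn main() {
    let counter = Mutex::new(0);

    // Today: `lock` returns a Result (Err only if another thread panicked
    // while holding the lock), and it is almost always unwrapped on the spot.
    *counter.lock().unwrap() += 1;

    // The suggestion above: `lock()` could panic on poisoning by default,
    // with a hypothetical `checked_lock()` returning the Result instead.
}
```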
(I’m posting this here at @BurntSushi’s suggestion, based on our discussion on Reddit. Note that a lot of this comment is in the context of `tokio::mpsc`, or even `tokio::mpsc` channels being used as streams, and so this may not always directly apply to `crossbeam::mpsc`. This is submitted as anecdotal evidence. Please take it with a grain of salt.)

At Faraday, we mostly work with `tokio` channels. And in almost every case, I've found that code calling `send` must make an explicit decision about how to handle failed `send` calls, and that panicking is virtually always the wrong answer.

Let me back up and provide a bit more context. Most of the channels and streams that I see at work are ultimately writing to either:
1. A network socket with a `tokio::codec`, where either of the ends of the codec may be attached to a channel.

We have a partial open source example of this, in our public fork of `rust-amqp` that uses `tokio` to implement true bidirectional channels. As you can see, we check the result of every call to `send` in this code, and either propagate it to our caller or log a warning.

Every one of these `send` failures can be seen in our logs during a typical week, and it would be unacceptable for any of them to `panic`. A lost AMQP server connection might mean that the caller needs to either (a) reconnect, or (b) log a high-priority error indicating that a queue message has failed. The key insight here is that all of our channels are ultimately mapped to an underlying network socket, which can close unexpectedly with an `ECONNRESET`. And this maps directly to a `send` error.
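A sketch of that pattern (not the actual rust-amqp fork code; it uses crossbeam-channel and a placeholder `Frame` type here for brevity, whereas the original uses tokio channels):

```rust
use crossbeam_channel::{SendError, Sender};

#[derive(Debug)]
struct Frame; // placeholder for whatever the codec produces

// Either propagate the failed send to the caller...
fn forward(tx: &Sender<Frame>, frame: Frame) -> Result<(), SendError<Frame>> {
    tx.send(frame)
}

// ...or log a warning and carry on, depending on the call site.
fn forward_or_warn(tx: &Sender<Frame>, frame: Frame) {
    if let Err(SendError(frame)) = tx.send(frame) {
        eprintln!("connection closed, dropping frame: {:?}", frame);
    }
}
```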
2. Channels being used to emulate Unix pipes.

These channels typically have a structure which conceptually looks something like this:
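Roughly (this diagram is an approximation; the stage names match the prose below):

```text
data_producer --chan--> data_transformer --chan--> data_sink
      |                        |                       |
      +------------------ error_channel ---------------+
                               |
                               v
                        supervisor task
```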
The code in `data_producer` will normally contain something like the sketch below, and there's generally also a `Context` type involved somewhere (also shown in the sketch). The corresponding `mpsc::Receiver<Error>` is monitored by a supervisor task. We don't have a good open source example of this yet, but you might take a look at the `Context` type in our unfinished `dbcrossbar` tool. For the corresponding channel senders, see `copy_reader_to_stream` (for `AsyncRead`) or `SyncStreamWriter` (for `Write`).
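A sketch of that shape (the `Context` struct, its field names, and the chunk type are illustrative, not the actual dbcrossbar code, and crossbeam-channel stands in for the async types):

```rust
use crossbeam_channel::Sender;

type Error = String; // stand-in for a real error type
struct Chunk;        // stand-in for e.g. BytesMut

// Shared context handed to every stage of the pipeline.
struct Context {
    // Failures anywhere in the pipeline are reported here and read
    // by a supervisor task on the other end.
    error_channel: Sender<Error>,
}

fn data_producer(ctx: &Context, out: Sender<Chunk>) {
    loop {
        match read_next_chunk() {
            Ok(Some(chunk)) => {
                if out.send(chunk).is_err() {
                    // Downstream hung up (the EPIPE analogue): stop producing.
                    return;
                }
            }
            Ok(None) => return, // end of input
            Err(err) => {
                // Report the failure to the supervisor and stop.
                let _ = ctx.error_channel.send(err);
                return;
            }
        }
    }
}

fn read_next_chunk() -> Result<Option<Chunk>, Error> {
    Ok(None) // placeholder for the real data source
}
```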
Going back to our example: if an error occurs in, say, `data_sink`, then `data_sink` will report the error on `error_channel`, and drop the receiver of the channel it uses to read from `data_transformer`. This will cause the broken pipe to propagate back through `data_transformer` and `data_producer`. This is exactly analogous to the traditional Unix `EPIPE`/`SIGPIPE`, which Unix uses to shut down pipe writers when a pipe reader fails unexpectedly.

In fact, it's actually really hard to shut down a loosely-coupled pipeline like this without using something like `EPIPE` to propagate the error. In the general case, `data_producer` is living off in its own world, and it may look something like this code, which uses PostgreSQL `copy_out` to take a data stream from PostgreSQL and forward it directly to a `tokio::mpsc::Sender<BytesMut, _>`.

Here, the actual `send` call is occurring deep inside `stmt.copy_out`, most of which is single-threaded C code. So there's no way to tell PostgreSQL to stop writing data onto that stream except by returning a Unix I/O error, and the classic error in this case is `EPIPE`.
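One way to picture that mechanism (a sketch in the spirit of `SyncStreamWriter`, not its actual implementation; it uses crossbeam-channel and `std::io` rather than the async types in the original):

```rust
use std::io::{self, Write};

use crossbeam_channel::Sender;

/// Forwards written bytes onto a channel. When the receiver is gone, it
/// reports BrokenPipe, so the single-threaded C code driving the write
/// (e.g. PostgreSQL's copy_out path) stops, just as it would on a real EPIPE.
struct ChannelWriter {
    tx: Sender<Vec<u8>>,
}

impl Write for ChannelWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.tx
            .send(buf.to_vec())
            .map_err(|_| io::Error::new(io::ErrorKind::BrokenPipe, "receiver closed"))?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```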
Outside example: Servo
According to this comment by u/asajeffrey on r/rust, Servo has found that `send(...).unwrap()` is tricky to get right. This mirrors my experience: many `send` failures only occur in exceptional circumstances, such as when an AMQP network connection needs to be restarted, or when the program is shutting down. But if I don't want my program to `panic` randomly in the middle of an “orderly” shutdown, then I need to carefully think through what to do about each failed `send`. That may be as simple as logging it with `error!` so that I can debug it when it starts failing in production 6 months from now.

Outside example: Go channels
I don't write much Go code myself, but I've occasionally needed to debug large Go applications, including Vault and Pachyderm. In my limited experience, the code involved in shutting down multiple goroutines often involves subtle channel-related bugs. (And these goroutine shutdowns may occur every time a connection to a server is lost and needs to be recreated, so `abort` is not an acceptable solution.)

For an academic example, see this paper, which includes bugs like the following: a parent goroutine spawns a child that sends its result on a channel `ch`, but the parent may exit without ever reading from `ch`.
If you look at this closely, it's a classic `EPIPE` error. Specifically, the parent routine has exited, and the child routine is trying to write to a dead channel. The authors suggest fixing this by adding a buffer to `ch`, so that the child can write the result to the buffer, which will never be read. Personally, I would be happier if the child's `send` function failed outright with `EPIPE`, and the child was forced to figure out how to handle the error. Perhaps the child should just silently quit, or maybe it should log a message, or maybe it needs to use something like `Context::error_channel` to report the problem to a supervisor. However, one clearly wrong answer in this case would be for the child's `send` to `abort` the entire server process.

Note that if the child were sending multiple messages (perhaps data chunks), the buffered channel workaround wouldn't be enough, and the only easy solution would be for `send` to fail with an error.

Summary
At least in my experience with `tokio` channels (and my more limited experience with Go channels), a majority of `send` calls can fail. This is often triggered when a network socket closes (`ECONNRESET`) or a pipe reader closes early (`EPIPE`). This may happen during application shutdown (as in the Servo case) or during a connection restart (our `rust-amqp` fork).

Because of this experience, my standard coding guideline is: assume every `send` might fail, and decide how to handle it appropriately. 90% of the time, this may mean writing the error to a log somewhere and then giving up, or reporting it to a supervisory channel. But some decision needs to be made. So if there were a `#![deny(clippy::unchecked_send)]` lint, I would definitely use it as part of our company coding style.

Now, I'll be the first to admit that I'm dragging in examples from `tokio::mpsc` and even Go here. So maybe none of this applies directly to typical uses of `crossbeam::mpsc`! But I hope this explains why I consider `send(...).unwrap()` to be a giant warning sign, especially once networks or pipes are involved. Basically, `ECONNRESET` and `EPIPE` both map pretty directly to a failed `send`. And so I've wound up distrusting `unwrap()` on channels almost as much as I distrust it on function calls.

(Actually, I'm really curious about how I could design this kind of code so that `send`s could never fail! Maybe I'll learn something fascinating and useful today.)

Thank you so much for these examples, @emk! I think they fully apply to crossbeam-channel.
I think the most important case here is a fallible pipeline: `producer | transformer | consumer`, where the consumer might legitimately fail due to an external error. For example, `consumer` might be writing results to a database, and the connection might die with an `io::Error`.

Propagating `send` errors back from the `consumer` helps to tear down the pipeline, but several aspects of this approach are sub-optimal; in particular, if `consumer` or `transformer` panics (that is, it has a bug), `producer` should panic by default as well.
If we want to maintain unidirectional dataflow (that is, treat sending to a closed channel as a bug), a possible solution would be for `consumer`, upon encountering an `io::Error`, to send it via an additional `errors` channel to the entity that spawned the pipeline in the first place, and then to error-log all pending messages. That is, something like this:
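A sketch of that idea (the `Record` type and channel names are illustrative):

```rust
use std::io;

use crossbeam_channel::{Receiver, Sender};

struct Record; // stand-in for the pipeline's message type

fn consumer(input: Receiver<Record>, errors: Sender<io::Error>) {
    for record in input.iter() {
        if let Err(err) = write_to_database(record) {
            // Report the external failure to whoever spawned the pipeline...
            let _ = errors.send(err);
            // ...and then error-log everything still arriving, until the
            // supervisor tears down the producer and the channel closes.
            for _dropped in input.iter() {
                eprintln!("dropping message after database error");
            }
            return;
        }
    }
}

fn write_to_database(_record: Record) -> io::Result<()> {
    Ok(()) // placeholder for the real database write
}
```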
The entity can then shut down the pipeline at the beginning, by terminating the producer. In the database connection example we can be even fancier: instead of logging all pending messages, we can send both the `io::Error` and the `Receiver` to the supervisor, which could create a new database connection and spawn a new `consumer`, without losing the messages.

I do understand that this is a bit of an “architecture astronaut” type of solution, and I haven't actually tried it in practice, so I can't say whether it'll work. Though, remarkably, the `Context` type you've mentioned seems to be a core part of this solution as well.

The second example is “panicking during shutdown”. Here I think handling errors on send produces an objectively less than ideal solution: namely, messages are implicitly lost during shutdown. Maybe it's not a big deal, but maybe it's saving some user state to the database? The solution here seems to be to architect the shutdown sequence in such a way that you terminate producers first, wait for them to finish, and only then terminate consumers.

This is a situation where “panicking by default on send” might nudge you towards a better architecture: if dropping a receiver while a sender is alive is an error, then you'll have to arrange a clean shutdown sequence in topological order. If send returns a result, the path of least resistance is to add logging on send.
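A sketch of such a topologically ordered shutdown (the stage bodies are illustrative; `crossbeam::scope` guarantees every thread is joined before the scope ends):

```rust
fn main() {
    crossbeam::scope(|s| {
        let (tx1, rx1) = crossbeam::channel::unbounded::<u32>();
        let (tx2, rx2) = crossbeam::channel::unbounded::<u32>();

        let producer = s.spawn(move |_| {
            for i in 0..100 {
                // The receivers outlive the producer by construction,
                // so a failed send here really would indicate a bug.
                tx1.send(i).unwrap();
            }
            // tx1 is dropped here, which closes rx1 and lets the next stage finish.
        });
        let transformer = s.spawn(move |_| {
            for i in rx1.iter() {
                tx2.send(i * 2).unwrap();
            }
        });
        let consumer = s.spawn(move |_| {
            for i in rx2.iter() {
                println!("{}", i);
            }
        });

        // Shut down in topological order: producer first, consumers last.
        producer.join().unwrap();
        transformer.join().unwrap();
        consumer.join().unwrap();
    })
    .unwrap();
}
```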
Finally, the Go example also seems like a case where a result on send is the worse solution. Specifically, we have a parent goroutine and a child goroutine, and the problem is that, although the parent has spawned the child, it's not interested in the child's result for some reason. The proposed solution allows for the situation where the parent has finished, but the child is still alive. That is, the child is effectively a zombie: it might hold onto some resources and it might do computations, but they are not interesting to anyone, and the child will only learn that once it attempts a send. I think the better fix would be to make sure that the parent just always joins the child in the end, even if it doesn't need the result anymore.
All that said, I am now not sure that panicking on send is the right choice for crossbeam =) I personally am still convinced that this is the right model for thinking about channels, but I also see that there's a large existing body of code which relies on handling errors on send.
If I understood everyone correctly, it seems no one strongly believes we should change the signature of `send` anymore. With that, I am closing the issue. Thank you all for the great discussion!
Awesome exchange! Thanks so much to both of you for writing all that stuff out. It’s a lot of work!
I just wanted to respond to a teeny detail:
I think if this were the only concern, we could say, “in crossbeam-channel 1.0, if your code depended on send returning an error, you should use `checked_send` instead.” This is a viable migration since uses of `send` that check its error will fail to compile under your proposal. So I just mean to say that there does exist a reasonable migration path. Of course, this isn't to say that this is the only concern!

This is `Err` on `recv`, which is totally valid and required for shutdown. Perhaps you meant to show code with an error on `send` instead?

@asajeffrey would be cool to dig into these cases more. At least from my experience with programming with channels in a medium-sized code base, there are usually several ways to arrange communication between parties.
There's a solution where you explicitly code for shutdown, and it sort of works for planned shutdown, but it fails miserably on an unexpected panic. And there's a solution where you just rely on the ownership semantics and “close on drop” behavior, where shutdown happens automatically and works the same for the panicking and non-panicking cases.

Panic on send definitely nudges me towards architectures of the second kind. I don't know if this transfers to Servo-sized codebases, so that's why it's interesting to see concrete cases here. It may be the case that large codebases just have different constraints, but it also could be the case that channel ownership in Servo is not exactly ideal 😃
Here’s a specific case where I believe panicking helped me to write more robust code:
In rust-analyzer, I have a virtual file system (VFS), whose task is to maintain a consistent snapshot of the filesystem state while watching for file changes.

The interface to the watcher/scanner “actor” of the VFS (which is a separate thread) is a pair of crossbeam channels: the actor receives a directory to watch, scans it using walkdir, and then watches for changes. Results of the scanning/watching are sent to the output channel, OUT.
The file watching library I am using, `notify`, exposes notifications as a `std::mpsc` channel, so I have to burn an additional thread to convert between the channels. Initially, I was going to send events directly to OUT, with roughly the following code:
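(A sketch of what I had in mind, not the actual rust-analyzer code; names like `watcher_rx` and `Event` are illustrative.)

```rust
use std::sync::mpsc;

use crossbeam_channel::Sender;

struct Event; // stand-in for notify's event type

// Runs on the extra "conversion" thread: pull events out of the
// std::sync::mpsc receiver that `notify` writes to, and forward them to OUT.
fn convert_events(watcher_rx: mpsc::Receiver<Event>, out: Sender<Event>) {
    loop {
        match watcher_rx.recv() {
            // First Err: the watcher itself was dropped, so we are done.
            Err(_) => break,
            Ok(event) => {
                if out.send(event).is_err() {
                    // Second Err: the VFS receiver is already gone, yet this
                    // thread is still alive -- i.e. it was leaked on shutdown.
                    break;
                }
            }
        }
    }
}
```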
However, I realized that the second `Err` case happens because I fail to join this event-transforming thread when shutting down the VFS itself. That is, I was going to leak a thread during shutdown.

So I refactored the code to make this helper thread and the VFS actor thread essentially scoped threads, which are guaranteed to be joined on exiting the scope (due to normal termination, `?`, or panic). What guided me through this refactoring was the realization that `.send` should never fail, which means that the client should outlive the VFS, and the VFS should outlive the helper thread.
should never fail, which means that client should outlive VFS and VFS should outlive the helper thread.The matching code for send is either the same, or just
let _ = chan.send(v);
depending on what we want to do with the warning, and whether we want to continue in the case that the receiver has closed.There are two cases where we need to ensure no panicking. The first is in any code which may run in panic-handlers, to avoid double-panic killing the entire process. The second is in any code which does crash-handling, e.g. by displaying a crash page which invites the user to submit an issue report. In particular the constellation (which manages the content threads) and webrender (which is responsible for painting) have to be robust against panic, even if content threads are panicking.
If I ruled the world then we'd have a `deny(panic)` attribute we could apply to functions whose robustness is important, but unfortunately it's not obvious how to interface this with (e.g.) generic types without going all the way into effect systems, effect polymorphism, effect algebras, etc. So we end up with lints that just ban syntax like `.unwrap()`.
.Err, “forked” is not the right word - “wrapped” is a better one. Relevant comments:
A bit later they updated `crossbeam-channel` to 0.3 and deleted `servo-channel`: https://github.com/servo/servo/pull/22142

Thanks, I will go around bugging people! 😃
Sorry, my mistake! 😦
I really had `try_send()` in mind, which can fail because the channel is either full or disconnected. So if we made `try_send()` panic when the channel is disconnected, it'd still need to return a `Result` because the channel could be full.
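A quick illustration of the two failure modes of `try_send` (using crossbeam-channel's `TrySendError`):

```rust
use crossbeam_channel::{bounded, TrySendError};

fn main() {
    let (tx, rx) = bounded::<i32>(1);

    tx.try_send(1).unwrap(); // ok: the buffer now holds one message
    // The channel is full, but still connected.
    assert!(matches!(tx.try_send(2), Err(TrySendError::Full(2))));

    // Make room again, then drop the receiver.
    rx.recv().unwrap();
    drop(rx);

    // Now the channel has space but no receivers. Panicking here instead
    // would still leave the Full case needing a Result.
    assert!(matches!(tx.try_send(3), Err(TrySendError::Disconnected(3))));
}
```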
But anyway, I see now you weren't suggesting to change `try_send()` at all…

Ah, nice! I haven't thought of that one. 😉
A bit off-topic, but this is one of the situations where I’d love the libs team to weigh in with their opinion. I want to make decisions that will overall make most people happy, but it’s kinda overwhelming since there are so many disagreeing opinions.
There was even a point in time when the Servo folks disliked the interface of `crossbeam-channel` v0.2 so much that they just forked the library. Even a few third-party crates with minor API tweaks popped up.

Your analysis of the previous discussions is correct. And the reasoning for `send()`/`checked_send()` is totally sound - I honestly don't have any strong counterpoints.

Now, the current API is perfect in many ways, except that two things might get tiresome: unwrapping on send, and the sender/receiver split. Both are sometimes annoying, but not deal breakers, I think.
If we automatically panic on `send`, that annoyance will go away, but then we need to deal with `checked_send` in `select!`, and the semantics of selection + panicking get a little bit more complicated. Some other oddities might crop up, and I'm not sure it's a clear win in the end.

But again, I do acknowledge the benefits of the proposed solution; it's just that there are tradeoffs, and I don't feel confident it'd be a total success. 😦
Hi there!
The previous discussion @BurntSushi mentions is: https://github.com/crossbeam-rs/crossbeam-channel/issues/61
A couple points on this issue:
- Yes, `unwrap()` on `send()` is similar to `unwrap()` on mutex `lock()`. It's debatable what to do when disconnected/poisoned. Do we return a `Result`, do we send/lock anyway, do we panic? Honestly, this is just something people will disagree on, and no choice will make everyone happy.
- It's true that `[]` and `+` panic, but `send()` is not an operator - it's a method and therefore more like `get()` and `checked_add()`. 😃 But this might be a bit of a silly point on my part, and I see what you're saying.
- There is the precedent of `send()` returning a `Result` in all other channel crates except `chan`. And channel crates already have very minor but annoying API differences.
- Even if `send()` panics when the channel is disconnected, we still need to return a `Result` because the channel might be full (if it's bounded). We could get around that by splitting `Sender` into `BoundedSender` and `UnboundedSender`, and then `UnboundedSender::send()` would return `()`. But then we'd probably implement some kind of `Sender` trait for both `BoundedSender` and `UnboundedSender` so that they can be passed to `Select::send()`… It gets messy. 😃 (EDIT: By `send()` I actually meant `try_send()`.)

To summarize, I think `send(foo).unwrap()` is not bad, just mildly annoying. And all the other options seem to be worse than this. API design is hard. 😦