beast: tcp_stream causing SIGABRT | Boost 1.73.0

I am hitting a crash in my websocket server: a SIGABRT raised from boost/beast/core/detail/stream_base.hpp:81.

All read/write/close operations run on the same strand of the io_context for this websocket server. On Vinnie's suggestion I replaced tcp_stream with a plain TCP socket, and the crash went away. I also tried tcp_stream with only a single thread in the io_context, and the crash happened again, which suggests the issue is probably in tcp_stream.

I am attaching the stack trace file here, with some additional info added to it. The crash has a timestamp, and I have also pasted a timestamp at step #31. The thread that crashed seems to have been stuck for more than 3 minutes trying to do a write operation.

The crash is fairly rare but reproducible under load. Even without significant load the program sometimes crashes with the same stack trace.

I cannot provide the whole transport layer here as the code is proprietary, but if needed we can figure out a way to work around that.

The program is running on CentOS Linux release 7.9.2009 (Core) compiled with gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3).

stack trace.txt

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 30 (4 by maintainers)

Most upvoted comments

@madmongo1 @mhassanshafiq This issue showed up again. Just to give some context:

  1. There is just one I/O thread for the context, with multiple (SSL) connections on it.
  2. It also seems to be triggered under load when one connection has just installed a read handler; from the stack I see the socket received about 4 bytes, most likely a close frame of some sort.

This is my understanding of the stack trace. I see the close op being called, and it triggers an assert the same way the assert on the wr_impl.is_locked check is hit. This is clearly not a multithreading issue.

This only happens in some very sensitive scenarios. The same code in the same scenario under less load doesn't trigger the assert.

I understand it's very difficult to go by just the symptoms we are describing here, but some code path in close.hpp is clearly very delicate, and it would be great if it could be checked and fixed. It's like a ticking time bomb, and IMHO it's not actually a Beast stream issue but some logic in the close code path.
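
As far as I can tell, the assert in stream_base.hpp guards against starting a second operation of the same kind (for example a second write) while the first is still pending. A single thread or a strand does not rule that out by itself, since it serializes handlers, not whole asynchronous operations. Here is a minimal sketch of one way to rule it out on the write side: queue outgoing messages and start the next write only from the previous write's completion handler. `FakeStream` and `WriteQueue` are hypothetical names, and `FakeStream` only stands in for a real stream so the sketch runs without Boost; none of this is code from Beast.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream. It does no real I/O: it stores the
// completion handler and counts writes that were started while another
// write was still pending.
struct FakeStream {
    bool write_pending = false;
    int overlap_count = 0;
    std::vector<std::string> sent;
    std::function<void()> on_complete;

    void async_write(const std::string& data, std::function<void()> handler) {
        if (write_pending) ++overlap_count;  // overlapping write detected
        write_pending = true;
        sent.push_back(data);
        on_complete = std::move(handler);
    }

    // Simulates the I/O layer finishing the outstanding write.
    void complete_one() {
        assert(write_pending);
        write_pending = false;
        auto handler = std::move(on_complete);
        on_complete = nullptr;
        if (handler) handler();
    }
};

// Queues outgoing messages so that at most one write is in flight.
class WriteQueue {
public:
    explicit WriteQueue(FakeStream& stream) : stream_(stream) {}

    void send(std::string msg) {
        queue_.push_back(std::move(msg));
        if (queue_.size() == 1) start_next();  // nothing in flight, start now
    }

    std::size_t queued() const { return queue_.size(); }

private:
    void start_next() {
        stream_.async_write(queue_.front(), [this] {
            queue_.pop_front();                 // the finished message leaves the queue
            if (!queue_.empty()) start_next();  // chain the next write
        });
    }

    FakeStream& stream_;
    std::deque<std::string> queue_;
};
```

With this shape, calling `send()` three times in a row starts only the first write; the other two go out one at a time as completions arrive. Calling `async_write` twice directly on the stream is the pattern that would overlap.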

Cheers

Try replacing beast::tcp_stream with boost::asio::ip::tcp::socket
instead, and comment out the timeouts. This will help you isolate the
problem. Maybe it is a bug in Beast? I doubt it, but you never know.

This is the suggestion I followed to work around the crash. @gopalak

Just switch to an SSL stream over a regular Asio socket, and implement the timeout yourself.
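
A rough sketch of the bookkeeping behind a hand-rolled timeout, in case it helps. It only models the deadline logic with the standard library so it runs without Boost; `IdleDeadline` is a hypothetical name. In a real server I would expect a `boost::asio::steady_timer` handler to call `expired()` and close the socket when it returns true, which cancels the pending operations on that socket.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: tracks a deadline that is pushed forward whenever
// an operation starts or makes progress.
class IdleDeadline {
public:
    explicit IdleDeadline(std::chrono::seconds timeout) : timeout_(timeout) {}

    // Call when an async read/write is started or makes progress.
    void arm(Clock::time_point now) {
        expires_at_ = now + timeout_;
        armed_ = true;
    }

    // Call when the operation has completed and nothing is pending.
    void disarm() { armed_ = false; }

    // Call from the timer handler. True means the deadline has passed
    // while an operation was still outstanding.
    bool expired(Clock::time_point now) const {
        return armed_ && now >= expires_at_;
    }

private:
    std::chrono::seconds timeout_;
    Clock::time_point expires_at_{};
    bool armed_ = false;
};
```

Passing the current time in as a parameter keeps the logic easy to test; the timer that drives the check and the code that closes the socket stay outside this class.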