seastar: Crash/assert in seastar::aio_general_context::flush(): Assertion `clock::now() < retry_until' failed.

Hi folks,

in Redpanda we experience the following assert/crash under heavy load:

seastar::aio_general_context::flush(): Assertion `clock::now() < retry_until' failed.

https://github.com/scylladb/seastar/blob/master/src/core/reactor_backend.cc#L349-L356

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:60
 (inlined by) seastar::backtrace_buffer::append_backtrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:786
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:816
 (inlined by) seastar::print_with_backtrace(char const*, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:828
seastar::sigabrt_action() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4065
 (inlined by) operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4041
 (inlined by) __invoke at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4037
/opt/redpanda/lib/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=6e7b96dfb83f0bdcb6a410469b82f86415e5ada3, for GNU/Linux 3.2.0, stripped

__sigaction at ??:?
pthread_key_delete at ??:?
raise at ??:?
abort at ??:?
?? ??:0
__assert_fail at ??:?
seastar::aio_general_context::flush() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor_backend.cc:338
 (inlined by) seastar::reactor_backend_aio::kernel_submit_work() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor_backend.cc:522
seastar::reactor::poll_once() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3407
 (inlined by) operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3300
 (inlined by) seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3328
operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4522
 (inlined by) decltype ((std::declval<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>())()) std::__1::__invoke[abi:v160004]<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:394
 (inlined by) void std::__1::__invoke_void_return_wrapper<void, true>::__call<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:487
 (inlined by) std::__1::__function::__alloc_func<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2, std::__1::allocator<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2>, void ()>::operator()[abi:v160004]() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:185
 (inlined by) std::__1::__function::__func<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2, std::__1::allocator<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2>, void ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:356
std::__1::__function::__value_func<void ()>::operator()[abi:v160004]() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:510
 (inlined by) std::__1::function<void ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:1156
 (inlined by) seastar::posix_thread::start_routine(void*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/posix.cc:73
pthread_condattr_setpshared at ??:?
__xmknodat at ??:?

I don’t have an easy reproducer but I can reliably reproduce it with one of our internal load tests if further debugging is needed.

This bit of code seems to have a bit of history and was already changed a couple times:

I am not sure I fully understand the code. The comment mentions an “infinite loop” but I guess what it means is a longer stall in flush? I assume the 1 second was chosen arbitrarily? IMO stalling for 1 (or 5 or 10) seconds is better than crashing?

cc @bhalevy

Thanks, Stephan

About this issue

Original URL
State: open
Created a year ago
Comments: 22 (20 by maintainers)

Commits related to this issue

aio_general_context: flush: return short flush on error As seen in https://github.com/scylladb/seastar/issues/1784, having flush retry `io_submit` in a loop might be futile if the kernel internal que... — committed to bhalevy/seastar by bhalevy a year ago
aio_general_context: flush: return short flush on error As seen in https://github.com/scylladb/seastar/issues/1784, having flush retry `io_submit` in a loop might be futile if the kernel internal que... — committed to bhalevy/seastar by bhalevy a year ago
aio_general_context: flush: return short flush on error As seen in https://github.com/scylladb/seastar/issues/1784, having flush retry `io_submit` in a loop might be futile if the kernel internal que... — committed to redpanda-data/seastar by bhalevy a year ago
aio_general_context: flush: return short flush on error As seen in https://github.com/scylladb/seastar/issues/1784, having flush retry `io_submit` in a loop might be futile if the kernel internal que... — committed to redpanda-data/seastar by bhalevy a year ago
aio_general_context: flush: return short flush on error As seen in https://github.com/scylladb/seastar/issues/1784, having flush retry `io_submit` in a loop might be futile if the kernel internal que... — committed to redpanda-data/seastar by bhalevy a year ago
aio_general_context: flush: return short flush on error As seen in https://github.com/scylladb/seastar/issues/1784, having flush retry `io_submit` in a loop might be futile if the kernel internal que... — committed to graphcareful/seastar by bhalevy a year ago
aio_general_context: flush: return short flush on error As seen in https://github.com/scylladb/seastar/issues/1784, having flush retry `io_submit` in a loop might be futile if the kernel internal que... — committed to graphcareful/seastar by bhalevy a year ago
aio_general_context: flush: return short flush on error As seen in https://github.com/scylladb/seastar/issues/1784, having flush retry `io_submit` in a loop might be futile if the kernel internal que... — committed to redpanda-data/seastar by bhalevy a year ago
aio_general_context: flush: return short flush on error As seen in https://github.com/scylladb/seastar/issues/1784, having flush retry `io_submit` in a loop might be futile if the kernel internal que... — committed to redpanda-data/seastar by bhalevy a year ago

Most upvoted comments

@avikivity - here’s a more concrete repro to back the claim that N iocbs is not enough to support N sockets (nor N-C sockets for small C).

https://github.com/travisdowns/seastar/tree/td-eagain-repro

Run the server.py in the second commit like ./server.py, then to reproduce the clock::now() < retry_until assert run:

ninja socket_test && DO_SLEEP=1 SOCKET_COUNT=1000 tests/unit/socket_test -t open_n -- --reactor-backend=linux-aio --smp=1 --memory=1G --max-networking-io-control-blocks=1000

This tries to create 1000 sockets with --max-networking-io-control-blocks also set to 1000 and each socket is written to, read from and has a wait_for_input_shutdown callback attached.

It fails after 2/3rds have been created. Each socket consumes 3 iocbs. The reason it fails after 2/3 and not 1/3 have been created is that the kernel somewhat arbitrarily multiples the number of supported iocbs you request in io_setup by 2. However, we should not rely on it and in any case 2 is still not enough as this test case shows.

It also fails with smaller numbers, however because the kernel rounds up iocb counts to fill a full page there is a lot of quantization and it may pass for some values (e.g., with 10 sockets it will pass).

If you run it with DO_SLEEP=0 instead, it will reproduce the last < end crash after 1/3 of the sockets have been created, as we exceed the fixed-size array used to track enqueued iocbs. In our fork @StephanDollberg has also applied a fix for this issue: https://github.com/redpanda-data/seastar/pull/87/commits/d9175fcf84d31f21c0b72d0bea405f73ac1009b4 and that’s something we’d like to upstream too.

ninja socket_test && DO_SLEEP=1 SOCKET_COUNT=1000 tests/unit/socket_test -t open_n -- --reactor-backend=linux-aio --smp=1 --memory=1G --max-networking-io-control-blocks=1000

cc @bhalevy

travisdowns on Nov 28, 2023

Does increasing the 1 second timeout to a higher value help in your case?

Will give that a try today and will also confirm we are indeed stalling on EAGAIN.

StephanDollberg on Aug 14, 2023