accumulo: OutOfMemoryError during BinMutations causes stuck BatchWriter
Problem The first step in flushing mutations from the BatchWriter to the tablet servers is binning the mutations by destination tablet server. If the tablet server throws an OutOfMemoryError, or any other Throwable that is not an Exception, the error is thrown on the client side. There it is not caught, so the bin thread dies without reporting that an error occurred. This is unrecoverable: the batch writer is left stuck, waiting for mutations to be flushed that never will be.
Affected versions 1.10.x, main
To Reproduce Not easily reproducible. One could write a rogue iterator that throws an OutOfMemoryError when the metadata table is being scanned.
this:
catch (Exception e) {
should be changed to:
catch (Throwable e) {
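To illustrate why the one-word change matters, here is a minimal, self-contained sketch in plain Java. It does not use any Accumulo code: the class `BinThreadSketch`, the method `runBinTask`, and the latch standing in for "report the failure to the waiting writer" are all hypothetical names chosen for this example. An `OutOfMemoryError` is an `Error`, not an `Exception`, so it passes straight through `catch (Exception e)` and the failure is never reported.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BinThreadSketch {

  // Runs a simulated "bin" task on its own thread. The task body throws an
  // OutOfMemoryError. Returns true if the failure was reported to the waiter
  // (modelled here by a latch), false if the thread died silently.
  public static boolean runBinTask(boolean catchThrowable) {
    CountDownLatch failureReported = new CountDownLatch(1);

    Runnable task = () -> {
      if (catchThrowable) {
        try {
          throw new OutOfMemoryError("simulated");
        } catch (Throwable t) {
          // Errors are Throwables, so the failure gets reported.
          failureReported.countDown();
        }
      } else {
        try {
          throw new OutOfMemoryError("simulated");
        } catch (Exception e) {
          // Never reached: OutOfMemoryError is not an Exception.
          failureReported.countDown();
        }
      }
    };

    Thread binThread = new Thread(task, "bin-thread");
    // Silence the default stack trace so the demo output stays readable.
    binThread.setUncaughtExceptionHandler((thread, error) -> { });
    binThread.start();
    try {
      binThread.join();
      // A real writer would wait indefinitely; the sketch uses a short timeout.
      return failureReported.await(100, TimeUnit.MILLISECONDS);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(ie);
    }
  }

  public static void main(String[] args) {
    System.out.println("catch (Exception): reported=" + runBinTask(false)); // false
    System.out.println("catch (Throwable): reported=" + runBinTask(true));  // true
  }
}
```

In the first case the thread terminates without signalling anyone, which is the analogue of the batch writer waiting forever for a flush that will never complete.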
About this issue
- State: closed
- Created 3 years ago
- Comments: 27 (27 by maintainers)
Commits related to this issue
- Fix stuck batch writer on OutOfMemoryError (#2331) — committed to andrewglowacki/accumulo by deleted user 3 years ago
- Allow UncaughtExceptionHandler to be overridden in client When an ExecutorService's task or Thread in Accumulo encounter an unhandled exception the default UncaughtExceptionHandler will be invoked. A... — committed to dlmarion/accumulo by dlmarion 2 years ago
- Set default UncaughtExceptionHandler in client, enable override (#2554) When an ExecutorService's task or Thread in Accumulo encounter an unhandled exception the default UncaughtExceptionHandler wil... — committed to apache/accumulo by dlmarion 2 years ago
#2554 includes a change in 2.1.0 that provides a default UncaughtExceptionHandler in the client. It only logs exceptions and errors; it does not terminate the VM as the AccumuloUncaughtExceptionHandler does on the server side. #2554 also lets users supply their own UncaughtExceptionHandler implementation.
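As a rough sketch of what a log-only handler looks like, the following uses only the JDK's `Thread.UncaughtExceptionHandler` interface. It is not the handler added in #2554, and it does not show how Accumulo accepts a user-supplied handler; `LoggingHandlerSketch`, `LoggingHandler`, and `runFailingThread` are hypothetical names for this illustration.

```java
import java.lang.Thread.UncaughtExceptionHandler;
import java.util.concurrent.atomic.AtomicReference;

public class LoggingHandlerSketch {

  // Hypothetical handler: logs the Throwable and remembers it, but does not
  // halt the JVM, which would be inappropriate inside a client application.
  public static final class LoggingHandler implements UncaughtExceptionHandler {
    public final AtomicReference<Throwable> last = new AtomicReference<>();

    @Override
    public void uncaughtException(Thread t, Throwable e) {
      last.set(e);
      System.err.println("Uncaught in thread " + t.getName() + ": " + e);
    }
  }

  // Starts a thread that dies with an OutOfMemoryError and returns whatever
  // the handler observed.
  public static Throwable runFailingThread(LoggingHandler handler) {
    Thread worker = new Thread(() -> {
      throw new OutOfMemoryError("simulated");
    }, "worker");
    worker.setUncaughtExceptionHandler(handler);
    worker.start();
    try {
      worker.join();
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(ie);
    }
    return handler.last.get();
  }

  public static void main(String[] args) {
    Throwable seen = runFailingThread(new LoggingHandler());
    System.out.println("handler saw: " + seen.getClass().getSimpleName());
  }
}
```

A handler like this makes the failure visible in the logs, but it does not by itself wake up a caller that is blocked waiting on the dead thread; that still requires the thread's own error handling to catch the Throwable and report it.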