amazon-kinesis-producer: addUserRecord call throws DaemonException
Sometimes calling addUserRecord starts to throw:
com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
at com.amazonaws.services.kinesis.producer.Daemon.add(Daemon.java:171) ~[amazon-kinesis-producer-0.10.2.jar:na]
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:467) ~[amazon-kinesis-producer-0.10.2.jar:na]
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:338) ~[amazon-kinesis-producer-0.10.2.jar:na]
The KPL does not seem to recover from this; all further calls to addUserRecord also fail. Restarting the KPL Java process fixes the situation.
This seems to happen when the Kinesis stream is throttling requests, so my guess is that the native process can't write to the stream quickly enough and runs out of memory. If that's the case, my expectation would be that the native process should start to discard older data, and of course that if the native process dies the KPL should recover to a working state.
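For context, a minimal sketch of the call pattern involved (the stream name, partition key and payload are placeholders, not the original application's code); the exception surfaces directly from addUserRecord:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.producer.DaemonException;
import com.amazonaws.services.kinesis.producer.KinesisProducer;

public class ProducerSketch {
    public static void main(String[] args) {
        KinesisProducer producer = new KinesisProducer();
        for (int i = 0; i < 1_000_000; i++) {
            try {
                // Queues the record asynchronously with the native child process.
                producer.addUserRecord("my-stream", Integer.toString(i),
                        ByteBuffer.wrap(("record-" + i).getBytes(StandardCharsets.UTF_8)));
            } catch (DaemonException e) {
                // Once the child process has died, this is thrown here and every
                // subsequent addUserRecord call fails the same way.
                e.printStackTrace();
                break;
            }
        }
        producer.flushSync();
        producer.destroy();
    }
}
```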
About this issue
- State: closed
- Created 8 years ago
- Comments: 64 (14 by maintainers)
Thanks for reporting this. We are investigating this, but could use some additional information. It appears that people are still seeing this with the 0.12.x versions of the KPL. How commonly does this occur?
We will investigate adding memory usage tracking for the KPL native process to help determine how much memory it’s consuming.
Can everyone who is affected by this please respond or add a reaction to help us prioritize this issue.
I want to point out one observation; not sure how helpful this will be. I tried (on Windows) with DATA_SIZE = 10; SECONDS_TO_RUN = 1; RECORDS_PER_SECOND = 1;
and the following is the log (which points to the file/pipe \.\pipe\amz-aws-kpl-in-pipe- not being found).
Unfortunately the KPL is built against glibc, while Alpine Linux uses musl libc. This causes the native component to fail runtime linking and crash. There appear to be some Docker images that include glibc, but I can't vouch for whether they would work or not.
We plan to use this library for high workloads, but it looks like it doesn't prevent the native process from crashing under high load. When the process is dead, our application stops working; it would be great if you could focus on this issue.
Any update on this? I've just run into similar issues…
Is the official recommendation to move out of KPL and adopt the SDK directly?
Worked fine locally, but I'm getting heap size issues running in a Linux Docker container. How much heap does this need?
When will the KPL work with Windows? I am OK with version 0.12.* not working with Windows. However, version 0.10.* also does not work, since I always get this exception:
ERROR KinesisProducer:148 - Error in child process
com.amazonaws.services.kinesis.producer.IrrecoverableError: Unexpected error connecting to child process
at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:502)
at com.amazonaws.services.kinesis.producer.Daemon.access$1200(Daemon.java:61)
at com.amazonaws.services.kinesis.producer.Daemon$6.run(Daemon.java:447)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: \.\pipe\amz-aws-kpl-in-pipe-6cd6f933
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:79)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(WindowsFileSystemProvider.java:115)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at com.amazonaws.services.kinesis.producer.Daemon.connectToChild(Daemon.java:329)
at com.amazonaws.services.kinesis.producer.Daemon.access$1000(Daemon.java:61)
at com.amazonaws.services.kinesis.producer.Daemon$6.run(Daemon.java:444)
… 5 more
And it is not only me; this person also got the same problem:
https://stackoverflow.com/questions/43113791/getting-error-while-running-amazon-kinesis-producer-sample
Looking forward to your update and thank you so much!
Sincerely
Hello,
Thank you, everyone, for sharing your experience and learnings with the community. For an example of how to implement this, see the KPL sample application in this repository, specifically this line. (This is a test application; in this case it just shuts down the sample application after displaying the underlying failure.)
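For readers who cannot follow the repository link, here is a hedged sketch of the pattern the sample demonstrates: attach a callback to the future returned by addUserRecord, surface the underlying failure, and shut down. The logging and exit behaviour below are illustrative, not the verbatim sample code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

public class FailureHandlingSketch {
    public static void main(String[] args) {
        final KinesisProducer producer = new KinesisProducer();

        ListenableFuture<UserRecordResult> future = producer.addUserRecord(
                "my-stream", "my-partition-key",
                ByteBuffer.wrap("payload".getBytes(StandardCharsets.UTF_8)));

        Futures.addCallback(future, new FutureCallback<UserRecordResult>() {
            @Override
            public void onSuccess(UserRecordResult result) {
                // Record was delivered; nothing more to do in this sketch.
            }

            @Override
            public void onFailure(Throwable t) {
                // Display the underlying failure, then shut the application down,
                // which is roughly what the sample application does.
                t.printStackTrace();
                producer.destroy();
                System.exit(1);
            }
        }, MoreExecutors.directExecutor());

        producer.flushSync();
    }
}
```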
This is a general failure condition that occurs when there is any unresolvable configuration problem with the KPL. Usually when this happens it is for one of the following reasons:
If you are experiencing this problem and can confirm that it is not due to a configuration/access issue, please re-open the issue and provide more details on your configuration and any reproduction steps that consistently reproduce it, including steps about stream creation, IAM users/roles/permissions, container/EC2 instance, etc.
For additional assistance, you may also open a customer support ticket through the AWS console to receive more specific support.
I'll see if I can make sure the KPL logs more at startup than it does today. It might help catch these cases where the KPL is missing something it needs at startup.
From what we have observed, it's not a bug, but by-design behaviour.
The thing is, the message to Kinesis is stored (a put operation) on a blocking queue, whose put method has InterruptedException as a checked exception in its signature. This operation is made on the calling thread in the KPL code, so if there is any possibility of the calling thread being interrupted, the catch logic would be invoked:
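The code the comment refers to was not captured in this thread. Based on the description above and the stack trace in the original report, a self-contained model of the flow looks roughly like this (an illustration only, not the real Daemon source):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative model of the behaviour described above -- NOT the real KPL source.
public class DaemonModel {
    private final LinkedBlockingQueue<byte[]> outgoingMessages = new LinkedBlockingQueue<>(10);
    private final AtomicBoolean shutdown = new AtomicBoolean(false);

    public void add(byte[] message) {
        if (shutdown.get()) {
            // Mirrors the DaemonException message seen in the original stack trace.
            throw new IllegalStateException(
                    "The child process has been shutdown and can no longer accept messages.");
        }
        try {
            // Blocking put on the *calling* thread; declares InterruptedException.
            outgoingMessages.put(message);
        } catch (InterruptedException e) {
            // An interrupt on the calling thread lands here and permanently
            // shuts the daemon down, which is the "by-design" part.
            fatalError(e);
        }
    }

    private void fatalError(Throwable t) {
        shutdown.set(true);
        t.printStackTrace();
    }
}
```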
Actually, the fatalError call will terminate the daemon. So, to work around this issue, make sure you are not invoking addUserRecord on a thread which might be interrupted, e.g. any of your server's request-handling threads. This piece of code did the trick for us; make sure something similar is happening in your code:
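The snippet that followed here was also lost. Below is a minimal sketch of the described workaround, assuming a single dedicated sender thread that request-handling code never interrupts (the KinesisSender class and its methods are invented for illustration):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.ListenableFuture;

public class KinesisSender {

    private final KinesisProducer producer = new KinesisProducer();

    // A dedicated thread that request-handling code never interrupts, so the
    // KPL's internal blocking put() cannot observe an InterruptedException.
    private final ExecutorService kplExecutor = Executors.newSingleThreadExecutor();

    public void send(String streamName, String partitionKey, byte[] payload) {
        kplExecutor.submit(() -> {
            // addUserRecord now runs on the dedicated thread, not the caller's thread.
            ListenableFuture<UserRecordResult> future =
                    producer.addUserRecord(streamName, partitionKey, ByteBuffer.wrap(payload));
            return future;
        });
    }

    public void shutdown() {
        kplExecutor.shutdown();
        producer.flushSync();
        producer.destroy();
    }
}
```

The point is simply that the blocking hand-off to the KPL daemon happens on a thread whose interrupt status the rest of the application never touches.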
@sshrivastava-incontact what you're seeing is related to running the KPL 0.12.x on Windows, which isn't currently supported.
For those running on Linux, and Mac OS X: The newest version of the KPL includes some additional logging about how busy the sending process is. See the Release Notes for 0.12.4 on the meaning of the log messages. Under certain circumstances the native component can actually run itself out of threads, which will trigger a failure of the native process.
On a few of the occasions when we ran into the crashed KPL, we also observed that our JVM could not create more threads, even though the thread count in the JVM was very stable.
Exception in thread “qtp213045289-24084” java.lang.OutOfMemoryError: unable to create new native thread
I wonder whether that's because the KPL process has somehow prevented its parent JVM process from getting more native threads.
Well, we moved away from the AWS KPL and are using the AWS SDK for Java now to stream data to Kinesis. The KPL didn't work out for us in the end, as we saw those errors quite frequently.
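For anyone considering the same move, here is a minimal sketch of a direct synchronous put with the AWS SDK for Java v1 (stream name and payload are placeholders; batching via PutRecords, retries and backoff are left out):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;

public class SdkProducerSketch {
    public static void main(String[] args) {
        // Synchronous client; no native child process involved.
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        PutRecordRequest request = new PutRecordRequest()
                .withStreamName("my-stream")
                .withPartitionKey("my-partition-key")
                .withData(ByteBuffer.wrap("payload".getBytes(StandardCharsets.UTF_8)));

        PutRecordResult result = kinesis.putRecord(request);
        System.out.println("Sequence number: " + result.getSequenceNumber());
    }
}
```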