amazon-kinesis-producer: addUserRecord call throws DaemonException
Sometimes calling addUserRecord starts to throw:
com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
at com.amazonaws.services.kinesis.producer.Daemon.add(Daemon.java:171) ~[amazon-kinesis-producer-0.10.2.jar:na]
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:467) ~[amazon-kinesis-producer-0.10.2.jar:na]
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:338) ~[amazon-kinesis-producer-0.10.2.jar:na]
The KPL does not seem to recover from this; all further calls to addUserRecord also fail. Restarting the KPL Java process fixes the situation.
This seems to happen when the Kinesis stream is throttling requests, so my guess is that the native process can't write to the stream quickly enough and runs out of memory. If that's the case, my expectation would be that the native process should start to discard older data, and of course that if the native process dies the KPL should recover to a working state.
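For context, a minimal sketch of the call pattern involved (the stream name, partition key and payload are placeholders, not the original application's code); the exception surfaces directly from addUserRecord:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.producer.DaemonException;
import com.amazonaws.services.kinesis.producer.KinesisProducer;

public class ProducerSketch {
    public static void main(String[] args) {
        KinesisProducer producer = new KinesisProducer();
        for (int i = 0; i < 1_000_000; i++) {
            try {
                // Queues the record asynchronously with the native child process.
                producer.addUserRecord("my-stream", Integer.toString(i),
                        ByteBuffer.wrap(("record-" + i).getBytes(StandardCharsets.UTF_8)));
            } catch (DaemonException e) {
                // Once the child process has died, this is thrown here and every
                // subsequent addUserRecord call fails the same way.
                e.printStackTrace();
                break;
            }
        }
        producer.flushSync();
        producer.destroy();
    }
}
```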
About this issue
- State: closed
- Created 8 years ago
- Comments: 64 (14 by maintainers)
Thanks for reporting this. We are investigating this, but could use some additional information. It appears that people are still seeing this with the 0.12.x versions of the KPL. How commonly does this occur?
We will investigate adding memory usage tracking for the KPL native process to help determine how much memory it’s consuming.
Can everyone who is affected by this please respond or add a reaction to help us prioritize this issue.
I want to point out one observation; not sure how helpful this will be. I tried (on Windows) with DATA_SIZE = 10; SECONDS_TO_RUN = 1; RECORDS_PER_SECOND = 1;
and the following is the log (which points to the file/pipe \.\pipe\amz-aws-kpl-in-pipe- not being found).
Unfortunately the KPL is built against glibc, while Alpine Linux uses musl libc. This causes the native component to fail runtime linking and crash. There appear to be some Docker images that include glibc, but I can't vouch for whether they would work or not.
We plan to use this library for high workloads, but it looks like it doesn't prevent the native process from crashing under high load. When the process is dead, our application stops working; it would be great if you could focus on this issue.
Any update on this? I've just run into similar issues…
Is the official recommendation to move out of KPL and adopt the SDK directly?
Worked fine locally, but I'm getting heap size issues running in a Linux Docker container. How much heap does this need?
When will the KPL work with Windows? I am OK with version 0.12.* not working with Windows. However, version 0.10.* also does not work, since I always get this exception:
ERROR KinesisProducer:148 - Error in child process
com.amazonaws.services.kinesis.producer.IrrecoverableError: Unexpected error connecting to child process
at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:502)
at com.amazonaws.services.kinesis.producer.Daemon.access$1200(Daemon.java:61)
at com.amazonaws.services.kinesis.producer.Daemon$6.run(Daemon.java:447)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: \.\pipe\amz-aws-kpl-in-pipe-6cd6f933
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:79)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(WindowsFileSystemProvider.java:115)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at com.amazonaws.services.kinesis.producer.Daemon.connectToChild(Daemon.java:329)
at com.amazonaws.services.kinesis.producer.Daemon.access$1000(Daemon.java:61)
at com.amazonaws.services.kinesis.producer.Daemon$6.run(Daemon.java:444)
… 5 more
And it is not only me; this person also got the same problem:
https://stackoverflow.com/questions/43113791/getting-error-while-running-amazon-kinesis-producer-sample
Looking forward to your update and thank you so much!
Sincerely
Hello,
Thank you, everyone, for sharing your experience and learnings with the community. For an example of how to implement this, see the KPL sample application in this repository, specifically this line. (This is a test application; in this case it just shuts down the sample application after displaying the underlying failure.)
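For readers who cannot follow the repository link, here is a hedged sketch of the pattern the sample demonstrates: attach a callback to the future returned by addUserRecord, surface the underlying failure, and shut down. The logging and exit behaviour below are illustrative, not the verbatim sample code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

public class FailureHandlingSketch {
    public static void main(String[] args) {
        final KinesisProducer producer = new KinesisProducer();

        ListenableFuture<UserRecordResult> future = producer.addUserRecord(
                "my-stream", "my-partition-key",
                ByteBuffer.wrap("payload".getBytes(StandardCharsets.UTF_8)));

        Futures.addCallback(future, new FutureCallback<UserRecordResult>() {
            @Override
            public void onSuccess(UserRecordResult result) {
                // Record was delivered; nothing more to do in this sketch.
            }

            @Override
            public void onFailure(Throwable t) {
                // Display the underlying failure, then shut the application down,
                // which is roughly what the sample application does.
                t.printStackTrace();
                producer.destroy();
                System.exit(1);
            }
        }, MoreExecutors.directExecutor());

        producer.flushSync();
    }
}
```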
This is a general failure condition that occurs when there is any unresolvable configuration problem with the KPL. Usually when this happens it is for one of the following reasons:
If you are experiencing this problem and can confirm that it is not due to a configuration/access issue, please re-open the issue and provide more details on your configuration and any reproduction steps that consistently reproduce it, including steps about stream creation, IAM users/roles/permissions, container/EC2 instance, etc.
For additional assistance, you may also open a customer support ticket through the AWS console to receive more specific support.
I'll see if I can make sure the KPL logs more at startup than it does today. It might help catch these cases where the KPL is missing something it needs at startup.
From what we have observed, it's not a bug, but by-design behaviour.
The thing is, the message to Kinesis is stored (a put operation) on a blocking queue, whose put method has InterruptedException as a checked exception in its signature. This operation is made on the calling thread in the KPL code, so if there is any possibility of the calling thread being interrupted, the catch logic would be invoked:
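The code the comment refers to was not captured in this thread. Based on the description above and the stack trace in the original report, a self-contained model of the flow looks roughly like this (an illustration only, not the real Daemon source):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative model of the behaviour described above -- NOT the real KPL source.
public class DaemonModel {
    private final LinkedBlockingQueue<byte[]> outgoingMessages = new LinkedBlockingQueue<>(10);
    private final AtomicBoolean shutdown = new AtomicBoolean(false);

    public void add(byte[] message) {
        if (shutdown.get()) {
            // Mirrors the DaemonException message seen in the original stack trace.
            throw new IllegalStateException(
                    "The child process has been shutdown and can no longer accept messages.");
        }
        try {
            // Blocking put on the *calling* thread; declares InterruptedException.
            outgoingMessages.put(message);
        } catch (InterruptedException e) {
            // An interrupt on the calling thread lands here and permanently
            // shuts the daemon down, which is the "by-design" part.
            fatalError(e);
        }
    }

    private void fatalError(Throwable t) {
        shutdown.set(true);
        t.printStackTrace();
    }
}
```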
Actually, the fatalError call will terminate the daemon. So, to work around this issue, make sure you are not invoking addUserRecord on a thread which might be interrupted, e.g. any of your server's request-handling threads. This piece of code did the trick for us; make sure something similar is happening in your code:
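The snippet that followed here was also lost. Below is a minimal sketch of the described workaround, assuming a single dedicated sender thread that request-handling code never interrupts (the KinesisSender class and its methods are invented for illustration):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.ListenableFuture;

public class KinesisSender {

    private final KinesisProducer producer = new KinesisProducer();

    // A dedicated thread that request-handling code never interrupts, so the
    // KPL's internal blocking put() cannot observe an InterruptedException.
    private final ExecutorService kplExecutor = Executors.newSingleThreadExecutor();

    public void send(String streamName, String partitionKey, byte[] payload) {
        kplExecutor.submit(() -> {
            // addUserRecord now runs on the dedicated thread, not the caller's thread.
            ListenableFuture<UserRecordResult> future =
                    producer.addUserRecord(streamName, partitionKey, ByteBuffer.wrap(payload));
            return future;
        });
    }

    public void shutdown() {
        kplExecutor.shutdown();
        producer.flushSync();
        producer.destroy();
    }
}
```

The point is simply that the blocking hand-off to the KPL daemon happens on a thread whose interrupt status the rest of the application never touches.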
@sshrivastava-incontact what you're seeing is related to running the KPL 0.12.x on Windows, which isn't currently supported.
For those running on Linux, and Mac OS X: The newest version of the KPL includes some additional logging about how busy the sending process is. See the Release Notes for 0.12.4 on the meaning of the log messages. Under certain circumstances the native component can actually run itself out of threads, which will trigger a failure of the native process.
On a few of the occasions when we ran into the crashed KPL, we also observed that our JVM could not create more threads, even though the thread count in the JVM was very stable.
Exception in thread “qtp213045289-24084” java.lang.OutOfMemoryError: unable to create new native thread
I wonder whether that's because the KPL process has somehow prevented its parent JVM process from getting more native threads.
Well, we moved away from the AWS KPL and are using the AWS SDK for Java now to stream data to Kinesis. The KPL didn't work out for us in the end, as we saw those errors quite frequently.
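For anyone considering the same move, here is a minimal sketch of a direct synchronous put with the AWS SDK for Java v1 (stream name and payload are placeholders; batching via PutRecords, retries and backoff are left out):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;

public class SdkProducerSketch {
    public static void main(String[] args) {
        // Synchronous client; no native child process involved.
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        PutRecordRequest request = new PutRecordRequest()
                .withStreamName("my-stream")
                .withPartitionKey("my-partition-key")
                .withData(ByteBuffer.wrap("payload".getBytes(StandardCharsets.UTF_8)));

        PutRecordResult result = kinesis.putRecord(request);
        System.out.println("Sequence number: " + result.getSequenceNumber());
    }
}
```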