testcontainers-java: Execution gets stuck on ResourceReaper.start on linux host

The jvm gets stuck and has to be manually terminated. This happens intermittently and is hard to reproduce. This issue only happens after upgrading to 1.15.0 (was also seeing issues on 1.15.0-rc2`, 1.14.3 was working fine. Would appreciate any help with figuring out why this happens.

Environment information

$ docker version 
Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Tue Jul 28 02:14:52 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Tue Jul 28 02:14:13 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.3.6
  GitCommit:        be75852b8d7849474a20192f9ed1bf34fdd454f1
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

$ uname -r
4.14.158

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian

Here is the jvm thread trace

Test has timed out!  Here is a trace of what it is currently doing:
2020-11-23 13:41:46
Full thread dump OpenJDK 64-Bit Server VM (25.265-b01 mixed mode):

"Attach Listener" #1216 daemon prio=9 os_prio=0 tid=0x00007fbde8001000 nid=0x563 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
    - None

"testcontainers-ryuk" #1215 daemon prio=5 os_prio=0 tid=0x00007fbd80488000 nid=0xf592 in Object.wait() [0x00007fb8157d6000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:142)
    - locked <0x000000073a8214c8> (a java.util.ArrayList)
    at org.testcontainers.utility.ResourceReaper$$Lambda$589/432240114.run(Unknown Source)
    at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
    at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:133)
    at org.testcontainers.utility.ResourceReaper$$Lambda$588/1491766901.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

"docker-java-stream-2071761001" #1214 daemon prio=5 os_prio=0 tid=0x00007fbd80377800 nid=0xf568 runnable [0x00007fb8153d2000]
   java.lang.Thread.State: RUNNABLE
    at org.testcontainers.shaded.okio.Buffer.getByte(Buffer.java:312)
    at org.testcontainers.shaded.okio.RealBufferedSource.readHexadecimalUnsignedLong(RealBufferedSource.java:310)
    at org.testcontainers.shaded.okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.readChunkSize(Http1ExchangeCodec.java:492)
    at org.testcontainers.shaded.okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.read(Http1ExchangeCodec.java:471)
    at org.testcontainers.shaded.okhttp3.internal.connection.Exchange$ResponseBodySource.read(Exchange.java:286)
    at org.testcontainers.shaded.okio.RealBufferedSource$1.read(RealBufferedSource.java:436)
    at org.testcontainers.shaded.com.github.dockerjava.core.FramedInputStreamConsumer.accept(FramedInputStreamConsumer.java:30)
    at org.testcontainers.shaded.com.github.dockerjava.core.FramedInputStreamConsumer.accept(FramedInputStreamConsumer.java:12)
    at org.testcontainers.shaded.com.github.dockerjava.core.DefaultInvocationBuilder.lambda$executeAndStream$1(DefaultInvocationBuilder.java:275)
    at org.testcontainers.shaded.com.github.dockerjava.core.DefaultInvocationBuilder$$Lambda$578/715686037.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

"OkHttp ConnectionPool" #1206 daemon prio=5 os_prio=0 tid=0x00007fb76003c000 nid=0xf32b in Object.wait() [0x00007fb8155d4000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:460)
    at org.testcontainers.shaded.okhttp3.internal.connection.RealConnectionPool.lambda$new$0(RealConnectionPool.java:62)
    - locked <0x000000070f5aed10> (a org.testcontainers.shaded.okhttp3.internal.connection.RealConnectionPool)
    at org.testcontainers.shaded.okhttp3.internal.connection.RealConnectionPool$$Lambda$550/880380434.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - <0x000000073df35e20> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"spatialIndexComponent-258" #1051 daemon prio=5 os_prio=0 tid=0x00007fbd80139000 nid=0xf0cc waiting on condition [0x00007fb815ad9000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000710880420> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

"spatialIndexComponent-257"    
.
.
.
"spatialIndexComponent-0"


"DelegateRunnerWithTimeout-1" #28 prio=5 os_prio=0 tid=0x00007fbfdd9dd000 nid=0xca86 runnable [0x00007fbda08af000]
   java.lang.Thread.State: RUNNABLE
    at org.testcontainers.shaded.okio.Buffer.indexOf(Buffer.java:1463)
    at org.testcontainers.shaded.okio.RealBufferedSource.indexOf(RealBufferedSource.java:352)
    at org.testcontainers.shaded.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:230)
    at org.testcontainers.shaded.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:224)
    at org.testcontainers.shaded.okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.readChunkSize(Http1ExchangeCodec.java:489)
    at org.testcontainers.shaded.okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.read(Http1ExchangeCodec.java:471)
    at org.testcontainers.shaded.okhttp3.internal.Util.skipAll(Util.java:204)
    at org.testcontainers.shaded.okhttp3.internal.Util.discard(Util.java:186)
    at org.testcontainers.shaded.okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.close(Http1ExchangeCodec.java:511)
    at org.testcontainers.shaded.okio.ForwardingSource.close(ForwardingSource.java:43)
    at org.testcontainers.shaded.okhttp3.internal.connection.Exchange$ResponseBodySource.close(Exchange.java:313)
    at org.testcontainers.shaded.okio.RealBufferedSource.close(RealBufferedSource.java:476)
    at org.testcontainers.shaded.okhttp3.internal.Util.closeQuietly(Util.java:139)
    at org.testcontainers.shaded.okhttp3.ResponseBody.close(ResponseBody.java:192)
    at org.testcontainers.shaded.okhttp3.Response.close(Response.java:290)
    at org.testcontainers.shaded.com.github.dockerjava.okhttp.OkDockerHttpClient$OkResponse.close(OkDockerHttpClient.java:280)
    at org.testcontainers.shaded.com.github.dockerjava.core.DefaultInvocationBuilder.lambda$null$0(DefaultInvocationBuilder.java:272)
    at org.testcontainers.shaded.com.github.dockerjava.core.DefaultInvocationBuilder$$Lambda$579/1129446976.close(Unknown Source)
    at com.github.dockerjava.api.async.ResultCallbackTemplate.close(ResultCallbackTemplate.java:77)
    at org.testcontainers.utility.ResourceReaper.start(ResourceReaper.java:177)
    at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:203)
    - locked <0x0000000710684178> (a [Ljava.lang.Object;)
    at org.testcontainers.LazyDockerClient.getDockerClient(LazyDockerClient.java:14)
    at org.testcontainers.LazyDockerClient.createContainerCmd(LazyDockerClient.java:12)
    at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:355)
    at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:322)
    at org.testcontainers.containers.GenericContainer$$Lambda$534/2098911188.call(Unknown Source)
    at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
    at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:320)
    at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:308)
    at com.uber.fievel.testing.mysql.UberMariaDBContainer.start(UberMariaDBContainer.java:56)
    at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1021)
    at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 17
  • Comments: 17 (3 by maintainers)

Commits related to this issue

Most upvoted comments

This is a race condition in the docker-java library between two threads reading the same okio.RealBufferedSource obtained by a okhttp3.Response. The two threads are thread #1214 (docker-java-stream-2071761001) and Thread #28 (DelegateRunnerWithTimeout-1) in the thread dump of the issue description. Both threads reading simultaneously from the buffered source, which is not thread-safe, screws up the buffer’s internal bookkeeping leading to an infinite loop on both of them.

The issue is triggered in the ResourceReaper with the logCallback initialized here: https://github.com/testcontainers/testcontainers-java/blob/f588082636dc2749f4129e943b02ede62527b41c/core/src/main/java/org/testcontainers/utility/ResourceReaper.java#L114-L124

The .exec invocation internally spawns a thread named docker-java-stream-... in the docker-java library that starts streaming the response of the log container command: https://github.com/docker-java/docker-java/blob/f9d2db6efff4d7fc7092b53209027d6846de1894/docker-java-core/src/main/java/com/github/dockerjava/core/DefaultInvocationBuilder.java#L262-L284

The callback parameter contains the same object as logCallback in the ResourceReaper above. The DockerHttpClient.Response retrieved on line 269 is handed to the sourceConsumer. The sourceConsumer is a FramedInputStreamConsumer (https://github.com/docker-java/docker-java/blob/f9d2db6efff4d7fc7092b53209027d6846de1894/docker-java-core/src/main/java/com/github/dockerjava/core/FramedInputStreamConsumer.java) that was initialized with the same callback. The consumer starts reading the response stream and invokes onNext (ResourceReaper line 121 above) with each new frame read from the stream.

Additionally, the DockerHttpClient.Response retrieved on line 269 is also used in the lambda that is passed into the callback via the onStart method. This lambda will be called when the close method of the callback is invoked (it instantiates the Closeable functional interface):

https://github.com/docker-java/docker-java/blob/f9d2db6efff4d7fc7092b53209027d6846de1894/docker-java-api/src/main/java/com/github/dockerjava/api/async/ResultCallbackTemplate.java#L36-L40 https://github.com/docker-java/docker-java/blob/f9d2db6efff4d7fc7092b53209027d6846de1894/docker-java-api/src/main/java/com/github/dockerjava/api/async/ResultCallbackTemplate.java#L72-83

Now to close the circle, while the response is being streamed by the streaming thread, the callback is eventually closed in the ResourceReaper by the thread that issued the log container command: https://github.com/testcontainers/testcontainers-java/blob/f588082636dc2749f4129e943b02ede62527b41c/core/src/main/java/org/testcontainers/utility/ResourceReaper.java#L205

As seen above, this closes the DockerHttpClient.Response in the lambda which internally reads the buffered source (see stack trace of thread #28). Together with the streaming thread, this makes two threads simultaneously reading from the buffered source.

The issue was most likely introduced by this PR: https://github.com/docker-java/docker-java/pull/1421.

I’m not entirely sure how to fix it though. I guess ideally, the streaming thread should close the response itself and the main thread should only cancel the request, such that the response stream is finished and the streaming thread stops reading.

(embedding code snippets from another repository apparently doesn’t work 😢)

I am using 1.15.1 and also running into this issue. The behavior is basically the same to what @raviagarwal7 describes.

For now, the workaround for us would be to downgrade to the 1.14.3, although that brings its own challenges.

Can’t wait to see a fix coming out.

EDIT: Details of our setup: Running a Jenkins pipeline on a repo containing 70+ modules, many of them using testcontainers. Typical Jenkins’ node setup: CPUs: 4, RAM: 8192MB, Disk: 40GB, SWAP: 0MB, Ephemeral: 0GB. An alternative way to reproduce is to run a single module (the one we tested uses Postgresql test container) many times in a row. With 50 iterations, the incidence of the error raises above 80 %, I would guess.

@jsmrcka could you please try the httpclient5 transport? You can enable it by setting transport.type=httpclient5 in your config (see the available locations here: https://www.testcontainers.org/features/configuration/ )

Updated to 1.15.2. Unfortunately, the issue still happens. Same setup as in my previous comment.