docker-plugin: losing connection with java.io.IOException: Unexpected termination of the channel after a few minutes

Setup

  • Docker plugin version: 1.1.5
  • Jenkins version: 2.121.3 (LTS) and 2.136 (weekly)
  • Docker Engine version: 18.06.0-ce, API version 1.38

We are using a custom image based on the latest jenkins/ssh-slave. Agents are started with the attach connect method and are used to run a freestyle job.

Symptom

We use the agents for automated testing with Robot Framework. After updating the Docker plugin to version 1.1.5, our agents fail every time after a few minutes, with the following in the build console log:

java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: java.io.IOException: Backing channel 'docker-002g9n6u4hpkj' is disconnected.
	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:214)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)
	at com.sun.proxy.$Proxy222.isAlive(Unknown Source)
	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1143)
	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1135)
	at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
	at hudson.model.Build$BuildExecution.build(Build.java:206)
	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
	at hudson.model.Run.execute(Run.java:1815)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:429)
FATAL: Unable to delete script file /tmp/jenkins5950680125503140381.sh
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on docker-002g9n6u4hpkj failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:948)
	at hudson.FilePath.act(FilePath.java:1070)
	at hudson.FilePath.act(FilePath.java:1059)
	at hudson.FilePath.delete(FilePath.java:1563)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:123)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
	at hudson.model.Build$BuildExecution.build(Build.java:206)
	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
	at hudson.model.Run.execute(Run.java:1815)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:429)

And the Jenkins log shows:

I/O error in channel docker-002h28zxulg3r
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)

INFO io.jenkins.docker.DockerTransientNode$1 println
Disconnected computer for node 'docker-002h28zxulg3r'.
INFO io.jenkins.docker.DockerTransientNode$1 println
Removed Node for node 'docker-002h28zxulg3r'.
SEVERE com.github.dockerjava.core.async.ResultCallbackTemplate onError
Error during callback
com.github.dockerjava.api.exception.NotModifiedException: 
	at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:97)
	at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:33)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:997)
	at io.netty.channel.epoll.EpollDomainSocketChannel$EpollDomainUnsafe.epollInReady(EpollDomainSocketChannel.java:138)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:401)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:306)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)

If the plugin is downgraded to version 1.1.4, everything works as it used to and no connection losses occur. We also tried setting the system property com.nirima.jenkins.plugins.docker.DockerContainerWatchdog.enabled=false, but it had no effect on this issue.
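
For reference, such a system property has to reach the Jenkins master's JVM; the exact mechanism depends on how Jenkins is deployed. A minimal sketch of two common ways to pass it, assuming either a plain jenkins.war start or the official jenkins/jenkins Docker image (whose entrypoint honours JAVA_OPTS):

    # Plain WAR deployment: pass the property with -D on the java command line
    java -Dcom.nirima.jenkins.plugins.docker.DockerContainerWatchdog.enabled=false \
         -jar jenkins.war

    # Official jenkins/jenkins image: set it via JAVA_OPTS
    docker run -d \
         -e JAVA_OPTS="-Dcom.nirima.jenkins.plugins.docker.DockerContainerWatchdog.enabled=false" \
         jenkins/jenkins:lts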

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 28 (16 by maintainers)

Most upvoted comments

OK, so to summarize for anyone else who stumbles upon this issue in search of a solution…

This was caused by having another Jenkins server with the same instance ID that was using the same docker host. That was the magic combination.

  • If you’ve got multiple Jenkins servers using the same docker host, you MUST ensure that they have different instance IDs (one way to compare them is sketched just after this list), otherwise the background cleanup (DockerContainerWatchdog) will kill the other Jenkins server’s containers.
  • If you can’t give them different instance IDs, then you’ll need to set the system property com.nirima.jenkins.plugins.docker.DockerContainerWatchdog.enabled=false on ALL the Jenkins servers that share the same ID and the same docker hosts.
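
A quick way to compare the IDs of two masters, assuming both expose the standard X-Instance-Identity HTTP response header (Jenkins publishes its instance identity public key there by default); the hostnames below are placeholders:

    # Fetch the identity header from each master and compare the values;
    # identical output means the two servers share one instance identity key.
    curl -sI https://jenkins-prod.example.com/ | grep -i x-instance-identity
    curl -sI https://jenkins-dev.example.com/  | grep -i x-instance-identity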

I can only assume that, somehow, your two Jenkins servers have the same ID … or your docker daemon is ignoring the filter it’s given.
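
One way to check what the watchdog's filter actually matches is to look at the labels the plugin puts on the containers it starts. A sketch, assuming the plugin labels its containers with a JenkinsId key (an assumption — verify the real label names with the first command on one of your own agent containers):

    # Show the labels on a running agent container (container name is a placeholder)
    docker inspect --format '{{json .Config.Labels}}' docker-002g9n6u4hpkj

    # List all containers carrying a given Jenkins instance ID (placeholder value)
    docker ps --all --filter 'label=JenkinsId=<your-instance-id>'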

You are correct. I figured that out later. When I created our development setup, I copied the production data, including the instance identity key. I have now regenerated it, following the advice at https://support.cloudbees.com/hc/en-us/articles/215578778-How-to-fix-an-instance-having-an-invalid-identity-key-enc-file-. I’ll test whether everything works correctly now. Sorry for all the noise!
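
For anyone else in the same situation: the linked article boils down to removing the copied identity key so that Jenkins generates a fresh one on start-up. A minimal sketch, assuming a default $JENKINS_HOME of /var/lib/jenkins and a systemd-managed service (adjust paths and service name for your deployment):

    # Stop Jenkins, remove the copied encrypted identity key, then restart;
    # Jenkins regenerates identity.key.enc with a new instance identity.
    sudo systemctl stop jenkins
    sudo rm /var/lib/jenkins/identity.key.enc
    sudo systemctl start jenkins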