kubernetes-client: io.fabric8.kubernetes.client.KubernetesClientException: not ready after 5000 MILLISECONDS
Describe the bug
This is happening on my 4 Jenkins servers after upgrading Jenkins and AKS.
All of a sudden, all Jenkins agent pods started failing with the error below. Some pods work and some fail; this happens 1-2 times out of every 4-5 attempts.
AKS version: 1.20.13
Jenkins plugin versions (each cluster runs a different set; I can reproduce this error on all of them):
AKS-1: kubernetes:1.30.1 kubernetes-client-api:5.10.1-171.vaa0774fb8c20 kubernetes-credentials:0.8.0
AKS-2: kubernetes:1.31.3 kubernetes-client-api:5.11.2-182.v0f1cf4c5904e kubernetes-credentials:0.9.0
AKS-3: kubernetes:1.30.1 kubernetes-client-api:5.10.1-171.vaa0774fb8c20 kubernetes-credentials:0.8.0
AKS-4: kubernetes:1.31.3 workflow-job:1145.v7f2433caa07f workflow-aggregator:2.6
Troubleshooting steps I tried:
- Reverted Jenkins to the old version => same error
- Upgraded Jenkins and all plugins in use to the latest versions => same error
- Downgraded the Jenkins Kubernetes and Kubernetes Client API plugins to a stable version, per some suggestions on GitHub => same error
- Created a brand-new cluster and installed Jenkins; job pods started giving the same error => same error
Not sure how to resolve this error. Please let me know if I'm missing anything.
Fabric8 Kubernetes Client version
5.10.1@latest
Steps to reproduce
Creating/running a job intermittently produces this error.
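The failing job itself was not shared. Judging from the log below (podTemplate / container / sh steps), a minimal scripted pipeline of that shape would look roughly like the following sketch - the container name, image, and command are illustrative assumptions, not the reporter's actual job:

```groovy
// Hypothetical minimal reproducer; container name, image, and command are assumptions.
podTemplate(containers: [
    containerTemplate(name: 'shell', image: 'ubuntu:20.04', command: 'sleep', args: 'infinity')
]) {
    node(POD_LABEL) {
        container('shell') {
            // Each sh step performs a pod exec through the fabric8 client;
            // this is where "not ready after 5000 MILLISECONDS" surfaces intermittently.
            sh 'echo hello from the agent pod'
        }
    }
}
```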
Expected behavior
The pod should start and the job should execute successfully.
Runtime
Kubernetes (vanilla)
Kubernetes API Server version
1.21.6
Environment
Linux
Fabric8 Kubernetes Client Logs
18:23:33 [WS-CLEANUP] Deleting project workspace...
18:23:33 [WS-CLEANUP] Deferred wipeout is used...
18:23:33 [WS-CLEANUP] done
18:23:33 [Pipeline] }
18:23:33 [Pipeline] // container
18:23:33 [Pipeline] }
18:23:33 [Pipeline] // node
18:23:33 [Pipeline] }
18:23:33 [Pipeline] // timeout
18:23:33 [Pipeline] }
18:23:33 [Pipeline] // podTemplate
18:23:33 [Pipeline] End of Pipeline
18:23:33 io.fabric8.kubernetes.client.KubernetesClientException: not ready after 5000 MILLISECONDS
18:23:33 at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:176)
18:23:33 at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.exec(PodOperationsImpl.java:322)
18:23:33 at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.exec(PodOperationsImpl.java:84)
18:23:33 at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:413)
18:23:33 at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:330)
18:23:33 at hudson.Launcher$ProcStarter.start(Launcher.java:507)
18:23:33 at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:176)
18:23:33 at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:132)
18:23:33 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:324)
18:23:33 at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:319)
18:23:33 at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:193)
18:23:33 at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
18:23:33 at jdk.internal.reflect.GeneratedMethodAccessor6588.invoke(Unknown Source)
18:23:33 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
18:23:33 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
18:23:33 at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
18:23:33 at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
18:23:33 at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
18:23:33 at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
18:23:33 at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
18:23:33 at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
18:23:33 at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
18:23:33 at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:163)
18:23:33 at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
18:23:33 at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:158)
18:23:33 at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:161)
18:23:33 at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:165)
18:23:33 at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
18:23:33 at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
18:23:33 at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
18:23:33 at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
18:23:33 at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
18:23:33 at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
18:23:33 at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
18:23:33 at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
18:23:33 at WorkflowScript.run(WorkflowScript:114)
18:23:33 at ___cps.transform___(Native Method)
18:23:33 at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:86)
18:23:33 at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
18:23:33 at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
18:23:33 at jdk.internal.reflect.GeneratedMethodAccessor210.invoke(Unknown Source)
18:23:33 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
18:23:33 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
18:23:33 at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
18:23:33 at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
18:23:33 at com.cloudbees.groovy.cps.Next.step(Next.java:83)
18:23:33 at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
18:23:33 at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
18:23:33 at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
18:23:33 at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
18:23:33 at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
18:23:33 at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
18:23:33 at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
18:23:33 at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:185)
18:23:33 at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:402)
18:23:33 at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:96)
18:23:33 at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:314)
18:23:33 at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:278)
18:23:33 at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
18:23:33 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
18:23:33 at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
18:23:33 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
18:23:33 at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
18:23:33 at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
18:23:33 at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
18:23:33 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
18:23:33 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
18:23:33 at java.base/java.lang.Thread.run(Thread.java:829)
18:23:33 [Bitbucket] Notifying commit build result
18:23:33 [Bitbucket] Build result notified
18:23:33 Finished: FAILURE
Additional context
No response
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 3
- Comments: 27 (8 by maintainers)
Same here, I think this one can be closed 😃
I asked around and I hear that we haven't seen this issue lately.
Update to the latest Kubernetes plugin, which introduces retries on these timeouts.
Does this issue still happen on the latest version?
We started receiving this error a lot after bumping the Jenkins server and the Kubernetes plugin:
Jenkins: 2.320 -> 2.337
Plugin: kubernetes:1.30.7 -> kubernetes:3546.v6103d89542d6
Stacktrace:
This may be related to #4319 and #4355 - if the exec targets a pod with multiple containers and no container name is supplied, the exception handling is very unclear for okhttp.
There have been some additional changes in 6 - have you tried that as well? And possibly with an alternative HTTP client, JDK or Jetty? That would help determine whether it's an okhttp-specific problem.
It's been a while since I looked at this. My local test had no stdout/stderr handling and seemed to produce a similar exception - however, at least on 5.12/6.0 you can clearly see that it is not due to the timeout but to a bad request. In the exception chain:
... Caused by: java.net.ProtocolException: Expected HTTP 101 response but was '400 Bad Request'
It seems that the API server wants to see that you are handling at least one of the streams before it accepts the request - which, based on the Jenkins code, is not the case here.
If you are still getting a timeout on 5.12, then I am still not quite sure what is going on. It sounds like something highly related was already happening on the 5.4 kubernetes client - that would suggest to me that it is more of an environmental issue.
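For illustration, here is a minimal sketch of a fabric8 exec call that addresses both points above: it names the target container and attaches at least one output stream. It uses the 6.x builder API; the namespace, pod name, container name, and command are assumptions:

```java
import java.io.ByteArrayOutputStream;

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.dsl.ExecWatch;

public class ExecSketch {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayOutputStream err = new ByteArrayOutputStream();
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            ExecWatch watch = client.pods()
                .inNamespace("jenkins")   // assumption
                .withName("agent-pod")    // assumption
                .inContainer("shell")     // explicit container name, avoiding the multi-container ambiguity
                .writingOutput(out)       // handle at least one stream, per the comment above
                .writingError(err)
                .exec("sh", "-c", "echo hello");
            // 6.x exposes the exit code as a future; block until the command finishes.
            Integer code = watch.exitCode().get();
            watch.close();
            System.out.println("exit=" + code + " stdout=" + out);
        }
    }
}
```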
I have been trying to troubleshoot this for some time. There are a few things to note:
- The error that used to read "Interrupted while starting websocket connection, you should increase the Max connections to Kubernetes API" now shows as io.fabric8.kubernetes.client.KubernetesClientException: not ready after <> MILLISECONDS, since the plugin is catching a KubernetesClientException.
- When collecting data in impacted environments, the problem seems to happen even when only a single connection is being made from the client, so it is not necessarily related to, for example, the OkHttp thread pool managing concurrent connections.
@shawkins, your comment about the stderr handling is interesting - I am curious why you thought of this? It is interesting because the Jenkins kubernetes plugin does indeed handle stdout/stderr with:
Do you think this has an impact on the socket connection response? If yes, I am wondering whether 6.0.0 refactors such as https://github.com/fabric8io/kubernetes-client/pull/4115 and the onExit() callback might help here?
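For reference, here is a hedged sketch of what such a listener could look like against the 6.x ExecListener interface - the exact callback signatures are my recollection and should be treated as assumptions:

```java
import io.fabric8.kubernetes.api.model.Status;
import io.fabric8.kubernetes.client.dsl.ExecListener;

// Sketch of a 6.x exec listener; it would be attached via .usingListener(...) in the exec DSL.
class LoggingExecListener implements ExecListener {
    @Override
    public void onOpen() {
        System.out.println("exec websocket opened");
    }

    @Override
    public void onFailure(Throwable t, Response failureResponse) {
        // A failed upgrade (e.g. the 400 instead of HTTP 101 mentioned earlier) lands here.
        System.err.println("exec failed: " + t);
    }

    @Override
    public void onClose(int code, String reason) {
        System.out.println("exec websocket closed: " + code + " " + reason);
    }

    @Override
    public void onExit(int code, Status status) {
        // Added in 6.x: invoked with the process exit code once the command terminates.
        System.out.println("process exited with code " + code);
    }
}
```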
Similar issues are being found on a k8s Helm deployment of Jenkins at my company as well.
We've hit a stack trace like the one below:
This started happening out of the blue after deploying a firewall rule blocking some k8s ports, none of which (as far as I am aware) are pertinent to Jenkins.
Any advice appreciated.
Hey @sunix, I'm following this issue: https://issues.jenkins.io/browse/JENKINS-67664
Hey @rg970197, were you able to fix it? We already tried everything you mentioned, also without success.
Update: We were able to mitigate the issue with the following configuration:
The Jenkins server is on 2.338, but I think that is beside the point.
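The actual mitigation configuration was not captured in this archive. For context only: the 5000 ms in the error message matches the fabric8 client's default websocket timeout, which can be raised via the kubernetes.websocket.timeout system property or, when constructing a client directly, in code. Here is a sketch against the 5.x API - the 30-second value is an arbitrary example, not necessarily what was used here:

```java
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.ConfigBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class TimeoutSketch {
    public static void main(String[] args) {
        // The default websocket timeout is 5000 ms, which is exactly what the
        // "not ready after 5000 MILLISECONDS" message reports hitting.
        Config config = new ConfigBuilder()
            .withWebsocketTimeout(30 * 1000L) // arbitrary example value
            .build();
        try (KubernetesClient client = new DefaultKubernetesClient(config)) {
            System.out.println("connected to namespace: " + client.getNamespace());
        }
    }
}
```

In the Jenkins plugin the client is built internally, so the system property route (e.g. adding -Dkubernetes.websocket.timeout=30000 to the controller's JVM options) should be the applicable variant of this knob, assuming the plugin's client honors the standard fabric8 auto-configuration.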