ec2-fleet-plugin: Planned capacity snapshot bug

During the setup of this plugin I ran into the following issue: no instances would be spawned because the plugin thinks it already has enough capacity in the form of so-called ‘planned capacity snapshots’. I’m not quite sure what these are, and I can’t find any clear documentation on them. The logs give me the following statement:

currentDemand -1 availableCapacity 2 (availableExecutors 0 connectingExecutors 0 plannedCapacitySnapshot 2 additionalPlannedCapacity 0)

Current demand is -1, which is correct since one build is queued up and awaiting execution. But no instance spawns, so the build never executes (see the sketch below for how I read this log line).
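
For anyone else trying to decode that log line: as far as I can tell, the plugin only provisions when the negated demand exceeds the sum of the four capacity counters. Here is a minimal sketch of that check, with hypothetical names mirroring the log line rather than the plugin’s real internals:

    // Hypothetical reconstruction of the provisioning check implied by the log line.
    public class CapacityCheck {
        int availableExecutors = 0;       // idle executors on running instances
        int connectingExecutors = 0;      // instances launched but not yet connected
        int plannedCapacitySnapshot = 2;  // capacity Jenkins has already promised
        int additionalPlannedCapacity = 0;

        int availableCapacity() {
            return availableExecutors + connectingExecutors
                    + plannedCapacitySnapshot + additionalPlannedCapacity;
        }

        // currentDemand is negative while builds are queued (-1 here)
        int toProvision(int currentDemand) {
            // With the values above: max(0, 1 - 2) = 0, so nothing spawns even
            // though one build is waiting -- the snapshot of 2 masks the demand.
            return Math.max(0, -currentDemand - availableCapacity());
        }
    }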

Some additional information: I previously had 2 on-demand instances running alongside the spot fleet so that builds would not break during the installation of the spot fleet. Perhaps this is related to the planned capacity snapshot? I’m not quite sure if this is relevant, though.

Am I missing something here? Is there documentation on the different capacity types?

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 26

Most upvoted comments

Fixes released in 1.16.2

  • Fix provision problems
  • Avoid negative target capacity
  • Correctly show fleet info widget (for multiple fleets)

Will keep it open, please post feedback if possible, thx

OK, I think I fixed both problems, the widget and the actual provisioning, in https://github.com/jenkinsci/ec2-fleet-plugin/pull/159. Will update when it is merged.

Hi, we recently updated the plugin from version 1.11.1 to version 1.16.1.

Since this update we have encountered a lot of problems: it works for several hours, and then, when no jobs are running, the Fleets scale down to 0 (all our EC2 Fleets are configured with a Min Cluster Size of 0).

At this point new jobs start to queue up and all our Fleets stay at 0 indefinitely.

Yesterday I restarted my Jenkins slave; after the restart the queued jobs ran correctly. This morning our job queue was clean and everything was good.

But today at 12 am we started to see a big job queue again and all our Spot Fleets stayed at 0. The Spot Fleets do not scale up anymore.

I’m not 100% sure, but it seems several people have reported similar problems:

  • In #153 the last comment indicated that a reboot was required to make it work again;
  • In #149 a similar problem is reported, and in one comment a user mentioned that a restart solved the problem.

On our side we tried the following, without success:

  • Changing the No Delay Provision Strategy; whether it is checked or unchecked, the effect is the same and the Spot Fleet stays unchanged
  • Changing Minimum Cluster Size from 0 to 1 has no effect (and nothing appears in the Spot Fleet history on the AWS side)

The only thing that seems to make the Spot Fleet scale up again is a Jenkins restart. But after a few hours the problem is there again (it seems to appear after our Spot Fleet scales down to 0).

Today I finally managed to spot strange exceptions in our logs.

Error during fleet micro-1.1.0 stats update
com.amazonaws.services.ec2.model.AmazonEC2Exception: targetCapacity must be an integer value greater than or equal to 0. (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameterValue; Request ID: d7cf11fa-63f3-4988-98d8-b0f4ddfa82dc)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
	at com.amazonaws.services.ec2.AmazonEC2Client.doInvoke(AmazonEC2Client.java:22500)
	at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:22467)
	at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:22456)
	at com.amazonaws.services.ec2.AmazonEC2Client.executeModifySpotFleetRequest(AmazonEC2Client.java:18360)
	at com.amazonaws.services.ec2.AmazonEC2Client.modifySpotFleetRequest(AmazonEC2Client.java:18331)
	at com.amazon.jenkins.ec2fleet.fleet.EC2SpotFleet.modify(EC2SpotFleet.java:79)
	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.update(EC2FleetCloud.java:409)
	at com.amazon.jenkins.ec2fleet.CloudNanny.doRun(CloudNanny.java:62)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
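
For what it’s worth, that exception means ModifySpotFleetRequest was called with a negative target capacity, which the EC2 API rejects with the 400 above. A minimal sketch of the kind of guard that would avoid it (AWS SDK for Java v1; SafeFleetModify and setTargetCapacity are hypothetical names, not the plugin’s actual code):

    import com.amazonaws.services.ec2.AmazonEC2;
    import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
    import com.amazonaws.services.ec2.model.ModifySpotFleetRequestRequest;

    public class SafeFleetModify {
        // Clamp the computed target to >= 0 before calling the API.
        public static void setTargetCapacity(String fleetRequestId, int computedTarget) {
            int target = Math.max(0, computedTarget); // never send a negative value
            AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
            ec2.modifySpotFleetRequest(new ModifySpotFleetRequestRequest()
                    .withSpotFleetRequestId(fleetRequestId)
                    .withTargetCapacity(target));
        }
    }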

Here is the configuration associated with the EC2 Fleet micro-1.1.0.

[Screenshot: EC2 Fleet micro-1.1.0 configuration]

The other strange thing I see is that the EC2 Fleet Status panel cannot be opened.

[Screenshot: the EC2 Fleet Status panel failing to open]

I don’t know if it’s related, but perhaps it is also caused by a property that cannot be read somewhere.

Finally, just after a Jenkins restart I can also see the following exception in my logs.

Failed to load org.jenkinsci.plugins.github.pullrequest.extra.GitHubPRLabelUnblockQueueCondition$DescriptorImpl
java.lang.ClassNotFoundException: org.jenkinsci.plugins.blockqueuedjob.condition.BlockQueueCondition$BlockQueueConditionDescriptor
	at jenkins.util.AntClassLoader.findClassInComponents(AntClassLoader.java:1383)
	at jenkins.util.AntClassLoader.findClass(AntClassLoader.java:1336)
	at jenkins.util.AntClassLoader.loadClass(AntClassLoader.java:1083)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Caused: java.lang.NoClassDefFoundError: org/jenkinsci/plugins/blockqueuedjob/condition/BlockQueueCondition$BlockQueueConditionDescriptor
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at jenkins.util.AntClassLoader.defineClassFromData(AntClassLoader.java:1149)
	at hudson.ClassicPluginStrategy$AntClassLoader2.defineClassFromData(ClassicPluginStrategy.java:712)
	at jenkins.util.AntClassLoader.getClassFromStream(AntClassLoader.java:1320)
	at jenkins.util.AntClassLoader.findClassInComponents(AntClassLoader.java:1373)
	at jenkins.util.AntClassLoader.findClass(AntClassLoader.java:1336)
	at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at jenkins.ClassLoaderReflectionToolkit.invoke(ClassLoaderReflectionToolkit.java:44)
	at jenkins.ClassLoaderReflectionToolkit._findClass(ClassLoaderReflectionToolkit.java:81)
	at hudson.PluginManager$UberClassLoader.findClass(PluginManager.java:2041)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.jvnet.hudson.annotation_indexer.Index$2$1.fetch(Index.java:99)
	at org.jvnet.hudson.annotation_indexer.Index$2$1.hasNext(Index.java:73)
	at org.jvnet.hudson.annotation_indexer.SubtypeIterator.fetch(SubtypeIterator.java:18)
	at org.jvnet.hudson.annotation_indexer.SubtypeIterator.hasNext(SubtypeIterator.java:28)
	at org.jenkinsci.plugins.structs.SymbolLookup.findDescriptor(SymbolLookup.java:140)
	at org.jenkinsci.plugins.pipeline.modeldefinition.agent.DeclarativeAgentDescriptor.byName(DeclarativeAgentDescriptor.java:121)
	at org.jenkinsci.plugins.pipeline.modeldefinition.agent.DeclarativeAgentDescriptor.instanceForName(DeclarativeAgentDescriptor.java:134)
	at org.jenkinsci.plugins.pipeline.modeldefinition.agent.DeclarativeAgentDescriptor$instanceForName$3.call(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:86)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
	at sun.reflect.GeneratedMethodAccessor164.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.dispatch(CollectionLiteralBlock.java:55)
	at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.item(CollectionLiteralBlock.java:45)
	at sun.reflect.GeneratedMethodAccessor172.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.LocalVariableBlock$LocalVariable.get(LocalVariableBlock.java:39)
	at com.cloudbees.groovy.cps.LValueBlock$GetAdapter.receive(LValueBlock.java:30)
	at com.cloudbees.groovy.cps.impl.LocalVariableBlock.evalLValue(LocalVariableBlock.java:28)
	at com.cloudbees.groovy.cps.LValueBlock$BlockImpl.eval(LValueBlock.java:55)
	at com.cloudbees.groovy.cps.LValueBlock.eval(LValueBlock.java:16)
	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:186)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:370)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:282)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:270)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

This exception is related to the GitHub plugin, but I found it mentioned for the EC2 plugin too, so I’m wondering if it could be linked: https://issues.jenkins-ci.org/browse/JENKINS-54041

I hope there is enough context to track down the problem; we absolutely need a fix for this because it impacts our whole team.

Thanks for your help.

Waiting for @bgaillard to confirm this before I close the issue.

Hi, after one day it seems to work; at least the scale-up is not locked as before. The widget bug is also solved.

This is pretty much the same issue as in #149. There is something off with the capacity calculation. Our solution for now is to add a Jenkins restart every hour in crontab, as shown below.
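
For others in the same situation, the workaround can be a root crontab entry like the following (assuming a systemd-managed Jenkins; adjust the restart command to your installation):

    0 * * * * /usr/bin/systemctl restart jenkins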