ec2-fleet-plugin: NoDelayProvisionStrategy incorrect availability

Hi!

We have noticed that our fleet does not scale properly when using the new “No delay provisioning” feature. For some reason the plugin seems to think we have capacity that we don’t have. Right now we have a Jenkins environment with one EC2 Fleet and one master. The master has 0 executors and the EC2 Fleet is scaled to 2 instances with 2 executors per instance. All executors are occupied and there are 2 jobs in the queue, yet the plugin does not add a new instance.

The logs keep showing Available capacity=6, currentDemand=-5, so it seems that the plugin’s internal capacity state is wrong. Is it possible to reset it somehow?

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 23

Most upvoted comments

My initial impression is that the scale-up works a lot better than before. However, I set the minimum capacity back to 0 and noticed that my instances are still around after being idle, but marked as (suspended).

I checked the AWS console and noticed that the modify Spot Fleet requests look like this:

Modify request received. Requested targetCapacity: 0, excessCapacityTerminationPolicy: NoTermination

The excessCapacityTerminationPolicy: NoTermination part seems to mean that extra servers won’t be terminated by AWS. The Spot Fleet itself has this value set to “default”, so I’m guessing that the plugin sends this value to AWS incorrectly.

After this there won’t be any scale-up activity from the plugin, so the problem remains.

Fix released in version 1.16.2.

Short problem description:

The previous version of the plugin called the AWS modify API first and then read the updated state immediately. This is OK for an Auto Scaling Group, which supports synchronous modification of targetCapacity. For an EC2 Spot Fleet, however, it doesn’t work well: after a modification call, the Spot Fleet only stores the request and marks the fleet as modifying, while targetCapacity remains unchanged. The change is applied some time later, but the plugin did not take this delay into account and expected the state to be updated already. This left the plugin’s state out of sync with the fleet and blocked future provisioning.
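
To make the race concrete, here is a minimal sketch in plain Java. It is not the plugin’s code and does not call AWS: FakeSpotFleet, readAfterModifyNaive and readAfterModifyKeepingRequested are hypothetical names invented for illustration, and the second method shows just one possible way to avoid the stale read, not necessarily what 1.16.2 does.

```java
public class FleetSyncSketch {

    /** Hypothetical stand-in for a Spot Fleet: a modify request is stored and applied later. */
    static class FakeSpotFleet {
        private int targetCapacity;
        private Integer pendingTarget; // non-null while the fleet is "modifying"

        FakeSpotFleet(int initialTarget) {
            this.targetCapacity = initialTarget;
        }

        /** Stores the request only; targetCapacity is NOT changed yet. */
        void modify(int newTarget) {
            pendingTarget = newTarget;
        }

        /** Simulates the fleet eventually applying the stored request. */
        void applyPending() {
            if (pendingTarget != null) {
                targetCapacity = pendingTarget;
                pendingTarget = null;
            }
        }

        boolean isModifying() {
            return pendingTarget != null;
        }

        int getTargetCapacity() {
            return targetCapacity;
        }
    }

    /** Modify, then read immediately: returns the old value while the fleet is still modifying. */
    static int readAfterModifyNaive(FakeSpotFleet fleet, int newTarget) {
        fleet.modify(newTarget);
        return fleet.getTargetCapacity();
    }

    /** Assumed alternative: trust the requested value until the fleet reports the change as applied. */
    static int readAfterModifyKeepingRequested(FakeSpotFleet fleet, int newTarget) {
        fleet.modify(newTarget);
        return fleet.isModifying() ? newTarget : fleet.getTargetCapacity();
    }

    public static void main(String[] args) {
        FakeSpotFleet a = new FakeSpotFleet(2);
        System.out.println("naive read after modify(3): " + readAfterModifyNaive(a, 3));

        FakeSpotFleet b = new FakeSpotFleet(2);
        System.out.println("kept requested value:       " + readAfterModifyKeepingRequested(b, 3));

        b.applyPending();
        System.out.println("fleet value once applied:   " + b.getTargetCapacity());
    }
}
```

In the naive variant the caller still sees the old target right after asking for a new one, which is the kind of mismatch described above. In the second variant the caller’s view already matches what the fleet will report once the modification is applied.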

@terma is there any work going on for this issue? Right now I consider this plugin completely broken, as it does not do what it is intended to do. I’ve had to schedule a restart every hour to keep our builds moving forward, which is not a long-term solution.