ec2-fleet-plugin: Plugin does not scale up when needed

I am currently using version 1.9.2.

I have two cases where scaling up does not take effect.

The plugin does not scale up automatically when I set the minimum cluster size to 2 and start out with 0 nodes/instances.

To work around this, I had to set the target capacity to 2 in AWS or run enough builds for the plugin to scale up to 2 nodes.

Once at least 2 nodes are running (the cluster minimum), if I run enough jobs that the executors on the current nodes fill up and a queue begins to form, no scale-up happens.

My current configuration is:

Minimum Cluster Size: 2
Maximum Cluster Size: 20
Number of Executors: 5
Max Idle Minutes Before Scaledown: 2
Connect Using Private IP: true
Maximum Init Connection Timeout in sec: 60

AWS CloudTrail, the Jenkins logs, and the Spot Request history never show any information about scaling nodes up.

About this issue

State: closed
Created 5 years ago

Most upvoted comments

To all,

Since we have so many questions about the plugin not scaling up, I think it’s time to explain a little about how it works and ask the big question about it.

Intro

The current implementation of the plugin fully depends on the Jenkins scale-up strategy. The plugin scales up capacity only when Jenkins requests it. If Jenkins requests capacity, the plugin can decide not to scale up, but it cannot scale up without a request from Jenkins. Apart from the max fleet size, the plugin has no logic for skipping a scale-up. And no excessworkload in the Jenkins log means Jenkins sent no request to the plugin!
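The reactive flow described above can be modeled roughly as follows. This is a minimal sketch, not the plugin's code: the function name `nodes_to_add` is hypothetical, and converting an excess workload (in executors) to whole nodes by dividing by the executors per node is an assumption made for illustration.

```python
import math


def nodes_to_add(excess_workload: int, current_nodes: int,
                 max_nodes: int, executors_per_node: int) -> int:
    """Hypothetical model of the plugin's reactive scale-up path.

    excess_workload: executors Jenkins asked for (0 means no request).
    Returns how many nodes the plugin would add in this model.
    """
    if excess_workload <= 0:
        # No request from Jenkins: the plugin never scales up on its own.
        return 0
    # Assumption: round the requested executors up to whole nodes.
    wanted = math.ceil(excess_workload / executors_per_node)
    # The only reason to skip or trim a request: the max fleet size.
    headroom = max(max_nodes - current_nodes, 0)
    return min(wanted, headroom)


# Queue is full but Jenkins sent no request: nothing happens.
print(nodes_to_add(0, current_nodes=2, max_nodes=20, executors_per_node=5))    # 0
# Jenkins asks for 12 executors: 3 nodes of 5 executors cover it.
print(nodes_to_add(12, current_nodes=2, max_nodes=20, executors_per_node=5))   # 3
# Near the max fleet size, the request is trimmed.
print(nodes_to_add(50, current_nodes=18, max_nodes=20, executors_per_node=5))  # 2
```

The first call is the situation from the original report: a queue exists, but without an excessworkload request the model does nothing.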

Jenkins scale-up

One of the main goals of the default Jenkins scale-up strategy is to avoid capacity spikes. That’s why you don’t see capacity increase immediately when you just run tons of jobs. By default, Jenkins tries to postpone scale-up as long as possible so that existing nodes can work through the queue. In other words, Jenkins optimizes for throughput, not execution latency. The direct result is a smaller bill.
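To see why the default strategy reacts slowly, here is a toy model. Jenkins's default strategy works roughly from smoothed load statistics rather than only the instantaneous queue length; in this sketch the smoothing is a plain exponential moving average, and the decay of 0.9 per sampling tick and the threshold of 1 executor are assumed values, not Jenkins's exact formula.

```python
from typing import Optional


def ticks_until_request(queue_length: int, available: int,
                        decay: float = 0.9, threshold: float = 1.0,
                        max_ticks: int = 1000) -> Optional[int]:
    """Toy model: sampling ticks before a smoothed queue triggers a request.

    The queue jumps from 0 to `queue_length` and stays there. Each tick the
    smoothed value moves a fraction (1 - decay) of the way toward it. A request
    is made once the smoothed queue exceeds idle capacity by more than
    `threshold` executors. Returns None if that never happens.
    """
    smoothed = 0.0
    for tick in range(1, max_ticks + 1):
        # Exponential moving average: keep `decay` of the old value.
        smoothed = decay * smoothed + (1 - decay) * queue_length
        if smoothed - available > threshold:
            return tick
    return None


print(ticks_until_request(10, available=0))   # 2: a big spike is noticed quickly
print(ticks_until_request(2, available=0))    # 7: a small backlog waits much longer
print(ticks_until_request(10, available=20))  # None: idle executors absorb it
```

In this model, small or gradual backlogs are exactly the ones that wait longest, which matches the "queue forms but nothing happens for a while" reports in this thread.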

Possible solution

Nowadays this approach can be too conservative, since we usually expect fast results. To get them, the plugin could override the default Jenkins scale-up strategy and provision almost immediately. The downside would be a higher AWS bill than with the default strategy.

Question

Do you prefer fast allocation time versus a small bill?

Please give this comment a 👍 if you want the plugin configuration to offer a choice between the default Jenkins scale-up and custom fast allocation.


terma on Aug 8, 2019

No-delay provisioning is now available in 1.13.0; just check No Delay Provision Strategy in the configuration.

It significantly improves Jenkins's response time to new load in the queue when existing capacity is not enough.
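The idea behind a no-delay strategy can be sketched as a decision on the current numbers alone: if the queue is longer than the capacity that is idle, connecting, or already planned, ask for the difference right away. This is a simplified sketch under that assumption, not the plugin's actual No Delay Provision Strategy implementation; the `Snapshot` class and its fields are hypothetical stand-ins for Jenkins load statistics.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical point-in-time load numbers for one label."""
    queue_length: int          # builds waiting right now
    available_executors: int   # idle executors on connected nodes
    connecting_executors: int  # executors on nodes still coming online
    planned_capacity: int      # executors already requested, not yet launched


def no_delay_excess_workload(s: Snapshot) -> int:
    """Executors to request immediately; 0 means do nothing this round.

    No smoothing: the instantaneous queue is compared with every executor
    that exists or is already on its way.
    """
    capacity = s.available_executors + s.connecting_executors + s.planned_capacity
    return max(s.queue_length - capacity, 0)


print(no_delay_excess_workload(Snapshot(10, 0, 0, 0)))  # 10
print(no_delay_excess_workload(Snapshot(10, 2, 5, 0)))  # 3
print(no_delay_excess_workload(Snapshot(4, 0, 5, 0)))   # 0: nodes already launching
```

Counting connecting and planned executors is what keeps an immediate reaction from requesting the same capacity again on every check while instances are still booting.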

terma on Oct 8, 2019

I started noticing this same issue on multiple Jenkins instances that we run, even with different plugin versions.

I’m having the exact same problem with version 1.1.9.

On a different Jenkins instance, I tried upgrading to the latest versions (of both Jenkins and ec2-fleet-plugin) and still see the same issue 🙂

rjtferreira on Jul 23, 2019

Same here: with an entire AWS region at my disposal and 3-4 digit quotas for some instance types, builds via EC2 Fleet are still queuing for a single instance.

battlesnake on Nov 2, 2023

Hello from 2023!!! Same issue on the latest Jenkins and ec2-fleet versions, heeeeelp!

Dmitry1987 on Aug 24, 2023

@smastrorocco then you should vote on this comment: https://github.com/jenkinsci/ec2-fleet-plugin/issues/125#issuecomment-519368220

JoseThen on Aug 15, 2019

These are my notes from my second test:
Minimum Cluster Size: 4
Maximum Cluster Size: 10
Number of Executors: 5
Max Idle Minutes Before Scaledown: 2
Connect Using Private IP: true
Maximum Init Connection Timeout in sec: 0

I started out with 0 nodes, set the above configuration, and restarted Jenkins.
I then started a job restricted to the ec2-fleet label and began to see excessworkload = 1 in the logs.
Next, I spammed the job's build button until there were enough builds to need more than 1 node. It took some time, probably a minute or two, but Jenkins began to request excessworkload = 5.
Jenkins now had 4 nodes that took care of all the jobs.
I triggered around 30 builds; the nodes' executors filled up and a queue started to form. No new nodes were requested (no excessworkload).
While those jobs were waiting, I thought it might just need more load, so I triggered another 30 jobs on top of those. No new nodes were launched.
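For reference, here is the back-of-the-envelope capacity math for the test above, assuming every triggered job was still running or queued at the time (an assumption; some builds may already have finished). With 4 nodes of 5 executors and a maximum cluster size of 10, a queue of this size is more than the current executors can absorb. The helper name is hypothetical.

```python
import math


def expected_new_nodes(jobs: int, nodes: int, executors_per_node: int,
                       max_nodes: int) -> int:
    """Nodes needed so every job has an executor, capped at the max cluster size."""
    waiting = max(jobs - nodes * executors_per_node, 0)  # builds with no executor
    wanted = math.ceil(waiting / executors_per_node)     # round up to whole nodes
    return min(wanted, max(max_nodes - nodes, 0))


# ~30 builds on 4 nodes x 5 executors: 10 waiting, so 2 more nodes.
print(expected_new_nodes(30, nodes=4, executors_per_node=5, max_nodes=10))  # 2
# ~60 builds: 40 waiting would need 8 nodes, but the cap of 10 allows only 6.
print(expected_new_nodes(60, nodes=4, executors_per_node=5, max_nodes=10))  # 6
```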

Could my issue be related to the way Jenkins load-balances jobs? Have there been reports of this plugin (https://wiki.jenkins.io/display/JENKINS/Least+Load+Plugin) causing issues?

JoseThen on Jul 27, 2019