ec2-fleet-plugin: New nodes get I/O error and disconnect at ssh timeout
When the ec2-fleet-plugin adds new spotfleet instances as Jenkins nodes using the Launcher selection of “Launch agent agents via SSH”, the nodes all connect just fine, but some percentage of them disconnect with a SEVERE I/O error shortly after launch. The I/O error happens at exactly the configured “Connection Timeout in Seconds” from launch. When reconnected after that, they have no problems.
I have not confirmed, but I think this only happens when “Max Idle Minutes Before Scaledown” is set (attaching the IdleRetentionStrategy to the node).
Here’s the sequence from the logs with the default ssh connection timeout of 210 seconds:
15:01:11.832 - INFO: Found new instances from fleet (ec2-fleet test): [<snip>, i-08db665d464785aec, <snip>]
15:01:21.966 - INFO: Idle Retention initiated
15:01:21.967 - INFO: Attempting to reconnect i-08db665d464785aec
15:01:56.067 - SSH Launch of i-08db665d464785aec on 10.21.131.211 completed in 34,083 ms
15:04:51.990 - SEVERE: I/O error in channel i-08db665d464785aec
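As a quick check of the timing claim, the gaps between the log lines above can be computed directly. This is only a sketch using the timestamps quoted in this report; the helper name `elapsed_seconds` is made up for illustration. It suggests the 210-second window is counted from the start of the connection attempt ("Attempting to reconnect"), not from the moment the SSH launch completed.

```python
from datetime import datetime

def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two same-day HH:MM:SS.mmm log timestamps."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# Timestamps copied from the log excerpt above.
reconnect_attempt = "15:01:21.967"
launch_completed = "15:01:56.067"
io_error = "15:04:51.990"

# Gap from the start of the connection attempt to the I/O error.
print(f"{elapsed_seconds(reconnect_attempt, io_error):.3f}")  # 210.023
# Gap from "SSH Launch ... completed" to the I/O error.
print(f"{elapsed_seconds(launch_completed, io_error):.3f}")   # 175.923
```

The first gap matches the configured 210-second timeout to within a few tens of milliseconds, while the second does not.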
About this issue
- State: closed
- Created 6 years ago
- Comments: 49
So, I created the following PR, which seems to fix the issue: https://github.com/jenkinsci/trilead-ssh2/pull/36
I don’t get any disconnection after that, with the following log:
Since it is a problem with the trilead timeoutHandler, I’m going to try to work around this issue by increasing the “Connection Timeout in Seconds” to a massive number and see if that stabilizes my Jenkins environment. The drawback to this approach is that the connection attempt could hang indefinitely. That seems preferable to a rogue asynchronous process killing my builds by cutting the ssh connection.
@LoveDuckie Can you please open a new issue along with logs and provide info about your setup? I’m closing this issue as it is pretty old, and having the latest info from newer releases would be helpful.
Indeed, upping the timeout to an insane amount and adding “docker info &&” to the Prefix command works like a charm. Thanks for the workaround!
🤦‍♂️ Never mind, I get it now: it’s in the cloud configuration section of the plugin configuration.
I found another reference to this issue (see the latest comments by Eugene): https://issues.jenkins-ci.org/browse/JENKINS-48955