aws-cdk: cli: Socket timed out without establishing a connection when --asset-parallelism=true

Describe the bug

I have anywhere from 20 to 50 Node.js Lambda functions in a single stack. I update their dependencies and deploy with CDK.

Lately, though, I have not been able to deploy updates. I get the following error when I deploy:

current credentials could not be used to assume 'arn:aws:iam::******:role/cdk-hnb659fds-lookup-role-******-us-east-1', but are for the right account. Proceeding anyway.
(To get rid of this warning, please upgrade to bootstrap version >= 8)
current credentials could not be used to assume 'arn:aws:iam::******:role/cdk-hnb659fds-file-publishing-role-******-us-east-1', but are for the right account. Proceeding anyway.
current credentials could not be used to assume 'arn:aws:iam::******:role/cdk-hnb659fds-file-publishing-role-******-us-east-1', but are for the right account. Proceeding anyway.
[9%] fail: Socket timed out without establishing a connection
[18%] fail: Socket timed out without establishing a connection

I keep retrying. Occasionally a deploy goes through, but most of the time it fails. Only stacks with a small number of Lambda functions sometimes deploy; a stack with a large number of Lambda functions fails 100% of the time.

Expected Behavior

I expected the stack to deploy regardless of the number of Lambda functions in it. It used to deploy without any problem.

Current Behavior

current credentials could not be used to assume 'arn:aws:iam::******:role/cdk-hnb659fds-lookup-role-******-us-east-1', but are for the right account. Proceeding anyway.
(To get rid of this warning, please upgrade to bootstrap version >= 8)

I don’t know how to upgrade the bootstrap version. I ran cdk bootstrap multiple times and it reports no changes.

Reproduction Steps

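import { Architecture, Runtime } from 'aws-cdk-lib/aws-lambda';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

// Inside the stack's constructor (`this` is the Stack).
const testSignUpFn = new NodejsFunction(this, 'testSignUpNodeJS', {
  runtime: Runtime.NODEJS_14_X,
  entry: `${__dirname}/../lambda-fns/sign-up/index.ts`,
  handler: 'signUp',
  architecture: Architecture.ARM_64,
  memorySize: 1024,
});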

It was working before but suddenly stopped working.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.20.0 (build 738ef49)

Framework Version

No response

Node.js Version

v16.14.2

OS

Ubuntu 20.04 on WSL 2

Language

Typescript

Language Version

~3.9.7

Other information

No response

About this issue

State: open
Created 2 years ago
Reactions: 3
Comments: 18 (2 by maintainers)

Most upvoted comments

This does appear to be related to the Asset Parallelism feature. Executing a deployment with --asset-parallelism=false resulted in a successful deployment.

When running without --asset-parallelism=false, the stack failed with the following error:

Call failed: listObjectsV2({"Bucket":"cdk-hnb659fds-assets-ACCOUNT_ID-eu-west-2","Prefix":"0936406e22fea26017ecca536fcbdc550936406e22fea26017ecca536fcbdc55.zip","MaxKeys":1}) => Socket timed out without establishing a connection (code=TimeoutError)

There are only four assets in the bucket and none of them are over 50KB.

System information: OS: Ubuntu 20.04, Node.js version: v16.3.0, CDK version: 2.51.1

jscrobinson on Nov 28, 2022

Turns out our issue was caused by setting NODE_OPTIONS=--enable-source-maps in our deployment pipeline.

CDK is compiled into a single 28 MB .js file, accompanied by a 58 MB source map. With source maps enabled, this causes excessive load, especially due to the high parallelism that CDK uses. I have patched out all the unqueued IO processes and replaced all the hardcoded parallelization values with require("os").cpus().length. This resolved our timeouts and we were able to deploy again.
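For anyone who wants to try the same idea without patching CDK itself, here is a minimal sketch of capping in-flight async work at the CPU count. It is not the actual patch described above and not a CDK API: `runWithLimit` is a hypothetical helper name, and the default of `cpus().length` is just the value mentioned in the comment.

```typescript
import { cpus } from "os";

// Hypothetical helper (not part of the CDK): runs `tasks` with at most
// `limit` of them in flight at any time and returns results in input order.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number = cpus().length,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Never start more workers than tasks, and always start at least one.
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Something like `await runWithLimit(uploads)` would then replace an unbounded `Promise.all(uploads.map((u) => u()))`, where `uploads` is whatever list of deferred operations you have.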

Soon after, we realized that deployment performance was dramatically improved by upgrading to Node@20. This is due to a change in Node@19.6 that introduces caching for the parsed source maps, which resolves this whole problem entirely (for us). Previously, we ran Node@18 LTS, which was also the highest version supported by CDK at the time.

I stand by my point that the way CDK handles IO is ridiculous. I also think bundling a NodeJS module into a single 28 MB file, with a 58 MB source map is ridiculous.

As Node@18 is also the latest runtime supported by AWS Lambda, be cautious when using --enable-source-maps at runtime, because similar performance issues can be observed there, especially during exception handling.

P.S.: The reason it worked for us locally was that nobody set --enable-source-maps locally, or people were already on Node@20.
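A rough way to catch this combination before a deploy is a small preflight check like the sketch below. The function names are made up for illustration, and the 19.6 threshold is taken from the comment above rather than verified independently.

```typescript
// Hypothetical preflight check; none of these functions are a CDK or Node.js API.

// True when --enable-source-maps appears in a NODE_OPTIONS-style string.
function sourceMapsEnabled(nodeOptions: string | undefined): boolean {
  return (nodeOptions ?? "").split(/\s+/).includes("--enable-source-maps");
}

// True when the version is at least 19.6, where (per the comment above)
// parsed source maps started being cached.
function hasSourceMapCache(version: string): boolean {
  const [major, minor] = version.replace(/^v/, "").split(".").map(Number);
  return major > 19 || (major === 19 && minor >= 6);
}

// The slow combination described above: source maps on, no cache.
function sourceMapRisk(nodeOptions: string | undefined, version: string): boolean {
  return sourceMapsEnabled(nodeOptions) && !hasSourceMapCache(version);
}

if (sourceMapRisk(process.env.NODE_OPTIONS, process.version)) {
  console.warn(
    "NODE_OPTIONS contains --enable-source-maps on Node < 19.6; deploys may be slow or time out.",
  );
}
```

This only inspects NODE_OPTIONS; a flag passed directly on the node command line would need a similar check against `process.execArgv`.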

oliversalzburg on Aug 30, 2023

I am now facing this issue regularly on 2.39.1. When I enable a VPN and deploy again, the error disappears, so it looks like this is somehow related to establishing the connection.

viktorchukhantsev on Aug 31, 2022

We conducted further research into this. It seems like what CDK calls “parallelism” is just waiting for multiple promises on the same single thread; there is no work happening in parallel at all. Combine this with the extremely poor single-core performance of the GitHub Actions runner fleet, and you end up with a fully saturated core for the entire runtime of your pipeline, regardless of how many cores you give it.
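To illustrate the general point (this is a toy example, not CDK code, and it only models CPU-bound work, not network I/O): when the tasks passed to `Promise.all` do synchronous work, each one runs to completion before the next even starts, so extra cores do not help.

```typescript
// CPU-bound work with no await inside: it blocks the event loop until done.
function burn(iterations: number): number {
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc = (acc + i * i) % 1000003;
  return acc;
}

// An async function whose body is entirely synchronous.
async function task(name: string, events: string[]): Promise<number> {
  events.push(`${name}:start`);
  const result = burn(1000000);
  events.push(`${name}:end`);
  return result;
}

async function main(): Promise<string[]> {
  const events: string[] = [];
  // Looks parallel, but all three tasks share one thread.
  await Promise.all([task("a", events), task("b", events), task("c", events)]);
  return events;
}

// prints: a:start a:end b:start b:end c:start c:end
main().then((events) => console.log(events.join(" ")));
```

Tasks that spend their time awaiting network calls do interleave; it is the CPU-heavy portions between those awaits that end up serialized on the one core.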

When I asked AWS reps about this, they told me that using the public runner fleet is a bad choice to begin with. You probably want to invest in some fat self-hosted runner with a single 5GHz core.

I’m pressing our client to move away from CDK ASAP, but we will likely solve this problem with money in the mid-term. This is not a good product.

oliversalzburg on Aug 24, 2023

We also have to use the --asset-parallelism=false workaround to be able to deploy at all. With 2.83, a new parallelism feature was introduced to improve performance. Now our deployments are entirely broken, regardless of --asset-parallelism.

In general, a real solution for the underlying issue would be appreciated.

In case it helps, we only see the problematic behavior when deploying from GitHub Actions. If we run the same deploy locally, it completes dramatically faster and without issues. So far, all our research regarding environment differences has been fruitless.

oliversalzburg on Jun 12, 2023