aws-sdk-js-v3: ECONNRESET exceptions when running in Lambda environment

Describe the bug

import { S3 } from '@aws-sdk/client-s3';
import { Handler, Context, S3Event } from 'aws-lambda';

const s3 = new S3({})

export const handler: Handler = async (event: S3Event, context: Context) => {
  await s3.getObject({
    Bucket: event.Records[0].s3.bucket.name,
    Key: event.Records[0].s3.object.key,
  });
}

We have this very basic Lambda function that reads the file from S3 when a new file is uploaded (we actually consume the Body stream too, but left that out for brevity). The function is called intermittently, meaning that sometimes we get a new Lambda container (i.e. a cold start) and sometimes the container is reused. When the container is reused, we sometimes see an ECONNRESET exception such as this one:

2020-05-20T16:50:28.107Z	d7a43394-afad-4267-a4a4-5ad3633a1db8	ERROR	Error: socket hang up
    at connResetException (internal/errors.js:608:14)
    at TLSSocket.socketOnEnd (_http_client.js:460:23)
    at TLSSocket.emit (events.js:322:22)
    at endReadableNT (_stream_readable.js:1187:12)
    at processTicksAndRejections (internal/process/task_queues.js:84:21) {
  code: 'ECONNRESET',
  '$metadata': { retries: 0, totalRetryDelay: 0 }
}

I’m pretty confident that this is due to the keep-alive nature of the https connection. Lambda processes are frozen after they execute and their host seems to terminate open sockets after ~10 minutes. The next time the S3 client tries to reuse the socket, the exception is thrown.
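
One way to sidestep the stale socket in the meantime is to disable keep-alive on the agent the client uses, so a thawed container never tries to write to a connection the host closed while the process was frozen. A minimal sketch (assuming the v3 NodeHttpHandler and its httpsAgent option; it trades the error for a fresh TLS handshake on every request):

import { S3 } from '@aws-sdk/client-s3';
import { NodeHttpHandler } from '@aws-sdk/node-http-handler';
import { Agent } from 'https';

// keepAlive: false means every request opens a new socket, so a reused
// (previously frozen) container cannot hit a half-closed keep-alive socket.
const s3 = new S3({
  requestHandler: new NodeHttpHandler({
    httpsAgent: new Agent({ keepAlive: false }),
  }),
});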

We are running into similar issues with connections to our Aurora database, which also terminate intermittently with the same error message (see https://github.com/brianc/node-postgres/issues/2112). It’s an error we can easily recover from by reopening the socket, but aws-sdk-js-v3 seems to prefer to throw instead.

Is the issue in the browser/Node.js?
Node.js 12.x on AWS Lambda

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 20
  • Comments: 35 (8 by maintainers)

Most upvoted comments

I’ve managed to work around this using this configuration (updated for gamma):

import {
    StandardRetryStrategy,
    defaultRetryDecider,
} from '@aws-sdk/middleware-retry';
import { SdkError } from '@aws-sdk/smithy-client';

const retryDecider = (err: SdkError & { code?: string }) => {
    if (
        'code' in err &&
        (err.code === 'ECONNRESET' ||
            err.code === 'EPIPE' ||
            err.code === 'ETIMEDOUT')
    ) {
        return true;
    } else {
        return defaultRetryDecider(err);
    }
};
// eslint-disable-next-line @typescript-eslint/require-await
const retryStrategy = new StandardRetryStrategy(async () => 3, {
    retryDecider,
});
export const defaultClientConfig = {
    maxRetries: 3,
    retryStrategy,
};

It would be nice if this were built into defaultRetryDecider. Then again, is there an argument for handling it in the NodeHttpHandler, since this is a Node-specific error and one where the handler should probably “just work”?
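
For reference, this is how the shared config gets applied to a client. A minimal usage sketch (the './aws-client-config' path is hypothetical, just wherever the defaultClientConfig above is exported from):

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
// hypothetical local module exporting the defaultClientConfig shown above
import { defaultClientConfig } from './aws-client-config';

// Spread the shared config so this client picks up the custom retry strategy.
const dynamo = new DynamoDBClient({ ...defaultClientConfig });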

Info: AWS Lambda, Node.js 12.x, “@aws-sdk/client-dynamodb”: “^1.0.0-gamma.1”

Lambda

import { DynamoDBClient, DescribeTableCommand } from "@aws-sdk/client-dynamodb"

const dynamo = new DynamoDBClient({})

export const tempDebug = async (): Promise<object> => {
  const res = await dynamo.send(new DescribeTableCommand({
    TableName: '<TableName>'
  }))

  return res.Table
}

Local

import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda"
import { TextDecoder } from "util"

const lambda = new LambdaClient({})

;(async () => {
  let counter = 0
  // eslint-disable-next-line no-constant-condition
  while (true) {
    console.log(counter)
    counter++
    const res = await lambda.send(new InvokeCommand({
      FunctionName: '<FunctionName>'
    }))
    
    const obj = JSON.parse(new TextDecoder("utf-8").decode(res.Payload))
    if (obj.errorType === 'Error') {
      console.log(obj)
      break
    }

    //await new Promise(resolve => setTimeout(resolve, 5 * 60 * 1000))
    await new Promise(resolve => setTimeout(resolve, 90 * 1000))
  }
})()

Produces the following errors consistently when run with 90 second intervals. The first call works; the second call, 90 seconds later, produces one of the following two errors. Error logs are from CloudWatch.

{
    "errorType": "Error",
    "errorMessage": "write EPIPE",
    "code": "EPIPE",
    "errno": "EPIPE",
    "syscall": "write",
    "$metadata": {
        "retries": 0,
        "totalRetryDelay": 0
    },
    "stack": [
        "Error: write EPIPE",
        "    at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:92:16)",
        "    at writevGeneric (internal/stream_base_commons.js:132:26)",
        "    at TLSSocket.Socket._writeGeneric (net.js:782:11)",
        "    at TLSSocket.Socket._writev (net.js:791:8)",
        "    at doWrite (_stream_writable.js:401:12)",
        "    at clearBuffer (_stream_writable.js:519:5)",
        "    at TLSSocket.Writable.uncork (_stream_writable.js:338:7)",
        "    at ClientRequest._flushOutput (_http_outgoing.js:862:10)",
        "    at ClientRequest._flush (_http_outgoing.js:831:22)",
        "    at _http_client.js:315:47"
    ]
}
{
    "errorType": "Error",
    "errorMessage": "socket hang up",
    "code": "ECONNRESET",
    "$metadata": {
        "retries": 0,
        "totalRetryDelay": 0
    },
    "stack": [
        "Error: socket hang up",
        "    at connResetException (internal/errors.js:608:14)",
        "    at TLSSocket.socketOnEnd (_http_client.js:453:23)",
        "    at TLSSocket.emit (events.js:322:22)",
        "    at endReadableNT (_stream_readable.js:1187:12)",
        "    at processTicksAndRejections (internal/process/task_queues.js:84:21)"
    ]
}

Works as expected when run with 1 minute intervals.

This issue is fixed in https://github.com/aws/aws-sdk-js-v3/pull/1693, and will be published in rc.7 on Thursday 11/19

Hi @rraziel, I’m currently looking into how JS SDK v2 handles this and will provide a fix in v3 accordingly.

are you saying this is “just” an error that’s not properly handled?

The current behavior is undesirable, and the SDK should retry the error instead of asking the user to do it.

Using the fix from serverless-nextjs worked for me. It is not a permanent solution, though, as it simply keeps retrying whenever one of the matched error codes is returned.

TS implementation:

import type { SdkError } from '@aws-sdk/smithy-client'
import {
	defaultRetryDecider,
	StandardRetryStrategy,
} from '@aws-sdk/middleware-retry'

// fix error in SDK release candidate
// see: https://github.com/aws/aws-sdk-js-v3/issues/1196
// see: https://github.com/serverless-nextjs/serverless-next.js/pull/720/files
export const retryStrategy = new StandardRetryStrategy(async () => 3, {
	retryDecider: (err: SdkError & { code?: string }) => {
		if (
			'code' in err &&
			(err.code === 'ECONNRESET' ||
				err.code === 'EPIPE' ||
				err.code === 'ETIMEDOUT')
		) {
			return true
		} else {
			return defaultRetryDecider(err)
		}
	},
})

import { DynamoDB } from '@aws-sdk/client-dynamodb'
const dynamodbClient = new DynamoDB({ retryStrategy })

The retryStrategy prop is available in all clients, AFAIK.
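
The same approach works with the S3 client from the original report, for example (a sketch; './retry-strategy' is a hypothetical local module exporting the retryStrategy above):

import { S3 } from '@aws-sdk/client-s3'
// hypothetical local module exporting the retryStrategy shown above
import { retryStrategy } from './retry-strategy'

const s3 = new S3({ retryStrategy })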

Hoping for an actual fix in the next RC

So we are at release candidate 4 and this problem has not even been acknowledged 😞

Has anyone from AWS or a maintainer even commented on this issue? This should be a priority, given that it happens in most use cases unless you rarely call your Lambdas.

I am testing 1.0.0-gamma.10 in production with logging via a custom retry strategy.

Issues are still happening in 1.0.0-gamma.6 😕

Clients in 1.0.0-gamma.3 now retry in case of transient errors.

It doesn’t check for ECONNRESET, ETIMEDOUT or EPIPE, though.

@studds Thanks for the elegant solution. This is working perfectly for me now.