serverless-application-model: SqsSubscription: true fails to deploy

Description:

Found a bug while using SAM’s new SQS queue automatic generation with the SNS Event Type feature: If your Lambda function timeout is greater than 30 seconds and you’re using the SqsSubscription: true property to create an SQS queue with default settings, your deployment will fail with an error from the Lambda service saying the Lambda function timeout cannot be greater than the SQS queue’s visibility timeout. SAM should pass the Lambda function timeout through as the SQS queue visibility timeout.

Sample template:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  Topic:
    Type: AWS::SNS::Topic

  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      InlineCode: |
        exports.handler = async (event) => {
          return {}
        }
      Handler: index.handler
      Runtime: nodejs12.x
      Timeout: 60
      Events:
        TopicEvent:
          Type: SNS
          Properties:
            Topic: !Ref Topic
            SqsSubscription: true

Steps to reproduce the issue:

  1. Attempt to deploy the above template.

Observed result:

Deployment fails with this error when trying to create the Lambda event source mapping:

Queue visibility timeout: 30 seconds is less than Function timeout: 60 seconds (Service: AWSLambda; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: ****) The following resource(s) failed to create: [MyFunctionTopicEventEventSourceMapping]. . Rollback requested by user.

Expected result:

Successful deployment.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 6
  • Comments: 16 (3 by maintainers)

Most upvoted comments

IMO this constraint shouldn’t exist. This should be a warning, not an error. If I want to set the queue visibility timeout to be less than the lambda timeout, because of some weird use case that I have, then I should be able to do that. It isn’t AWS’s job to enforce this.

Edit: just as an example, here is a perfectly reasonable use case that in fact improves upon AWS’ default behaviour, but is currently impossible to implement due to this bizzare opinionated enforcement: Setting a very low visibility timeout on the queue, but upon a succesful lambda invocation, immediately increasing it - in code - to the lambda timeout (or higher), for all messages in the batch. This avoids waiting for the visibility timeout to pass (which can be take many minutes) in case of throttles, which are very common for SQS triggers due to how the connection scaling works, and significantly improves the queue processing time. It makes no sense that AWS would disallow doing something like this.

The purpose of this restriction is to prevent duplicate processing of messages. If your Lambda takes 900 seconds maximum to process a message, and the VisibilityTimeout is 30 seconds, then SQS will invoke your Lambda function every 30 seconds until it gets a confirmed result (after 900 seconds). This restriction is meant as a convenience to you, to help you avoid poor architectural decisions. There are ways to work around it if you are determined.

The purpose of this restriction is to prevent duplicate processing of messages. If your Lambda takes 900 seconds maximum to process a message, and the VisibilityTimeout is 30 seconds, then SQS will invoke your Lambda function every 30 seconds until it gets a confirmed result (after 900 seconds).

I get that the goal is to stop developers from shooting themselves in the foot. But the above isn’t true if they follow the AWS docs/guidelines: “If you don’t know how long it takes to process a message, create a heartbeat for your consumer process” https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html

The length of time it takes to process a message in my SQS varies greatly depending on the payload it contains. I set my lambda timeout to 900s, and my queue’s visibility timeout to 60s. But as long as the message is still being processed and hasn’t failed I update the visibility timeout for that specific message using the heartbeat approach recommended.

Maybe there could be an additional option added to indicate “I (think) I know what I’m doing”.

@piersf SAM needs to add VisibilityTimeout property for SQS queue resource that SAM generates and set the value to Timeout given for the Lambda function. The value of VisibilityTimeout being always greater than or equal to Lambda Timeout is not set by SAM.

Looks like this also affects queues that are explicitly defined and referenced in the lambda:

Resources:
  MyLambda:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: MyLambda
      Timeout: 60
      Events:
        Queue:
          Type: SQS
          Properties:
            BatchSize: 1
            Queue: !GetAtt MyQueue.Arn
  MyQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: MyQueue
      VisibilityTimeout: 30

What’s more, the error seems to only show up when creating a new lambda (or renaming an existing one which basically creates a new one).

When only updating the timeouts of an existing lambda and/or queue – either increasing the lambda timeout above the queue, or lowering the queue below the lambda – the error does not seem to show and the stack update succeeds. This is quite an inconsistent behavior.

I see. What about adding a default visibility timeout based on the lambda timeout for cases where the lambda timeout exceeds the current default visibility timeout?

How should we handle when the default changes in the underlying resources?

Typically, SAM tries to avoid validation and assumptions on the underlying resources. This has become brittle when the services change things. Instead, we let as much of the validation happen by CloudFormation or the AWS Service. So in this case, we would need to assume 30 will never change and if it does, we are back in this same conversation.

We did talk about adding VisibilityTimeout as a way to configure directly in SAM. What this becomes is SAM trying to keep up with the underlying resources as new configs come up/need configuration. This is why we landed on leaving what we have as is for the “simple” cases and when you need more configuration to provide the SQS.

@praneetap, @ShreyaGangishetty can you please confirm (or infirm, if it’s the case) the following statement?

For the moment, we cannot set the default visibility timeout for a SQS queue to be smaller than the timeout of the Lambda which is consuming the SQS queue messages.

I’m asking this because my team ran today into a similar CloudFormation change set execution issue: "Queue visibility timeout: 30 seconds is less than Function timeout: 900 seconds (Service: AWSLambda; Status Code: 400; Error Code: InvalidParameterValueException; … "

If this is true, we are forced to have the SQS default visibility timeout >= Lambda timeout

Also, a quite misleading (if not actually a bug) thing was the fact that this error was thrown in CloudFormation, but when we changed the SQS default visibility timeout from AWS console, it worked. This means that the probable equality constraint is not checked when applying the update from AWS console, but it is enforced in CloudFormation. The AWS console update is not a valid choice for us because we deploy our changes from infrastructure code. The AWS console update was made just for testing the update which should be made from the infrastructure code -> CloudFormation.