serverless: 'Concurrent update operation' error for multi-function service results in both deployment and rollback failure.

This is a Bug Report

Description

For bug reports:

  • What went wrong? Serverless deploy failed due to a concurrent update operation on my multi-function (7 functions) java8 service. The subsequent rollback also fails, and I have to manually perform a “User Initiated” rollback to get the stack to roll back completely. This never happened before upgrading from serverless 1.25.0 to serverless 1.27.x.

  • What did you expect should have happened? A successful serverless deploy of my multi-function (7 functions) java8 service. I can reproduce a 100% success rate with serverless 1.25.0 and a 100% failure rate with serverless 1.27.x (tried with both 1.27.0 and 1.27.2). It would also be nice not to have to manually initiate a rollback to complete the failed rollback.

  • What was the config you used? I am using Maven to package the code for each function. There are 7 functions defined in my serverless.yaml.

service: MyLambdaService

plugins:
  - serverless-plugin-log-retention

provider:
  name: aws
  runtime: java8
  stage: ${opt:stage}
  region: ${opt:region}
  memorySize: 256
  timeout: 300
  schedule: 1 hour
  role: myLambdaRole
  vpc:
    securityGroupIds:
      - Ref: myLambdaSG
    subnetIds:
      Fn::Split:
        - ', '
        - Fn::ImportValue:
            Fn::Sub: ${opt:stage}-NATSubnetsList
  environment:
    DB_PORT: 5432
    DB_HOST: ${opt:dbhost}
    DB_USERNAME: ${opt:dbuser}
    DB_PASSWORD: ${opt:dbpassword}
    DB_NAME: mydb
    KINESIS_REGION: ${self:provider.region}
    REDIS_URL:
      Fn::ImportValue: MyRedis-${opt:stage}-JobUrl

custom:
  logRetentionInDays: 5
  enabled:
    prod: true
    other: false

package:
  individually: true

  • What stacktrace or error message from your provider did you see? Both the stack trace and the provider error messages are included below under “Additional Data”.

Similar or dependent issues:

  • None

Additional Data

  • Serverless Framework Version you’re using: 1.27.x
  • Operating System: macOS, Debian
  • Stack Trace:
Serverless: Packaging service...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service .zip file to S3 (16.28 MB)...
Serverless: Uploading service .zip file to S3 (15.33 MB)...
Serverless: Uploading service .zip file to S3 (15.37 MB)...
Serverless: Uploading service .zip file to S3 (29.21 MB)...
Serverless: Uploading service .zip file to S3 (29.92 MB)...
Serverless: Uploading service .zip file to S3 (29.74 MB)...
Serverless: Uploading service .zip file to S3 (10.4 MB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
............................
Serverless: Operation failed!
 
  Serverless Error ---------------------------------------
 
  An error occurred: ManageLambdaFunction - The function could not be updated due to a concurrent update operation..
 
  Get Support --------------------------------------------
     Docs:          docs.serverless.com
     Bugs:          github.com/serverless/serverless/issues
     Issues:        forum.serverless.com
 
  Your Environment Information -----------------------------
     OS:                     linux
     Node Version:           8.11.1
     Serverless Version:     1.27.0

  • Provider Error messages:
03:32:02 UTC+0000	UPDATE_ROLLBACK_FAILED	AWS::CloudFormation::Stack	MyLambda-Automation-qa	The following resource(s) failed to update: [PrepareManageLambdaFunction, PublishOptimizationLambdaFunction, ManageLambdaFunction, CoordinateLambdaFunction, TrainLambdaFunction, OptimizeLambdaFunction].
03:32:01 UTC+0000	UPDATE_FAILED	AWS::Lambda::Function	PrepareManageLambdaFunction	The function could not be updated due to a concurrent update operation.
03:32:01 UTC+0000	UPDATE_FAILED	AWS::Lambda::Function	PublishOptimizationLambdaFunction	The function could not be updated due to a concurrent update operation.
03:32:01 UTC+0000	UPDATE_FAILED	AWS::Lambda::Function	OptimizeLambdaFunction	The function could not be updated due to a concurrent update operation.
03:32:00 UTC+0000	UPDATE_FAILED	AWS::Lambda::Function	CoordinateLambdaFunction	The function could not be updated due to a concurrent update operation.
03:32:00 UTC+0000	UPDATE_FAILED	AWS::Lambda::Function	ManageLambdaFunction	The function could not be updated due to a concurrent update operation.
03:32:00 UTC+0000	UPDATE_FAILED	AWS::Lambda::Function	TrainLambdaFunction	The function could not be updated due to a concurrent update operation.
03:31:59 UTC+0000	UPDATE_IN_PROGRESS	AWS::Lambda::Function	OptimizeLambdaFunction	
03:31:59 UTC+0000	UPDATE_IN_PROGRESS	AWS::Lambda::Function	ManageLambdaFunction	
03:31:59 UTC+0000	UPDATE_IN_PROGRESS	AWS::Lambda::Function	PrepareManageLambdaFunction	
03:31:59 UTC+0000	UPDATE_IN_PROGRESS	AWS::Lambda::Function	PublishOptimizationLambdaFunction	
03:31:59 UTC+0000	UPDATE_IN_PROGRESS	AWS::Lambda::Function	CoordinateLambdaFunction	
03:31:58 UTC+0000	UPDATE_IN_PROGRESS	AWS::Lambda::Function	TrainLambdaFunction	
03:31:57 UTC+0000	UPDATE_COMPLETE	AWS::S3::Bucket	ServerlessDeploymentBucket	
03:31:45 UTC+0000	UPDATE_COMPLETE	AWS::Logs::LogGroup	TrainLogGroup	
03:31:43 UTC+0000	UPDATE_IN_PROGRESS	AWS::Logs::LogGroup	TrainLogGroup	
03:31:37 UTC+0000	UPDATE_COMPLETE	AWS::Logs::LogGroup	CoordinateLogGroup	
03:31:37 UTC+0000	UPDATE_COMPLETE	AWS::Logs::LogGroup	OptimizeLogGroup	
03:31:37 UTC+0000	UPDATE_COMPLETE	AWS::Logs::LogGroup	PrepareManageLogGroup	
03:31:37 UTC+0000	UPDATE_COMPLETE	AWS::Logs::LogGroup	PublishOptimizationLogGroup	
03:31:37 UTC+0000	UPDATE_COMPLETE	AWS::Logs::LogGroup	PublishTrainingLogGroup	
03:31:37 UTC+0000	UPDATE_COMPLETE	AWS::Logs::LogGroup	ManageLogGroup	
03:31:37 UTC+0000	UPDATE_IN_PROGRESS	AWS::Logs::LogGroup	CoordinateLogGroup
03:31:37 UTC+0000	UPDATE_IN_PROGRESS	AWS::Logs::LogGroup	PublishOptimizationLogGroup	
03:31:37 UTC+0000	UPDATE_IN_PROGRESS	AWS::Logs::LogGroup	OptimizeLogGroup	
03:31:37 UTC+0000	UPDATE_IN_PROGRESS	AWS::Logs::LogGroup	PrepareManageLogGroup	
03:31:37 UTC+0000	UPDATE_IN_PROGRESS	AWS::Logs::LogGroup	PublishTrainingLogGroup	
03:31:37 UTC+0000	UPDATE_IN_PROGRESS	AWS::S3::Bucket	ServerlessDeploymentBucket	
03:31:37 UTC+0000	UPDATE_IN_PROGRESS	AWS::Logs::LogGroup	ManageLogGroup

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 6
  • Comments: 19 (3 by maintainers)

Most upvoted comments

It’s 2019 and we still get this problem.

I just heard back from AWS Premium Support, and they offered up both the cause of the issue and a solution. It’s not so much an issue with too many functions as it is with trying to make too many updates to a single function at once.

For me, the problem was that I had a “reservedConcurrency” setting, which caused too many updates to the same function to be attempted at the same time. Here’s their response:

Investigating this issue further I found that this happens when you update the “ReservedConcurrentExecutions” property of the AWS::Lambda::Function along with any other property e.g. Timeout, Description etc.

In your case, for the provided Stack ‘XXXX’, I could find that you were trying to update multiple attributes in a single go: ‘S3Key’, ‘APP_VERSION’ and ‘ReservedConcurrentExecutions’, and this would have caused the issue.

Further, when I checked the CloudTrail API calls, I could see that the stack update tries to make these changes almost simultaneously, and from the logs you should be able to see the API calls being fired at the same time:

PutFunctionConcurrency20171031            2018-06-11T16:12:10.000Z
UpdateFunctionConfiguration20150331v2     2018-06-11T16:12:10.000Z

This can cause a conflict. It’s the ‘PutFunctionConcurrency’ API call that hits the error, because ‘UpdateFunctionConfiguration’ tries to update the Lambda function configuration at the same time.

Note: This error might also occur when two separate stacks are trying to make an update to the same Lambda function at the same time.

I’ve confirmed that our CloudFormation team are aware of this issue as there are multiple customers who have reported the same issue and I have added your voice to the existing internal ticket for this fix.

Unfortunately, I do not have an ETA as to when this will be implemented, but I do encourage you to keep track of our forums [1], our blogs [2] and CloudFormation release notes [3], as these are common channels used by our internal teams to publish updates and general information related to new feature releases.

Workaround:

In the meantime, an easy way to avoid this issue is to apply updates to an AWS::Lambda::Function resource such that the “ReservedConcurrentExecutions” property is updated as part of a separate stack update.

So could you try the updates once more, but this time split out the “ReservedConcurrentExecutions” change as a separate update, de-coupled from any other updates to your Lambda function.

In my case, I just removed the reservedConcurrency settings and did the deploy again.
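
To make the conflict concrete, here is a rough sketch of the relevant fragment of the CloudFormation template that Serverless generates for one function; the resource name matches the stack events above, but the bucket, key, environment variable and concurrency value are assumptions for illustration only. Changing the code/configuration properties together with ReservedConcurrentExecutions in a single stack update is what fires UpdateFunctionConfiguration and PutFunctionConcurrency against the same function at nearly the same time.

# Hypothetical excerpt of the generated CloudFormation template
ManageLambdaFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: java8
    Timeout: 300
    Code:
      S3Bucket: my-deployment-bucket            # hypothetical bucket name
      S3Key: serverless/manage/1234/manage.zip  # hypothetical; a new S3Key on every deploy
    Environment:
      Variables:
        APP_VERSION: "1.2.3"                    # hypothetical; changed alongside the code
    ReservedConcurrentExecutions: 5             # hypothetical value; triggers PutFunctionConcurrency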

I have a workaround: set the reservedConcurrency value to what you want it updated to in the AWS Console before doing the deployment, and it works!
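
If you prefer the CLI over the console for that pre-deployment step, a minimal sketch (the function name and value here are hypothetical):

# Set the reserved concurrency ahead of the deployment, so the subsequent
# stack update no longer needs to change it
aws lambda put-function-concurrency \
  --function-name MyLambdaService-qa-manage \
  --reserved-concurrent-executions 5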

Another tip: you don’t need to delete the CF stack; you can continue the rollback and it will return the Lambda to a good state.
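
For anyone stuck in UPDATE_ROLLBACK_FAILED as in the events above, continuing the rollback can be done from the console (“Continue update rollback”) or with the AWS CLI; the stack name below is taken from the events above, otherwise this is just a sketch:

# Resume the failed rollback instead of deleting the stack
aws cloudformation continue-update-rollback --stack-name MyLambda-Automation-qa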

This is causing issues deploying our stacks. Is there any work in progress for this? Or a workaround that isn’t deploying functions without concurrency settings? Or something tracking the bug in CloudFormation, if that’s the problem?

I was having the same issue with reservedConcurrency, but in my case I was trying to remove the setting in my large update. Adding the setting back, deploying, and then removing it and deploying again was still a successful workaround.

@kopertop Thanks for your update and the info from AWS Support.

I can verify that I am setting reservedConcurrency at the function level for all of the functions defined in my serverless.yaml. Clearly this is a bug in Serverless that needs to be addressed, as reservedConcurrency is an important setting in production environments.
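
For anyone unfamiliar with the setting, this is roughly what a per-function reservedConcurrency entry looks like in serverless.yml; the function name, handler and value here are hypothetical and not taken from the reporter’s config:

functions:
  manage:
    handler: com.example.lambda.ManageHandler  # hypothetical Java handler
    reservedConcurrency: 5                     # rendered as ReservedConcurrentExecutions in the generated template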

We just ran into this issue in production as well; should we flag it here or push AWS?

Awesome, really appreciate the comment from @swissarmykirpan, it saved our butts since this happened against our production stack. We continued the rollback, then commented out the concurrency settings in serverless.yml, deployed, uncommented the concurrency settings, and deployed again.
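
A minimal sketch of that two-pass workaround, assuming the usual deploy command with the stage/region options used in the config above (the option values here are placeholders):

# Pass 1: comment out reservedConcurrency in serverless.yml, then deploy
serverless deploy --stage qa --region us-east-1

# Pass 2: restore reservedConcurrency in serverless.yml and deploy again,
# so that only the concurrency setting changes in this stack update
serverless deploy --stage qa --region us-east-1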

I do feel the need to chime in here. This is not an issue with CloudFormation. CloudFormation is clearly telling us this is how CloudFormation works. If we want this to work w/o the manual steps, it sounds like serverless could send one CF template w/o concurrency settings, then send another CF template w/ the concurrency settings.

Or some other technical solution to this problem. I can say our team relies on concurrency settings to prevent our production application from dog-piling on itself during traffic spikes. Being able to code concurrency settings into serverless has been an amazing help for us. Until it started doing this 😕

In fact, it means that this is a CloudFormation issue.

Serverless just sends the generated CloudFormation template with the configuration changes, and CloudFormation doesn’t know that the changes must be applied sequentially.

@kopertop I found the same problem when I added reservedConcurrency to my serverless.yml file. After deleting the stack and redeploying, the problem disappeared.

In case it helps anyone, I was hitting a similar error deploying Lambda functions and permissions with Terraform, and figured out the cause and solution: https://github.com/terraform-providers/terraform-provider-aws/issues/5154#issuecomment-423912491

Seeing the same error.

[2]   Serverless Error ---------------------------------------
[2]  
[2]   An error occurred: ApiLambdaFunction - The function could not be updated due to a concurrent update operation. (Service: AWSLambda; Status Code: 409; Error Code: ResourceConflictException; Request ID: a8fd5185-645d-11e8-ab8a-65e35d8578b7).
[2]  
[2]   Get Support --------------------------------------------
[2]      Docs:          docs.serverless.com
[2]      Bugs:          github.com/serverless/serverless/issues
[2]      Issues:        forum.serverless.com
[2]  
[2]   Your Environment Information -----------------------------
[2]      OS:                     linux
[2]      Node Version:           8.9.4
[2]      Serverless Version:     1.27.3