terraform-provider-aws: source_code_hash does not update

This issue was originally opened by @joerggross as hashicorp/terraform#20152. It was migrated here as a result of the provider split. The original body of the issue is below.


Terraform Version

v0.11.11

Terraform Configuration Files

data "aws_s3_bucket_object" "lambda_jar_hash" {
  bucket = "${var.lambda_s3_bucket}"
  key    = "${var.lambda_s3_key}.sha256"
}

resource "aws_lambda_function" "lambda_function_s3" {

  s3_bucket = "${var.lambda_s3_bucket}"
  s3_key = "${var.lambda_s3_key}"
  s3_object_version = "${var.lambda_s3_object_version}"

  function_name = "${var.lambda_function_name}"
  role = "${var.lambda_execution_role_arn}"
  handler = "${var.lambda_function_handler}"
  source_code_hash = "${base64encode(data.aws_s3_bucket_object.lambda_jar_hash.body)}"
  runtime = "java8"
  memory_size = "${var.lambda_function_memory}"
  timeout = "${var.lambda_function_timeout}"
  description = "${var.description}"
  reserved_concurrent_executions = "${var.reserved_concurrent_executions}"

}

Debug Output

~ module.comp-price-import-data-reader-scheduled-lambda.aws_lambda_function.lambda_function_s3
    last_modified:    "2019-01-30T11:58:32.826+0000" => <computed>
    source_code_hash: "6HVMIk6vxvBy4AApmHbQis5Av2uQeSJh3XRosmKtv0U=" => "ZTg3NTRjMjI0ZWFmYzZmMDcyZTAwMDI5OTg3NmQwOGFjZTQwYmY2YjkwNzkyMjYxZGQ3NDY4YjI2MmFkYmY0NQ=="

Plan: 0 to add, 1 to change, 0 to destroy.


Expected Behavior

We generate an additional file in the S3 bucket alongside the lambda jar file to be deployed. The additional file contains a SHA256 hash of the deployed jar file. That hash value is assigned to the source_code_hash property of the lambda function via the base64encode function.

We would expect the hash to be stored in the tfstate and reused when applying the scripts, so that the lambda jar file is not redeployed unless the hash changes.

Actual Behavior

We applied the scripts several times without changing the jar or hash file in S3. Nevertheless, Terraform always redeploys the jar. The output (see above) is always the same ("6HVMIk6vxvBy4AApmHbQis5Av2uQeSJh3XRosmKtv0U=" => "ZTg3NTRjMjI0ZWFmYzZmMDcyZTAwMDI5OTg3NmQwOGFjZTQwYmY2YjkwNzkyMjYxZGQ3NDY4YjI2MmFkYmY0NQ=="). It seems that the given hash is never stored in the tfstate.

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 35
  • Comments: 25 (5 by maintainers)

Most upvoted comments

We're seeing the exact same issue: source_code_hash is never updated in the tfstate when applying, so the lambda resource always requires updating no matter how many times we apply:

      ~ source_code_hash               = "83TsTFxfrLQJvQ8Re1YdXiGX2eQm1a1uX8Sc0bKeC3w=" -> "p6F5Wk4naphwng6ZQRNahuvJ7BUEFfHnMR9wQQpVkCM="

I’m reporting the same concern, too.

The main problem is that the purpose of source_code_hash isn't clear. The documentation for aws_lambda_function presents source_code_hash as an argument, which suggests it affects deployment, but that doesn't seem to be the case.

Looking at the source code, it is a computed field. After a successful deploy, the value of source_code_hash is overwritten by the response from AWS's API when resourceAwsLambdaFunctionRead() is called.

In short, the value assigned to source_code_hash doesn't affect deployment and is always overwritten by the hash returned by the AWS API, unless it already matches.
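A minimal sketch of the only value that round-trips cleanly (resource names, file paths, and the role ARN below are illustrative, not from the original report): the supplied hash must equal the base64-encoded SHA-256 of the package itself, which is what AWS returns as CodeSha256.

resource "aws_lambda_function" "example" {
  function_name = "example"
  role          = "arn:aws:iam::123456789012:role/lambda-exec" # placeholder ARN
  handler       = "index.handler"
  runtime       = "nodejs12.x"
  filename      = "lambda.zip"

  # filebase64sha256 yields the same base64-encoded SHA-256 that AWS
  # returns as CodeSha256, so the value read back after apply matches
  # the value supplied and no permanent diff appears.
  source_code_hash = filebase64sha256("lambda.zip")
}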

What we need

We need a way to deterministically trigger lambda deployments (e.g. after a code change is detected) without presuming that everyone uses the same process to package their code.

Is source_code_hash the correct attribute to use for this? Yes and no. It'd be nice to keep the hash returned by AWS's API, but we'd probably need another attribute, similar to source_code_hash, that meets our need.

Suggestion

  1. Update the documentation so that source_code_hash is clearly defined as an output.
  2. Remove Optional: true from the schema for source_code_hash.
  3. Add a new attribute change_trigger_hash that is optional and not computed. Suggestions for a better name are welcome.
  4. If change_trigger_hash is null, then plan and apply work as they do today.
  5. If change_trigger_hash is not null, compare the current value to the previous value. If they differ, include the change in the plan; otherwise, ignore the resource change.

@aeschright does this sound like something that we can do? I’ll submit a PR if yes

=========================== Update: Upon looking further, source_code_hash indeed triggers a change, which makes my suggestion invalid. I'll try out an idea which I hope will work.

I had a very similar problem where the statefile was not getting an updated source_code_hash after an apply. @Miggleness pointed me in the right direction by noting that the value in source_code_hash is overwritten by AWS. This means that the hash you use in your lambda resource definition must be computed the same way that AWS computes the hash. Otherwise, you will always have a different value in your source_code_hash, and your lambda will always be redeployed.

So when you see something like:

 ~ source_code_hash = "QuYMcyiptpzreIVxuq8AL+UWobBp3pDq045f2ISoKB0=" -> "42e60c7328a9b69ceb788571baaf002fe516a1b069de90ead38e5fd884a8281d" # forces replacement

The value on the left is the AWS calculated hash, and the value on the right is the value you are providing terraform in your lambda definition.

If you calculate the hash yourself in a shell, use the following command:

openssl dgst -sha256 -binary ${FILE_NAME}.zip | openssl enc -base64
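For reference, Terraform's built-in filebase64sha256 function (available since 0.11.12) computes the same value locally; a minimal sketch, with the file name being illustrative:

# Equivalent to: openssl dgst -sha256 -binary lambda.zip | openssl enc -base64
output "lambda_zip_hash" {
  value = filebase64sha256("${path.module}/lambda.zip")
}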

If you calculate it with a Python script, use something like the following:

import base64
import hashlib


def get_aws_hash(zip_file):
    '''Compute the base64-encoded SHA256 hash of a zip archive, the way AWS does.'''
    with open(zip_file, "rb") as f:
        sha256_hash = hashlib.sha256()

        # Read and update hash string value in blocks of 4K
        while byte_block := f.read(4096):
            sha256_hash.update(byte_block)

    hash_value = base64.b64encode(sha256_hash.digest()).decode('utf-8')

    return hash_value

I'm experiencing this with v0.12.20 and AWS provider v2.65.0, with a zip file that's referenced from an S3 bucket.


data "aws_s3_bucket_object" "lambda" {
  bucket = aws_s3_bucket.lambda.id
  key    = "lambda.zip"
}

resource "aws_lambda_function" "lambda" {
  s3_bucket        = aws_s3_bucket.lambda.id
  s3_key           = data.aws_s3_bucket_object.lambda.key
  function_name    = "lambda"
  role             = aws_iam_role.lambda.arn
  handler          = "lambda.handler"
  timeout          = 300
  memory_size      = 256
  source_code_hash = base64sha256(data.aws_s3_bucket_object.lambda.etag)
  runtime          = "python3.8"
}

I’m using the etag from the s3 object as the input for the hash, which shouldn’t change unless we upload a new version.

When I run apply twice in a row, the input hash is always the same, but the new hash is not persisted to the state, so the next run shows the same output.

 ~ source_code_hash               = "FxFe/pitsCj4XL/F+VORZASkGZdejRgNc7OABiKaWpg=" -> "oE4rN1nboxBBF64fQl8Q0GPtAE7bLqOofP/ACZPPz2A="

An easier alternative for updating a lambda function on code change, when it is sourced from S3, is to enable S3 bucket versioning and pin the lambda zip version:

data "aws_s3_bucket_object" "lambda_zip" {
  bucket  = "bucket_name"
  key     = "lambda.zip"
}

resource "aws_lambda_function" "run_hll_lambda" {
  s3_bucket         = data.aws_s3_bucket_object.lambda_zip.bucket
  s3_key            = data.aws_s3_bucket_object.lambda_zip.key
  s3_object_version = data.aws_s3_bucket_object.lambda_zip.version_id
  function_name     = "Lambda_name"
  role              = aws_iam_role.lambda_iam.arn
  handler           = "lambda_function.lambda_handler"
  runtime           = "python3.7"
}
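For version_id to be populated, versioning must be enabled on the bucket. A minimal sketch, assuming AWS provider v4+ (on older provider versions this is the versioning block inside aws_s3_bucket; the resource name is illustrative):

resource "aws_s3_bucket_versioning" "lambda_bucket" {
  bucket = "bucket_name"

  versioning_configuration {
    status = "Enabled"
  }
}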

Dear all,

We have the same issue as described above, and the code looks similar. Each time we run terraform apply, the Lambda function is redeployed, even if nothing has changed. I have looked at the Terraform output and can confirm that the hash in source_code_hash is not updated in the state file.

I am experiencing the same issue, specifically inside a CI/CD pipeline. It does not occur on OSX and it does not occur in Docker on OSX when the project directory is mounted from OSX.

resource "aws_lambda_function" "index" {
  filename      = "../lambda/index.zip"
  function_name = "${var.project}_${var.environment}_redirect2index"
  role          = "${aws_iam_role.iam_for_basic_permission.arn}"
  handler       = "index.handler"
  source_code_hash = filebase64sha256("../lambda/index.zip")
  runtime = "nodejs12.x"
  publish = true
  provider = "aws.east"
}

However, with the same Docker image, Terraform version, and AWS provider version, the hashes in the CI pipeline never match. The hash generated by filebase64sha256("../lambda/index.zip") matches between runs; however, the one stored in state is completely different each time.

I thought this was a case of something else getting hashed, such as a timestamp, but the generated hash is the same each run. Somehow, the hash that gets computed doesn't get stored under source_code_hash.

This is actually quite a nasty problem, because when the Lambda is used with CloudFront, the latter redeploys each time, since AWS thinks a new version of the Lambda has been created. This adds at least 3, and often 10+, minutes to the CD pipeline.

If someone is still running into this issue, a fix that worked for me was setting the etag on the aws_s3_bucket_object resource block.

This attribute triggers an update of the zip file on deployment.

resource "aws_s3_bucket_object" "lambda" {
  bucket   = var.s3_bucket
  key      = "lambda.zip"
  etag     = filemd5("lambda.zip")
}

If you have the lambda resource block pointing to the right bucket and key, the lambda should get updated. I tried several approaches involving manually zipping up the lambdas and using the archive data block, but on deployment the change was never detected, even with the source hash. etag on the aws_s3_bucket_object did the trick.

EDIT: Thanks to @heldersepu for pointing this out: if you are using KMS encryption, source_hash is the better alternative.
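A minimal sketch of that variant, assuming AWS provider v4+, where aws_s3_object supersedes aws_s3_bucket_object (names are illustrative):

resource "aws_s3_object" "lambda" {
  bucket = var.s3_bucket
  key    = "lambda.zip"
  source = "lambda.zip"

  # Unlike etag, source_hash still triggers updates when the object is
  # KMS-encrypted, where the stored etag is no longer an MD5 of the content.
  source_hash = filemd5("lambda.zip")
}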

I’ve done a little digging into this issue as I recently encountered it.

In my use case I generate the zip files frequently; even when the underlying contents don't change, metadata changes in the zip file cause a different hash.

To get around this, I tried generating the hash of the contents outside of the zip and setting it as the source code hash.

From my observations, it appears that the source_code_hash field gets set in the state file from the filename field, regardless of the value supplied to it, i.e. filebase64sha256(aws_lambda_function.func.filename).

My case was slightly different, but with the same effect: I am building an aws_lambda_layer_version resource. I was trying to set source_code_hash = filebase64sha256("poetry.lock"), because that was the only file present at both build and deployment time. I wanted to be smart and make Terraform skip deploying a new layer if poetry.lock did not change.

Then I faced the same issue as described here (and in several forum posts).

I also ended up storing the hash alongside the zip file in S3 at build time. I calculate the hash as @AGiantSquid suggests, using cat layer.zip | openssl dgst -binary -sha256 | openssl base64, and upload it to S3 alongside the zip file.

In the deployment Terraform code I applied the following, somewhat ugly construct:

data "aws_s3_object" "layer_zip_hash" {
  bucket = "the-bucket"
  key    = "layer-common.zip.hash"
}

resource "aws_lambda_layer_version" "lambda_layer" {
  layer_name = "layer-common"

  s3_bucket = "the-bucket"
  s3_key    = "layer-common.zip"
  # Nothing else works other than the hash of the zip file.
  #  https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_layer_version#source_code_hash
  #   "Must be set to a base64-encoded SHA256 hash of the package file specified with either filename or s3_key"
  #  Earlier we tried to use the hash of the poetry.lock file, but that didn't work:
  #  although terraform plan did show a change in source_code_hash, forcing recreation,
  #  in the end source_code_hash did not record the custom hash of the poetry.lock file.
  #   https://github.com/hashicorp/terraform-provider-aws/issues/7385
  #   see this comment especially: https://github.com/hashicorp/terraform-provider-aws/issues/7385#issuecomment-733995977
  #  See storing the hash as an object technique here:
  #  https://discuss.hashicorp.com/t/lambda-function-error-call-to-function-filebase64sha256-failed-no-file-exists/20233/4
  source_code_hash = data.aws_s3_object.layer_zip_hash.body
  skip_destroy     = false

  compatible_runtimes = ["python3.9"]
}

My terraform plan has finally stopped marking my layer for replacement. At first I feared it had stopped for the wrong reasons and would no longer update even when the actual zipped dependencies change, but it really works: whenever the layer zip's hash changes in S3, a new layer version is produced. Make sure to upload the hash object with content type text/plain to avoid much frustration.

I have also looked into the fact that zipping the same content twice produces different hashes (even though the zips' contents are identical), so I am starting to run out of ideas.

I use the deterministic_zip pip package (https://github.com/bboe/deterministic_zip) to make sure my zip's effective content doesn't change, while still allowing myself to upload a new zip file every time, like @joerggross does.


My 2 cents, and I admit I was facepalming myself when I found out, but look at the documentation: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function#source_code_hash

  • source_code_hash - (Optional) Used to trigger updates. Must be set to a base64-encoded SHA256 hash of the package file specified with either filename or s3_key. The usual way to set this is ${filebase64sha256("file.zip")} (Terraform 0.11.12 or later) or ${base64sha256(file("file.zip"))} (Terraform 0.11.11 and earlier), where "file.zip" is the local filename of the lambda layer source archive.

This is not a new entry (I looked it up in the git history; this piece of information has been there for more than 4 years: https://github.com/hashicorp/terraform-provider-aws/commit/992d6978ce734d50124e3bed00c4022c106b3085). Even though it is old enough that I can only blame myself for not noticing it during implementation, I fully align with @Miggleness's opinion about making this a little less prone to mistakes.

I am not yet sure, though, how to ergonomically eliminate the hashing function there. The source_code_hash parameter in its current form allows too many moving parts, causing all of us to naively drop in different calculations, whereas the documentation clearly states that it must be set to a specific hash of a specific file. An option could be for the resource to calculate the hash itself, lowering the level of abstraction so that we only provide the source package file, which can be a zip, jar, or whatever we need to deploy.

Using a version id will not work for us, because we want to use snapshot versions during development, without always deploying and referencing a new version number.