tfsec: bug: non-deterministic behavior/results due to for_each processing for v1+ releases

Describe the bug Some combination of tf resources yields inconsistent results.

To Reproduce Steps to reproduce the behavior:

  1. Create directory the files from this gist: https://gist.github.com/BryanStenson-okta/caf244cc5ffbf25a590f4a3fc5d7ae51
  2. Run tfsec . repeatedly (sometimes at least 10-20 times), and observe different results.

Expected behavior Each execution of tfsec, on the identical codebase, should yield identical results.

Screenshots/Output

System Info

  • tfsec version: v1.0.11
  • terraform version: v1.1.3
  • OS: osx

Example Code

https://gist.github.com/BryanStenson-okta/caf244cc5ffbf25a590f4a3fc5d7ae51

Additional context

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 20 (9 by maintainers)

Most upvoted comments

Thanks for the quick turnaround! I validated this on all our use cases and everything is consistent.

This is awesome news! Thanks @davidmlee-okta for the excellently simple repro, and @liamg for your tenacity on sticking with my original report.

Very excited for this.

So thanks entirely to your help, I have found the issue.

We have a map[string]string containing our mappings from legacy -> new IDs. We invert the map in order to map new IDs back to legacy ones to match against legacy style ignore rules. Unfortunately, unlike the keys, the values are not unique, as some checks have been merged, and therefore have multiple legacy IDs for a single new ID. Since maps are unordered, it is unpredictable which legacy ID will get overwritten when we invert the map. So we are sometimes matching the legacy ID in your ignore rule, sometimes not.

I’ll have a fix out later this afternoon.

I found a distilled example and have pinpointed the issue to legacy id support:

With legacy ID format using this code

resource "aws_security_group" "tfsec test" {
  name        = "TFSec Test"
  description = "Allows internal traffic for tfsec testing"
  vpc_id      = module.aws_vpc_info.vpc.id

  #tfsec:ignore:AWS008
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    description = "tfsec ingress test"
    cidr_blocks = ["0.0.0.0/0"] 
  }
  
  #tfsec:ignore:AWS009
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    description = "tfsec egress test"
    cidr_blocks = ["0.0.0.0/0"] 
  }
}

TFSec v1.8.0 produces this inconsistent result

~/o/tfsec-test ➜ for i in {1..10}; do
 tfsec . -f json 2>/dev/null | jq -r '.results | length'
done
1
1
2
2
1
0
0
1
1
1

Changing the ignore IDs to new IDs like this

resource "aws_security_group" "tfsec test" {
  name        = "TFSec Test"
  description = "Allows internal traffic for tfsec testing"
  vpc_id      = module.aws_vpc_info.vpc.id

  #tfsec:ignore:aws-vpc-no-public-ingress-sgr
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    description = "tfsec ingress test"
    cidr_blocks = ["0.0.0.0/0"] 
  }
  
  #tfsec:ignore:aws-vpc-no-public-egress-sgr
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    description = "tfsec egress test"
    cidr_blocks = ["0.0.0.0/0"] 
  }
}

We get this result

~/o/tfsec-test ➜ for i in {1..10}; do
 tfsec . -f json 2>/dev/null | jq -r '.results | length'
done
0
0
0
0
0
0
0
0
0
0

Using v1.2.1, I’m still seeing inconsistent results:

$ ./tfsec-darwin-amd64.v1.2.1 -v                              
v1.2.1
$ ./tfsec-darwin-amd64.v1.2.1 -f json | jq '.results | length'
24
$ ./tfsec-darwin-amd64.v1.2.1 -f json | jq '.results | length'
25
$ ./tfsec-darwin-amd64.v1.2.1 -f json | jq '.results | length'
23
$ ./tfsec-darwin-amd64.v1.2.1 -f json | jq '.results | length'
25

I’m not 100% sure the directory I tested this on before, but I picked another random one from our codebase, and I’m still seeing inconsistencies.

I haven’t had time to reduce this is a “safe-to-share” example in a gist. Perhaps I’ll have more time to provide that next week.

Thanks @liamg

tl;dr - v1.1.1 solves the case exemplified in the gist, but there is more non-determinism.

I’ve verified v1.1.1 works on the files I’ve provided in the gist.

However, I’m still seeing non-determinism on my codebase. I’m sanitizing my code to provide another reduced/cleaned example in another gist:

$ ~/Downloads/tfsec-darwin-amd64 --version
v1.1.1
$ ~/Downloads/tfsec-darwin-amd64 . -f json 2>/dev/null | jq -r '.results | length'
39
$ ~/Downloads/tfsec-darwin-amd64 . -f json 2>/dev/null | jq -r '.results | length'
41
$ ~/Downloads/tfsec-darwin-amd64 . -f json 2>/dev/null | jq -r '.results | length'
40
$ ~/Downloads/tfsec-darwin-amd64 . -f json 2>/dev/null | jq -r '.results | length'
41

Please re-open this ticket.

i’ve verified this all works as expected when running tfsec v0.63.1