cilium: CI: Travis: ENISuite.TestNodeManagerManyNodes fails

CI failure

level=info msg="Synchronized ENI information" numInstances=100 numSecurityGroups=2 numSubnets=4 numVPCs=1 subsys=eni

----------------------------------------------------------------------

FAIL: node_manager_test.go:563: ENISuite.TestNodeManagerManyNodes
node_manager_test.go:602:

    c.Errorf("Node %s allocation mismatch. expected: %d allocated: %d", s.name, minAllocate, node.Stats().AvailableIPs)

... Error: Node node53 allocation mismatch. expected: 10 allocated: 18

node_manager_test.go:602:

    c.Errorf("Node %s allocation mismatch. expected: %d allocated: %d", s.name, minAllocate, node.Stats().AvailableIPs)

... Error: Node node59 allocation mismatch. expected: 10 allocated: 18

node_manager_test.go:617:
    c.Assert(metricsapi.AllocatedIPs("available"), check.Equals, numNodes*minAllocate)

... obtained int = 1016
... expected int = 1000

...
OOPS: 17 passed, 1 FAILED
--- FAIL: Test (2.60s)
FAIL
coverage: 4.3% of statements in ./...

FAIL	github.com/cilium/cilium/pkg/aws/eni	2.684s
FAIL

Makefile:204: recipe for target 'unit-tests' failed
make: *** [unit-tests] Error 1
The command "./.travis/build.sh" exited with 2.

Done. Your build exited with 1.

https://travis-ci.com/github/cilium/cilium/jobs/335282883

Seen in https://github.com/cilium/cilium/pull/11555

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 23 (22 by maintainers)

Most upvoted comments

FWIW:

$ find . -name node_manager_test.go
./pkg/ipam/node_manager_test.go
./pkg/aws/eni/node_manager_test.go

I tried running these locally and they both passed:

$ make unit-tests TESTPKGS=pkg/aws/eni
$ make unit-tests TESTPKGS=pkg/ipam

I’m guessing this is either a race or a prior test not cleaning up properly after itself.

Hmm, there’s definitely some raciness going on. I ran the tests with -race enabled, and it reported many, many data races. A good number of them are in TestNodeManagerManyNodes, so this might explain the Travis failure. I’m working on a PR to fix the data races under TestNodeManagerManyNodes first, as there are many more around this code.
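
For illustration only, here is a minimal sketch of the kind of check-then-allocate logic where a missing lock could let concurrent callers over-allocate. It is not Cilium code and not a diagnosis of this failure: the `node` type, `allocateUpTo`, `availableIPs`, and `resolveConcurrently` are hypothetical names. With the mutex held across both the deficit check and the update, the count stays at the target; without it, `go test -race` would be expected to flag the access to `available`.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical stand-in for an IPAM node (not the Cilium type).
type node struct {
	mu        sync.Mutex
	available int
}

// allocateUpTo tops up the available count to target. The lock covers both
// the deficit check and the update, so two goroutines cannot both observe
// the same deficit and both allocate for it.
func (n *node) allocateUpTo(target int) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if deficit := target - n.available; deficit > 0 {
		n.available += deficit
	}
}

// availableIPs reads the counter under the same lock.
func (n *node) availableIPs() int {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.available
}

// resolveConcurrently starts `workers` goroutines that each try to top the
// node up to target, waits for all of them, and returns the final count.
func resolveConcurrently(n *node, target, workers int) int {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n.allocateUpTo(target)
		}()
	}
	wg.Wait()
	return n.availableIPs()
}

func main() {
	n := &node{}
	fmt.Println(resolveConcurrently(n, 10, 8)) // prints 10
}
```

If the lock in `allocateUpTo` were dropped, the read of `available` and the later write would no longer be atomic, and the final count could differ from the target depending on scheduling, which is the general shape of an intermittent "allocation mismatch".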

Removing @christarazi’s stale assignment here. If you plan on working on this, feel free to reassign it to yourself.

@gandro I restarted the Travis CI build where it happened. Next time I’ll keep the failed build and ping here.