kubernetes: Investigate node-kubelet-conformance test failures

Which jobs are failing:

node-kubelet-conformance suite

Which test(s) are failing:

The tests are failing in the BeforeSuite.

Since when has it been failing:

10/23/2020

Testgrid link:

https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-conformance

Reason for failure:

It started failing at commit 237dae5a5, but it’s unknown if it’s related to that.

Anything else we need to know:

We discussed this in the Kubernetes SIG-Node CI subgroup and will start taking a look at it.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 28 (14 by maintainers)

Most upvoted comments

looking green!

I can repro the issue now, and I think the line was moved upward to generate the token before starting the kubelet. Moving it up out of the if-statement may help partially, but in the conformance test the kubelet is manually started ahead of this code, so it may not matter.

You’ll also need to use the --test-suite=conformance flag to make sure it runs in Docker, I believe.

Quick interruption with instructions on how to reproduce this one. I think this may work:

make test-e2e-node FOCUS="\[NodeConformance\]" SKIP="\[Flaky\]" REMOTE=true DELETE_INSTANCES=true IMAGE_CONFIG_FILE=node-test.yaml

Focus and skip: I'm taking the FOCUS and SKIP arguments from https://github.com/kubernetes/test-infra/blob/48f2834380836b283d700833975a8162b392dfe4/config/jobs/kubernetes/sig-node/node-kubelet.yaml#L95
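To illustrate why the brackets in FOCUS and SKIP are escaped, here is a minimal sketch. It assumes the two values are treated as regular expressions matched against each spec description, with a spec running when it matches the focus pattern and does not match the skip pattern. The spec descriptions and the `selected` helper are made up for illustration and are not taken from the test suite.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the FOCUS and SKIP arguments of the make command.
// The brackets are escaped so they match literally instead of forming a
// regular-expression character class.
var (
	focus = regexp.MustCompile(`\[NodeConformance\]`)
	skip  = regexp.MustCompile(`\[Flaky\]`)
)

// selected reports whether a spec description would run under the
// assumption above: it must match focus and must not match skip.
func selected(description string) bool {
	return focus.MatchString(description) && !skip.MatchString(description)
}

func main() {
	// Hypothetical spec descriptions, invented for this example.
	specs := []string{
		"[sig-node] Example pod should start [NodeConformance]",
		"[sig-node] Example pod should restart [NodeConformance] [Flaky]",
		"[sig-node] Example pod should do something else",
	}
	for _, s := range specs {
		fmt.Printf("%t  %s\n", selected(s), s)
	}
}
```

Without the backslashes, `[Flaky]` would be read as a character class matching any single one of those letters, so nearly every spec would be skipped.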

Image config file: The job also uses the following config file:
https://github.com/kubernetes/test-infra/blob/48f2834380836b283d700833975a8162b392dfe4/config/jobs/kubernetes/sig-node/node-kubelet.yaml#L91
https://github.com/kubernetes/test-infra/blob/master/jobs/e2e_node/image-config.yaml
Make sure to copy it and name it node-test.yaml (the argument to IMAGE_CONFIG_FILE above).

Additional arguments: The test args to this job are at https://github.com/kubernetes/test-infra/blob/48f2834380836b283d700833975a8162b392dfe4/config/jobs/kubernetes/sig-node/node-kubelet.yaml#L92

I think these are already the defaults, but just in case, it may be useful to add the following to the above command:

TEST_ARGS='--feature-gates=DynamicKubeletConfig=true --kubelet-flags="--cgroups-per-qos=true --cgroup-root=/"'

Miscellaneous: If you run into trouble with the above commands, try going through https://github.com/contributing-to-kubernetes/gnosis/tree/master/stories/e2e-node-tests

The info here came from https://github.com/kubernetes/community/blob/master/contributors/devel/sig-node/e2e-node-tests.md

Catching up on this. qq: has anyone vetted https://github.com/kubernetes/kubernetes/pull/94723/files ? I see that PR went in around the time this job began to fail: https://github.com/kubernetes/kubernetes/compare/1fcd02cc2...237dae5a5

The BeforeSuite (AI: we should list out what it does) fails to communicate with nodes:

I1023 20:03:09.965] Failure [15.352 seconds]
I1023 20:03:09.965] [BeforeSuite] BeforeSuite 
I1023 20:03:09.966] _output/local/go/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:177
I1023 20:03:09.966] 
I1023 20:03:09.966]   should be able to list nodes.
      
I1023 20:03:09.966]   Unexpected error:
I1023 20:03:09.966]       <*url.Error | 0xc000b0a570>: {
I1023 20:03:09.966]           Op: "Get",
I1023 20:03:09.966]           URL: "https://127.0.0.1:6443/api/v1/nodes",
I1023 20:03:09.966]           Err: {
I1023 20:03:09.967]               Op: "dial",
I1023 20:03:09.967]               Net: "tcp",
I1023 20:03:09.967]               Source: nil,
I1023 20:03:09.967]               Addr: {
I1023 20:03:09.967]                   IP: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 127, 0, 0, 1],
I1023 20:03:09.967]                   Port: 6443,
I1023 20:03:09.967]                   Zone: "",
I1023 20:03:09.967]               },
I1023 20:03:09.968]               Err: {Syscall: "connect", Err: 0x6f},
I1023 20:03:09.968]           },
I1023 20:03:09.968]       }
I1023 20:03:09.968]       Get "https://127.0.0.1:6443/api/v1/nodes": dial tcp 127.0.0.1:6443: connect: connection refused
I1023 20:03:09.968]   occurred
I1023 20:03:09.968] 
I1023 20:03:09.968]   _output/local/go/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:315
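For anyone reading the raw struct dump above, the following sketch decodes the two fields that are not obvious at a glance. It only uses the Go standard library; `addrText` is a helper named here for illustration. The 16-byte IP is the IPv4-mapped form of the loopback address, and 0x6f is 111 in decimal, which on Linux is ECONNREFUSED, consistent with the "connection refused" message in the log.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// addrText renders the 16-byte IP from the dump in its usual text form.
func addrText(raw []byte) string {
	return net.IP(raw).String()
}

func main() {
	// Bytes copied from the Addr.IP field of the dump.
	ip := []byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 127, 0, 0, 1}
	fmt.Println(addrText(ip)) // prints 127.0.0.1

	// Value copied from the innermost Err field. The numeric value is
	// always 111; the message text printed next to it depends on the OS.
	errno := syscall.Errno(0x6f)
	fmt.Println(int(errno), errno)
}
```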

The mention of port 6443 looks interesting here.