kubernetes: Leaking node-healthcheck firewall rule

A couple of gce-scale-performance runs failed because the network could not be deleted, which in turn was caused by a couple of leaked firewall rules (I manually deleted the network later to fix it):

W0912 14:34:23.820] ERROR: (gcloud.compute.networks.delete) Could not fetch resource:
W0912 14:34:23.820]  - The network resource 'projects/kubernetes-scale/global/networks/gce-scale-cluster' is already being used by 'projects/kubernetes-scale/global/firewalls/k8s-fw-a8986c438978c11e7bc7a42010a8e000'
W0912 14:34:23.820] 
I0912 14:34:23.921] Failed to delete network 'gce-scale-cluster'. Listing firewall-rules:
I0912 14:34:24.434] NAME                                     NETWORK            SRC_RANGES                                                    RULES      SRC_TAGS  TARGET_TAGS
I0912 14:34:24.434] k8s-2dd9fdb99ae3b676-node-http-hc        gce-scale-cluster  130.211.0.0/22,35.191.0.0/16,209.85.152.0/22,209.85.204.0/22  tcp:10256            gce-scale-cluster-minion
I0912 14:34:24.434] k8s-fw-a8986c438978c11e7bc7a42010a8e000  gce-scale-cluster  0.0.0.0/0 

Ref: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/31
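
For reference, a minimal diagnostic sketch, assuming the project and network names shown in the log above (the --filter match on the network field may need adjusting for your gcloud version):

# List the firewall rules still attached to the network.
gcloud compute firewall-rules list \
    --project kubernetes-scale \
    --filter="network:gce-scale-cluster"

# Inspect the leaked rule; its description usually records the Kubernetes
# Service (load balancer) it was created for.
gcloud compute firewall-rules describe k8s-fw-a8986c438978c11e7bc7a42010a8e000 \
    --project kubernetes-scale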

cc @kubernetes/sig-network-bugs @bowei (assigning to you, feel free to reassign as appropriate)

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

Closing this issue, as kube-down now deletes all remaining firewall rules in the network before deleting the network itself. For other leaked networking resources in general, we have issue #54323.
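
A minimal sketch of that cleanup order, assuming gcloud and the names from the log above (this is not the actual kube-down code):

PROJECT="kubernetes-scale"
NETWORK="gce-scale-cluster"

# Delete every firewall rule still attached to the network ...
for rule in $(gcloud compute firewall-rules list \
    --project "${PROJECT}" \
    --filter="network:${NETWORK}" \
    --format="value(name)"); do
  gcloud compute firewall-rules delete "${rule}" --project "${PROJECT}" --quiet
done

# ... and only then delete the network itself.
gcloud compute networks delete "${NETWORK}" --project "${PROJECT}" --quiet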

For the time being I manually deleted the subnets and then the network to unblock the next scale job run (commands sketched below). But it would be good to have this fixed soon.

cc @kubernetes/sig-scalability-misc
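
For completeness, a sketch of that manual unblock (SUBNET_NAME and REGION are placeholders; subnets generally have to be removed before the network itself can be deleted):

# Find the subnets that belong to the network.
gcloud compute networks subnets list \
    --project kubernetes-scale \
    --filter="network:gce-scale-cluster" \
    --format="value(name,region)"

# Delete each subnet, then the network.
gcloud compute networks subnets delete SUBNET_NAME --region REGION \
    --project kubernetes-scale --quiet
gcloud compute networks delete gce-scale-cluster \
    --project kubernetes-scale --quiet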