kubernetes: DualStack: Fails to deploy dual-stack cluster, kube-proxy panics

What happened: I deployed a dual-stack cluster with a config file. First, kube-controller-manager went into CrashLoopBackOff because a default option --node-cidr-mask-size=24 gets added; I deleted it from /etc/kubernetes/manifests/kube-controller-manager.yaml. I think that in dual-stack mode, kube-controller-manager should ignore --node-cidr-mask-size. Then kube-proxy went into CrashLoopBackOff:

[root@master ~]# kubectl logs -f kube-proxy-jpnl6 -n kube-system
I0102 09:57:44.553192       1 node.go:135] Successfully retrieved node IP: 172.18.130.251
I0102 09:57:44.553270       1 server_others.go:172] Using ipvs Proxier.
I0102 09:57:44.553287       1 server_others.go:174] creating dualStackProxier for ipvs.
W0102 09:57:44.555671       1 proxier.go:420] IPVS scheduler not specified, use rr by default
W0102 09:57:44.556213       1 proxier.go:420] IPVS scheduler not specified, use rr by default
W0102 09:57:44.556278       1 ipset.go:107] ipset name truncated; [KUBE-6-LOAD-BALANCER-SOURCE-CIDR] -> [KUBE-6-LOAD-BALANCER-SOURCE-CID]
W0102 09:57:44.556303       1 ipset.go:107] ipset name truncated; [KUBE-6-NODE-PORT-LOCAL-SCTP-HASH] -> [KUBE-6-NODE-PORT-LOCAL-SCTP-HAS]
I0102 09:57:44.556606       1 server.go:571] Version: v1.17.0
I0102 09:57:44.557622       1 config.go:313] Starting service config controller
I0102 09:57:44.557654       1 shared_informer.go:197] Waiting for caches to sync for service config
I0102 09:57:44.557717       1 config.go:131] Starting endpoints config controller
I0102 09:57:44.557753       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
W0102 09:57:44.560310       1 meta_proxier.go:106] failed to add endpoints kube-system/kube-scheduler with error failed to identify ipfamily for endpoints (no subsets)
W0102 09:57:44.560337       1 meta_proxier.go:106] failed to add endpoints kube-system/kube-dns with error failed to identify ipfamily for endpoints (no subsets)
W0102 09:57:44.560428       1 meta_proxier.go:106] failed to add endpoints kube-system/kube-controller-manager with error failed to identify ipfamily for endpoints (no subsets)
E0102 09:57:44.560646       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 29 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1682120, 0x27f9a40)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1682120, 0x27f9a40)
        /usr/local/go/src/runtime/panic.go:679 +0x1b2
k8s.io/kubernetes/pkg/proxy/ipvs.(*metaProxier).OnServiceAdd(0xc0003ba330, 0xc0001c3200)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/proxy/ipvs/meta_proxier.go:65 +0x2b
k8s.io/kubernetes/pkg/proxy/config.(*ServiceConfig).handleAddService(0xc0003352c0, 0x1869ac0, 0xc0001c3200)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/proxy/config/config.go:333 +0x82
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/controller.go:198
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xf, 0xc00031a1c0, 0x0)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:658 +0x218
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc000594dd8, 0xc000557610, 0xf)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:292 +0x51
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:652 +0x79
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00046b740)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5e
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000594f40, 0xdf8475800, 0x0, 0xc000686601, 0xc00009a240)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc000478100)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:650 +0x9b
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc0003be840, 0xc000428580)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x59
created by k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14be59b]

goroutine 29 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x105
panic(0x1682120, 0x27f9a40)
        /usr/local/go/src/runtime/panic.go:679 +0x1b2
k8s.io/kubernetes/pkg/proxy/ipvs.(*metaProxier).OnServiceAdd(0xc0003ba330, 0xc0001c3200)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/proxy/ipvs/meta_proxier.go:65 +0x2b
k8s.io/kubernetes/pkg/proxy/config.(*ServiceConfig).handleAddService(0xc0003352c0, 0x1869ac0, 0xc0001c3200)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/proxy/config/config.go:333 +0x82
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/controller.go:198
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xf, 0xc00031a1c0, 0x0)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:658 +0x218
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc000594dd8, 0xc000557610, 0xf)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:292 +0x51
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:652 +0x79
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00046b740)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5e
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000594f40, 0xdf8475800, 0x0, 0xc000686601, 0xc00009a240)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc000478100)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:650 +0x9b
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc0003be840, 0xc000428580)
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x59
created by k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
        /workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:20:10Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g: cat /etc/os-release): CentOS Linux release 7.7.1908 (Core)
  • Kernel (e.g. uname -a): Linux master 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • kubeadm init config file: kubeadm-conf.txt

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 46 (31 by maintainers)

Most upvoted comments

@aojea

maybe if we want to migrate a cluster from single-stack to dual-stack?

Migrating ipv4 -> dual-stack reduces to enabling dual-stack on k8s >= v1.17.0. The upgrade of an ipv4 cluster to >= v1.17.0 must work, so that is a non-issue. Once you are on k8s >= v1.17.0, I think the best way is to first enable dual-stack on the master(s), updating CIDRs etc., and let the workers stay with IPv6DualStack:false. Then reboot them with IPv6DualStack:true one by one (sketch below).
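A minimal sketch of the per-worker flip, assuming a kubeadm-style KubeletConfiguration (the exact file layout varies by install):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  IPv6DualStack: true  # flip from false to true on one worker, restart the kubelet, then move to the next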

Then the case is the reverse of the one commented on above: https://github.com/kubernetes/kubernetes/issues/86773#issuecomment-570521112.

But this has to be discussed some place else 😃

The reason for the panic is not hard to see:

https://github.com/kubernetes/kubernetes/blob/65ef5dcc513ccfd60436bf4d04652224c9b6036f/pkg/proxy/ipvs/meta_proxier.go#L64-L66

There is no check for nil.
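To make the failure mode concrete, here is a minimal, self-contained Go sketch. The types are simplified stand-ins for the real API objects (in Kubernetes the field is Service.Spec.IPFamily, a *v1.IPFamily in k8s.io/api/core/v1); this is not the actual kube-proxy source, just the shape of the bug and of the missing guard:

package main

import "fmt"

// Hypothetical, simplified stand-ins for the real API types.
type IPFamily string

const IPv4Protocol IPFamily = "IPv4"

type ServiceSpec struct {
	IPFamily *IPFamily // stays nil when the apiserver never defaulted it
}

type Service struct {
	Name string
	Spec ServiceSpec
}

// onServiceAdd mirrors the shape of the linked meta_proxier.go lines: the
// pointer is dereferenced with no nil check, which is the SIGSEGV in the trace.
func onServiceAdd(svc *Service) {
	if *svc.Spec.IPFamily == IPv4Protocol { // panics when IPFamily is nil
		fmt.Println("dispatch to ipv4 proxier:", svc.Name)
		return
	}
	fmt.Println("dispatch to ipv6 proxier:", svc.Name)
}

// onServiceAddGuarded adds the missing nil check.
func onServiceAddGuarded(svc *Service) {
	if svc.Spec.IPFamily == nil {
		fmt.Println("service has no ipFamily, skipping:", svc.Name)
		return
	}
	onServiceAdd(svc)
}

func main() {
	svc := &Service{Name: "kube-dns"} // Spec.IPFamily left nil on purpose
	onServiceAddGuarded(svc)          // logs and returns
	onServiceAdd(svc)                 // nil pointer dereference, as in the logs
}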

The reason why IPFamily is nil is less clear. I tried setting IPv6DualStack:false for the “master” K8s processes but keeping IPv6DualStack:true on kube-proxy, and I got exactly the panic described in this issue.

So I think the problem is cluster misconfiguration.

I am unsure if the panic is acceptable. The error indication could of course be better, but IMHO kube-proxy should not “help” the user in this case by making some assumption of ipv4, for instance. That would hide a serious misconfiguration.

Same error here. kube-proxy is v1.17.0, using mode: ipvs for dual-stack.

[root@henry-dual-we-01 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:12:12Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.17.0
featureGates:
  IPv6DualStack: true
controllerManager:
  extraArgs:
    external-cloud-volume-plugin: openstack
    cluster-cidr: 192.168.0.0/22,2001:283:4000:2002::/62
    service-cluster-ip-range: 10.253.0.0/16,fd01:abce::/112
networking:
  serviceSubnet: 10.253.0.0/16,fd01:abce::/112
  podSubnet: 192.168.0.0/22,2001:283:4000:2002::/62
  dnsDomain: "cluster.local"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
clusterCIDR: 192.168.0.0/22,2001:283:4000:2002::/62
featureGates:
  SupportIPVSProxyMode: true
  IPv6DualStack: true
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  IPv6DualStack: true

Per the validation guide (https://kubernetes.io/docs/tasks/network/validate-dual-stack/#validate-pod-addressing), pods, nodes, and services all work well, but the same errors appear in the kube-proxy logs:

[root@henry-dual-we-01 ~]# kubectl logs -f -n kube-system   kube-proxy-k8cq5
W0304 03:42:56.847495       1 feature_gate.go:235] Setting GA feature gate SupportIPVSProxyMode=true. It will be removed in a future release.
I0304 03:43:47.472881       1 node.go:135] Successfully retrieved node IP: 10.75.72.170
I0304 03:43:47.473227       1 server_others.go:172] Using ipvs Proxier.
I0304 03:43:47.473273       1 server_others.go:174] creating dualStackProxier for ipvs.
W0304 03:43:47.485272       1 proxier.go:420] IPVS scheduler not specified, use rr by default
W0304 03:43:47.485643       1 proxier.go:420] IPVS scheduler not specified, use rr by default
W0304 03:43:47.485701       1 ipset.go:107] ipset name truncated; [KUBE-6-LOAD-BALANCER-SOURCE-CIDR] -> [KUBE-6-LOAD-BALANCER-SOURCE-CID]
W0304 03:43:47.485730       1 ipset.go:107] ipset name truncated; [KUBE-6-NODE-PORT-LOCAL-SCTP-HASH] -> [KUBE-6-NODE-PORT-LOCAL-SCTP-HAS]
I0304 03:43:47.507612       1 server.go:571] Version: v1.17.0
I0304 03:43:47.564028       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0304 03:43:47.564068       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0304 03:43:47.565970       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0304 03:43:47.575072       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0304 03:43:47.578432       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0304 03:43:47.585859       1 config.go:313] Starting service config controller
I0304 03:43:47.605239       1 shared_informer.go:197] Waiting for caches to sync for service config
I0304 03:43:47.606916       1 config.go:131] Starting endpoints config controller
I0304 03:43:47.606959       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
W0304 03:43:47.719335       1 meta_proxier.go:106] failed to add endpoints kube-system/kube-scheduler with error failed to identify ipfamily for endpoints (no subsets)
W0304 03:43:47.719645       1 meta_proxier.go:106] failed to add endpoints default/my-service-default with error failed to identify ipfamily for endpoints (no subsets)
W0304 03:43:47.719768       1 meta_proxier.go:106] failed to add endpoints default/my-service-ipv6 with error failed to identify ipfamily for endpoints (no subsets)

Is there some way to specify the ipfamily? Is this error caused by “W0304 03:43:47.485272 1 proxier.go:420] IPVS scheduler not specified, use rr by default”?

[root@henry-dual-we-01 ~]# kubectl get endpoints -n kube-system
NAME                       ENDPOINTS                                                              AGE
cloud-controller-manager   <none>                                                                 5d16h
kube-controller-manager    <none>                                                                 5d17h
kube-dns                   192.168.178.198:53,192.168.178.199:53,192.168.178.198:53 + 3 more...   5d17h
kube-scheduler             <none>                                                                 5d17h
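For reference, with the IPv6DualStack gate enabled on the apiserver, v1.17 exposes an alpha ipFamily field in the Service spec, so the family can be pinned explicitly. A sketch (the selector and port are made up; the name echoes the my-service-ipv6 endpoints above):

apiVersion: v1
kind: Service
metadata:
  name: my-service-ipv6
spec:
  ipFamily: IPv6        # alpha field in v1.17; requires IPv6DualStack=true on the apiserver
  selector:
    app: my-app
  ports:
    - port: 80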

@Richard87

Hi, so the error is because ipFamily is not set on a service, but what should the ipFamily be on a headless service?

that’s the 1M dollar question 😉 https://github.com/kubernetes/kubernetes/pull/86895

seems we are getting closer to solving this

Because it is a configuration error. Since the user has enabled the feature gate halfway, he/she expects dual-stack to work, but it can’t. If this faulty configuration is just accepted, this issue will be the first in an endless stream of duplicates.

An unspecified family will be set to the “main” family of the cluster (which may be ipv6) by the master processes (api-server?) when the feature gate is enabled, which ensures backward compatibility. But the decision about which family is used is made by the master, not by kube-proxy (illustrative sketch below).
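A purely illustrative Go sketch of that defaulting step; the function name and signature are hypothetical, not apiserver code:

package main

import "fmt"

type IPFamily string

const (
	IPv4 IPFamily = "IPv4"
	IPv6 IPFamily = "IPv6"
)

// defaultIPFamily sketches the control-plane defaulting described above: an
// unset family is filled in with the cluster's primary family, so old
// single-stack manifests keep working unchanged.
func defaultIPFamily(requested *IPFamily, primary IPFamily, gateEnabled bool) *IPFamily {
	if !gateEnabled || requested != nil {
		return requested // nothing to default
	}
	f := primary // e.g. IPv6 when the first service CIDR is an IPv6 range
	return &f
}

func main() {
	family := defaultIPFamily(nil, IPv6, true)
	fmt.Println(*family) // "IPv6": chosen by the master, so kube-proxy never sees nil
}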

/workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/proxy/ipvs/meta_proxier.go:65

You seem to be using a pre-release version of kube-proxy (v1.17-rc.2.10+70132b0f130acc); try v1.17.0.

If v1.17.0 also does not work, try using mode: iptables instead of IPVS.
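Applied to the KubeProxyConfiguration from the earlier comment, that would look like this (sketch):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: iptables        # instead of ipvs; the rest of the config stays the same
clusterCIDR: 192.168.0.0/22,2001:283:4000:2002::/62
featureGates:
  IPv6DualStack: true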

@kubernetes/sig-network-bugs