rancher: Canal readiness probe failed with statuscode 503 for k8s 1.15
What kind of request is this (question/bug/enhancement/feature request): Bug
Steps to reproduce (least amount of steps as possible): Create an AWS cluster with default options in Rancher 2.3.0
Result: Once the cluster has been created, navigate to the System project, then click Canal under the kube-system namespace. Click one of the canal pods, then open the Events tab. Observe the Unhealthy warning for the readiness probe. The original report included a screenshot of this behavior.
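The same warning can probably be found from the command line instead of the UI. The following is a minimal sketch, assuming kubectl is configured for the affected cluster. The helper name list_unhealthy_events and the KUBECTL override are illustrative and not part of the original report.

```shell
# KUBECTL can be overridden (for example with a wrapper); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: list probe failures recorded in the kube-system namespace.
# Kubernetes records failed readiness/liveness probes as events with
# reason "Unhealthy".
list_unhealthy_events() {
  "$KUBECTL" -n kube-system get events --field-selector reason=Unhealthy
}
```

If the issue is present, calling list_unhealthy_events should list readiness probe failures for the canal pods.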
Logs from the calico-node container:
2019-10-14 18:44:22.732 [INFO][8] startup.go 256: Early log level set to info
2019-10-14 18:44:22.732 [INFO][8] startup.go 272: Using NODENAME environment for node name
2019-10-14 18:44:22.732 [INFO][8] startup.go 284: Determined node name: aws3
2019-10-14 18:44:22.734 [INFO][8] k8s.go 228: Using Calico IPAM
2019-10-14 18:44:22.734 [INFO][8] startup.go 316: Checking datastore connection
2019-10-14 18:44:22.744 [INFO][8] startup.go 340: Datastore connection verified
2019-10-14 18:44:22.744 [INFO][8] startup.go 95: Datastore is ready
2019-10-14 18:44:22.772 [INFO][8] startup.go 530: FELIX_IPV6SUPPORT is false through environment variable
2019-10-14 18:44:22.778 [INFO][8] startup.go 181: Using node name: aws3
2019-10-14 18:44:22.809 [INFO][15] k8s.go 228: Using Calico IPAM
CALICO_NETWORKING_BACKEND is none - no BGP daemon running
Calico node started successfully
2019-10-14 18:44:24.038 [WARNING][33] int_dataplane.go 354: Failed to query VXLAN device error=Link not found
2019-10-14 18:44:24.074 [WARNING][33] int_dataplane.go 384: Failed to cleanup preexisting XDP state error=failed to load XDP program (/tmp/felix-xdp-563577510): stat /sys/fs/bpf/calico/xdp/prefilter_v1_calico_tmp_A: no such file or directory
libbpf: failed to get EHDR from /tmp/felix-xdp-563577510
Error: failed to open object file
2019-10-14 18:44:44.432 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614e671242a664, ext:20426272735, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:45:10.016 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614e6d698916e9, ext:45816765016, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:47:04.432 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614e8a10a0555d, ext:160398857927, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:47:24.432 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614e8f04b5ad88, ext:180198930154, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:47:50.016 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614e9568781150, ext:205798872324, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:49:44.432 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614eb2118552ef, ext:320413865069, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:50:04.432 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614eb704b6187f, ext:340198957640, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:50:30.017 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614ebd687a5ed0, ext:365799023174, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:52:24.432 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614eda118b4aee, ext:480414256219, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:52:44.432 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614edf04b5d5b4, ext:500198940515, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:53:10.016 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614ee568774cab, ext:525798821911, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:55:04.432 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614f0210ad7f44, ext:640399720623, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:55:24.432 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614f0705adeec1, ext:660215199869, loc:(*time.Location)(0x2b08080)}}
2019-10-14 18:55:50.017 [WARNING][33] health.go 190: Reporter failed readiness checks name="async_calc_graph" reporter-state=&health.reporterState{name:"async_calc_graph", reports:health.HealthReport{Live:true, Ready:true}, timeout:20000000000, latest:health.HealthReport{Live:true, Ready:false}, timestamp:time.Time{wall:0xbf614f0d687ee33d, ext:685799319221, loc:(*time.Location)(0x2b08080)}}
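Since the warnings above repeat with the same reporter name, a short filter can make that easier to see in a longer log. This is only a sketch: the function name summarize_readiness_failures is hypothetical, and it assumes the log lines keep the format shown above (a name="..." field on each "Reporter failed readiness checks" line).

```shell
# Hypothetical helper: read a calico-node log on stdin and print one line per
# Felix health reporter, in the form "<count> <reporter name>".
summarize_readiness_failures() {
  # Keep only the readiness-failure warnings.
  grep -F 'Reporter failed readiness checks' \
    | sed -n 's/.*name="\([^"]*\)".*/\1/p' \
    | sort | uniq -c | awk '{print $1, $2}'
}

# Example usage (replace <canal-pod> with a real pod name):
#   kubectl -n kube-system logs <canal-pod> -c calico-node | summarize_readiness_failures
```

For the log excerpt in this report, every matching line names the async_calc_graph reporter.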
Other details that may be helpful: This does not happen in Rancher v2.2.8
Environment information
- Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): Rancher v2.3.0
- Installation option (single install/HA): HA
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): AWS
- Machine type (cloud/VM/metal) and specifications (CPU/memory): VM, t2.large
- Kubernetes version (use kubectl version): 1.14.6
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (8 by maintainers)
Hi Guys,
I just installed a new cluster with canal via RKE and ran into this exact same bug using the “latest” Helm chart.
I just tried the “solution” which @superseb has kindly shared and can confirm this resolves the issue.
I have pasted the events log below for confirmation.
The fix for this is to patch the canal DaemonSet and create the NetworkSet CRD (this will recreate the Canal pods). Use this at your own risk until it has been verified and released (test environments only).
Save the following as crd.yml and run kubectl create -f crd.yml in the cluster:
Please let me know if this solves the issue while we investigate further:
Save the following in crd.yml and execute kubectl create -f crd.yml in the cluster
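After creating the CRD and patching the canal DaemonSet, one way to check the result might look like the sketch below. It rests on assumptions that are not confirmed in this thread: that the NetworkSet CRD is registered under the name networksets.crd.projectcalico.org, that the DaemonSet is named canal in kube-system, and that a 120 second timeout is long enough. The helper name verify_canal_fix is hypothetical.

```shell
# KUBECTL can be overridden (for example with a wrapper); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: returns 0 only if the CRD exists and the canal
# DaemonSet rollout completes within the timeout.
verify_canal_fix() {
  # Assumed full name of Calico's NetworkSet CRD.
  "$KUBECTL" get crd networksets.crd.projectcalico.org >/dev/null || return 1
  # Wait for the recreated canal pods to report ready.
  "$KUBECTL" -n kube-system rollout status daemonset/canal --timeout=120s
}
```

If verify_canal_fix succeeds, the Events tab of a canal pod should no longer show new Unhealthy readiness warnings.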