kubernetes: Fluentd pod failed to become ready on master
Is this a BUG REPORT or FEATURE REQUEST?: /kind bug
What happened: Fluentd pod failed to become ready on the master node after an upgrade from 1.8 to HEAD.
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:260
Oct 27 16:30:05.440: Error waiting for all pods to be running and ready: 1 / 29 pods in namespace "kube-system" are NOT in RUNNING and READY state in 10m0s
POD NODE PHASE GRACE CONDITIONS
fluentd-gcp-v2.0.9-4pp8p bootstrap-e2e-master Running [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-10-27 16:02:00 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-10-27 16:13:10 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [fluentd-gcp prometheus-to-sd-exporter]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-10-27 16:02:03 +0000 UTC Reason: Message:}]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:198
Multiple CIs are hitting this:
- https://k8s-testgrid.appspot.com/sig-release-master-upgrade#gce-1.8-master-upgrade-master
- https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-upgrade-kube-proxy-ds
- https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-downgrade-kube-proxy-ds
Anything else we need to know?: From the timeline, this is possibly caused by the fluentd hostNetwork change (sketched below).
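If that change is the culprit, the relevant part is simply moving the fluentd-gcp pod template onto the host network. A minimal sketch of that field, assuming the change is a `hostNetwork: true` flip (the container name and image below are placeholders, not the actual manifest):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Illustrative slice of the fluentd-gcp pod template; only HostNetwork
	// matters for this issue, the container entry is a placeholder.
	spec := corev1.PodSpec{
		HostNetwork: true, // pod shares the node's network namespace
		Containers: []corev1.Container{{
			Name:  "fluentd-gcp",
			Image: "fluentd-gcp:placeholder", // not the real image reference
		}},
	}

	// Because the pod now runs in the host network namespace, node-level
	// sysctls such as net.ipv4.conf.all.route_localnet directly affect its
	// traffic, including NAT rules that redirect connections to localhost.
	fmt.Printf("hostNetwork=%v\n", spec.HostNetwork)
}
```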
About this issue
- State: closed
- Created 7 years ago
- Comments: 17 (12 by maintainers)
Thanks @dnardo for debugging this! @dnardo found that
net.ipv4.conf.all.route_localnet on the master node is 0, when it should be 1. I set it to 1, and that does fix the issue.
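For reference, a minimal sketch of how the sysctl can be verified on the node, assuming the proc path and expected value from the finding above (this is only a diagnostic helper, not part of any merged fix):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	const sysctlPath = "/proc/sys/net/ipv4/conf/all/route_localnet"

	raw, err := os.ReadFile(sysctlPath)
	if err != nil {
		fmt.Fprintf(os.Stderr, "failed to read %s: %v\n", sysctlPath, err)
		os.Exit(1)
	}

	value := strings.TrimSpace(string(raw))
	if value != "1" {
		// With route_localnet=0 the kernel treats packets routed to
		// 127.0.0.0/8 from non-loopback interfaces as martian and drops
		// them, which can break NAT rules that redirect node-local traffic
		// to localhost for hostNetwork pods like fluentd-gcp.
		fmt.Printf("route_localnet is %s, expected 1 -- fix with `sysctl -w net.ipv4.conf.all.route_localnet=1`\n", value)
		os.Exit(1)
	}
	fmt.Println("route_localnet is enabled")
}
```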
The iptables rules are fine:
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-new-master-upgrade-master/1078
Same issue today, again on the master node.