istio: istio-cni does not return an error when it cannot get pod information from k8s
Bug description
If istio-cni is unable to get pod information, it logs a warning, finds that the number of containers for the pod is not greater than 1, skips injection and succeeds.
It needs to fail fast in this case.
What happened in our case was that we had the CNI plugin running as a daemonset in the istio-system namespace. Due to races on new node creation, we had to move it to kube-system so that we could set its priority class to system-node-critical.
In this move we forgot to update the cluster role binding to point to the new service account in the kube-system namespace. As a result, the CNI pod did not have privileges to get pod info from the k8s API.
In such a situation, I would expect the CNI plugin to fail fast and block pod creation. What actually happened was that a bunch of pods came up with no iptables rules injected, which caused a massive, hard-to-debug, cluster-wide outage because no traffic was being redirected to the Envoy proxies.
The fact that we were rolling nodes, which causes a bunch of new pods to be created at the same time, made the outage worse. On the other hand, it was a useful signal for figuring out the root cause of a problem that would otherwise have been hard to debug.
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 15 (13 by maintainers)
@ruigulala Istio 1.4 only supports Kubernetes 1.13+, so I think it's reasonable to assume it's enabled.
I have no idea why the namespace was changed. In 1.3, the daemon must be created in kube-system due to its priorityClass. @ruigulala changed the priority class, and it should not require being created in kube-system anymore. We’re lucky this was done before the 1.4 release then.