rancher: [BUG] helm-operation failure - Waiting for Kubernetes API to be available
Rancher Server Setup
- Rancher version:
2.7.3
- Installation option (Docker install/Helm Chart):
Helm Chart
- If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):
k3s 1.25.9
- Proxy/Cert Details:
self signed
Information about the Cluster
- Kubernetes version:
v1.25.6
- Cluster Type (Local/Downstream):
Downstream
- If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):
Custom: Running docker command to install RKE cluster
User Information
- What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
Admin
- If custom, define the set of permissions:
Describe the bug
Rancher keeps creating pods that fail
pod name:
helm-operation-ddxl9
container:
rancher/shell:v0.1.19
pod logs:
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Timeout waiting for kubernetes
The pod is then terminated.
The Recent Operations list fills up with failures.
To Reproduce
Nothing specific to reproduce; this started happening after the upgrade to 2.7.2 and has persisted through 2.7.3.
Result
Expected Result
Either the pod is not created at all, or the pod is able to communicate with the Kubernetes API.
Screenshots
Additional context
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 54 (4 by maintainers)
Rancher is good, but it is a leaky abstraction: it assumes you are working with standard machines from cloud providers or enterprise procurement. My problem turned out to be inside the Tigera Calico operator, which Rancher makes mostly opaque. https://docs.tigera.io/calico/latest/networking/ipam/ip-autodetection
Some of my machines have two NICs, where the secondary NIC sits on a closed storage subnet. Unfortunately, the Tigera Calico operator's default autodetection method is "first found" (whichever NIC it sees first), and on those machines that was the secondary NIC. Hence some node-to-node links are good while others are not, depending on where the node is located (see the patch sketched below). This has nothing to do with what Matt says:
Or rather, it has everything to do with what Rancher tries to achieve: providing a mostly default, automated Kubernetes setup that just works.
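A minimal sketch of how the autodetection method can be pinned through the Tigera operator, assuming the cluster-scoped Installation resource is named "default" (the usual name) and that the primary NICs sit on 10.0.0.0/16 (a placeholder; substitute the subnet your cluster traffic should actually use):

# Restrict Calico IPv4 autodetection to the subnet of the primary NICs
kubectl patch installation default --type merge -p \
  '{"spec":{"calicoNetwork":{"nodeAddressAutodetectionV4":{"cidrs":["10.0.0.0/16"]}}}}'
# Verify which address each calico-node picked afterwards
kubectl get nodes -o wide

Other autodetection options (interface regex, canReach, kubernetes: NodeInternalIP) exist on the same field if a CIDR match does not fit your layout.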
Another resource that might help is the nicolaka/netshoot diagnostic tool; you can run it as a DaemonSet to test connections and routes. Try dig-ing kubernetes.default in the container to test CoreDNS, or ping-ing across locations to test the pod VXLAN and service IPs.
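A rough example of that approach (names and namespace are arbitrary; the angle-bracket values are placeholders for your own pods):

# Run netshoot on every node so you can test from each location
kubectl create -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netshoot
  namespace: default
spec:
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        command: ["sleep", "infinity"]
EOF
# Test CoreDNS resolution from a pod on node A
kubectl exec -it <netshoot-pod-on-node-A> -- dig kubernetes.default.svc.cluster.local
# Test cross-node pod networking (VXLAN) by pinging a netshoot pod on node B
kubectl exec -it <netshoot-pod-on-node-A> -- ping <pod-IP-of-netshoot-on-node-B>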
For those of you experiencing this problem, check that your cluster's Kubernetes API is reachable from inside the pods.
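One quick way to check that, from inside any pod (here one of the netshoot pods above; the token path is the standard in-cluster service account mount):

# Hit the API server's healthz endpoint through the kubernetes.default service
kubectl exec -it <netshoot-pod> -- sh -c '
  TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
  curl -sk -H "Authorization: Bearer $TOKEN" https://kubernetes.default.svc/healthz; echo'

A healthy API server answers "ok"; a hang or timeout here matches the "Waiting for Kubernetes API to be available" loop seen in the helm-operation pods.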