kubernetes: Kubernetes API server stuck on metrics server API service discovery check failure
What happened:
Occasionally, the metrics server API service v1beta1.metrics.k8s.io fails discovery after cluster creation. The discovery checks often continue failing until the API service is deleted and recreated. Sometimes discovery eventually succeeds without intervention, but this can take 30 minutes or more. This problem primarily impacts Kubernetes v1.15, but has been seen at least once on Kubernetes v1.16.
In addition to recreating the API service to resolve the problem, adding liveness and readiness probes to https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/metrics-server/metrics-server-deployment.yaml usually prevents the discovery failure loop.
Example v1beta1.metrics.k8s.io API service status:
status:
  conditions:
  - lastTransitionTime: "2019-09-28T20:11:14Z"
    message: 'failing or missing response from https://172.21.13.79:443/apis/metrics.k8s.io/v1beta1:
      Get https://172.21.13.79:443/apis/metrics.k8s.io/v1beta1: net/http: request
      canceled (Client.Timeout exceeded while awaiting headers)'
    reason: FailedDiscoveryCheck
    status: "False"
    type: Available
See the comments for example Kubernetes API server and metrics server pod log error messages during the discovery failure loop.
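For anyone who wants to detect this state from a script, here is a minimal Python sketch that looks for the FailedDiscoveryCheck condition in an API service object. It is not part of the original report. The function name `failed_discovery_message` and the `EXAMPLE_APISERVICE` dict are hypothetical, and the dict simply mirrors the example status above. It assumes the input has the same shape as the JSON returned by `kubectl get apiservice v1beta1.metrics.k8s.io -o json`.

```python
from typing import Optional


def failed_discovery_message(apiservice: dict) -> Optional[str]:
    """Return the condition message if the Available condition is False
    with reason FailedDiscoveryCheck, otherwise return None.

    Hypothetical helper for illustration; not a Kubernetes API.
    """
    conditions = apiservice.get("status", {}).get("conditions", [])
    for cond in conditions:
        if (
            cond.get("type") == "Available"
            and cond.get("status") == "False"
            and cond.get("reason") == "FailedDiscoveryCheck"
        ):
            return cond.get("message", "")
    return None


# Assumption: this dict mirrors the example status from the report above.
EXAMPLE_APISERVICE = {
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2019-09-28T20:11:14Z",
                "message": (
                    "failing or missing response from "
                    "https://172.21.13.79:443/apis/metrics.k8s.io/v1beta1: "
                    "Get https://172.21.13.79:443/apis/metrics.k8s.io/v1beta1: "
                    "net/http: request canceled "
                    "(Client.Timeout exceeded while awaiting headers)"
                ),
                "reason": "FailedDiscoveryCheck",
                "status": "False",
                "type": "Available",
            }
        ]
    }
}

if __name__ == "__main__":
    message = failed_discovery_message(EXAMPLE_APISERVICE)
    if message is not None:
        print("Discovery is failing:", message)
    else:
        print("No FailedDiscoveryCheck condition found.")
```

In practice the dict could come from `json.loads` applied to the `kubectl ... -o json` output. The sketch only reports the condition; it does not delete or recreate the API service.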
What you expected to happen:
Metrics server API service v1beta1.metrics.k8s.io discovery succeeds after cluster creation.
How to reproduce it (as minimally and precisely as possible):
Reproducing the problem is not easy because it appears to be timing-related with respect to the API server's discovery of the metrics server API service.
Anything else we need to know?:
No.
Environment:
- Kubernetes version (use kubectl version): Primarily v1.15.x, although seen once on v1.16.0
- Cloud provider or hardware configuration: IBM Cloud Kubernetes Service
- OS (e.g: cat /etc/os-release): Ubuntu 18.04.3 LTS
- Kernel (e.g. uname -a): 4.15.0-64-generic
- Install tools: Cloud provider managed service install tools
- Network plugin and version (if this is a network-related bug): Calico v3.8.2
- Others: None.
About this issue
- State: open
- Created 5 years ago
- Reactions: 14
- Comments: 47 (15 by maintainers)
@rtheis I also encountered the same problem. Did you solve it?
Can you help me? I'd really appreciate it.
I have the same problem with k8s version 1.18.10, Ubuntu 18.04. I checked the apiservice and the result is below:
Please help me, thanks.