digitalocean-cloud-controller-manager: DigitalOcean implementation of Kubernetes does not allow metrics-server to function
Hello,
I raised an issue for this last night, but now that I’ve had more time to sleep, I wanted to raise it again and provide some more information.
I currently have a DigitalOcean Managed Kubernetes cluster with some applications deployed and running on it.
I have configured Horizontal Pod Autoscaling for one of my deployments, but when running kubectl get hpa, I noticed <unknown> in the TARGETS column of the output:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
benjamin-maynard-io-fe Deployment/benjamin-maynard-io-fe <unknown>/80% 1 20 3 10h
I identified that this was because I did not have either heapster or metrics-server running on my cluster, so I went to install metrics-server as per the instructions at https://github.com/kubernetes-incubator/metrics-server
metrics-server successfully installs, and is running in the kube-system namespace:
NAME READY STATUS RESTARTS AGE
csi-do-controller-0 3/3 Running 0 42h
csi-do-node-dbvg5 2/2 Running 0 42h
csi-do-node-lq97x 2/2 Running 1 42h
csi-do-node-mvnrw 2/2 Running 0 42h
kube-dns-55cf9576c4-4r466 3/3 Running 0 42h
kube-proxy-upbeat-lichterman-3mz4 1/1 Running 0 42h
kube-proxy-upbeat-lichterman-3mzh 1/1 Running 0 42h
kube-proxy-upbeat-lichterman-3mzi 1/1 Running 0 42h
metrics-server-7fbd9b8589-64x86 1/1 Running 0 9m48s
However, I am still getting no metrics.
Running kubectl get apiservice v1beta1.metrics.k8s.io -o yaml reveals:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  creationTimestamp: 2018-11-27T08:32:26Z
  name: v1beta1.metrics.k8s.io
  resourceVersion: "396557"
  selfLink: /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  uid: f88f3576-f21e-11e8-8aed-fab39051e242
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: 2018-11-27T08:32:26Z
    message: 'no response from https://10.245.219.253:443: Get https://10.245.219.253:443:
      net/http: request canceled while waiting for connection (Client.Timeout exceeded
      while awaiting headers)'
    reason: FailedDiscoveryCheck
    status: "False"
    type: Available
The latter part of the output is the interesting bit: message: 'no response from https://10.245.219.253:443: Get https://10.245.219.253:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)'
I believe the above error message means that the kube-apiserver cannot reach the metrics-server service, and that this is due to the specifics of how the DigitalOcean Kubernetes master works.
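One way to narrow this down is to test reachability of the service from inside the cluster itself. This is a sketch, not something from the thread: the curl image and the service DNS name are my assumptions. If this succeeds from a pod while the APIService still reports FailedDiscoveryCheck, the problem is specific to the kube-apiserver -> service path rather than the pod network.

```shell
# Hypothetical in-cluster connectivity check (image name and DNS name are
# assumptions, not taken from this issue). The -k flag skips certificate
# verification, since metrics-server serves a self-signed cert here.
kubectl run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -ks https://metrics-server.kube-system.svc:443/healthz
```

A success here combined with the FailedDiscoveryCheck above would be consistent with the theory that only the master-to-cluster network path is broken.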
I’ve performed some other general validation:
Service is configured:
Benjamins-MacBook-Pro:metrics-server benmaynard$ kubectl get service --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.245.0.10 <none> 53/UDP,53/TCP 2d19h
metrics-server ClusterIP 10.245.219.253 <none> 443/TCP 14m
metrics-server is up and running:
Benjamins-MacBook-Pro:metrics-server benmaynard$ kubectl logs metrics-server-7fbd9b8589-64x86 --namespace=kube-system
I1127 08:32:30.665197 1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2018/11/27 08:32:32 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/11/27 08:32:32 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I1127 08:32:32.981732 1 serve.go:96] Serving securely on [::]:443
Another customer has reported similar things: https://www.digitalocean.com/community/questions/cannot-get-kubernetes-horizonal-pod-autoscaler-or-metrics-server-working
About this issue
- State: closed
- Created 6 years ago
- Reactions: 5
- Comments: 49 (4 by maintainers)
The following works for me now:
helm install --name metrics stable/metrics-server --namespace kube-system -f values.yaml
values.yaml:
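The values.yaml content itself did not survive in this thread. As an illustration only, the overrides most commonly reported as the workaround in this issue's era look like this; these exact flags are an assumption on my part, not the commenter's actual file:

```shell
# Write a values.yaml with the workaround flags commonly cited for DOKS
# (assumed content, not the original commenter's file):
#   --kubelet-preferred-address-types=InternalIP  -> scrape kubelets by IP
#   --kubelet-insecure-tls                        -> skip kubelet cert checks
cat > values.yaml <<'EOF'
args:
  - --kubelet-preferred-address-types=InternalIP
  - --kubelet-insecure-tls
EOF
```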
I had the same problem. Here is how I solved it: by using the metrics-server chart from the Bitnami repository.
@maxwedwards totally agree that basic metrics collection in a secure way is a must-have. Apologies if this came across as “we don’t care about this”, we genuinely do. What I was trying to express is that CCM is likely not the place where the fix should (maybe even can) happen: the project’s primary purpose is to implement the cloud provider interface that is defined by upstream Kubernetes. I checked again but don’t see a way to hook into host name resolutions. We could presumably hack it into the project, but it might not be the right place to do so.
Regardless, I have filed an internal ticket to track progress on the matter and did some initial investigations myself that confirm your findings. Will keep you posted in this issue since it has become the place to go for most users who ran into the problem.
By the way, we also have something else in the pipeline at DO to improve on the observability front, which we intend to release in the not-too-distant future. It only touches partially on the subject discussed here, though; proper metrics-server integration is still needed to support features built on top of it (like autoscaling).
Sorry again in case I have sent the wrong message. Appreciate the feedback!
Adding these vars to the metrics-server deployment file works.
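The comment does not quote the vars themselves. As a sketch of the usual shape of this fix, a JSON patch adding the commonly used flags to the metrics-server container could look like this; the flag values are an assumption based on the workaround discussed elsewhere in this thread, not the commenter's exact settings:

```shell
# Hypothetical patch appending the usual workaround flags to the
# metrics-server container args (assumed, not the commenter's exact vars):
cat > metrics-server-patch.json <<'EOF'
[
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/args/-",
    "value": "--kubelet-preferred-address-types=InternalIP"
  },
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/args/-",
    "value": "--kubelet-insecure-tls"
  }
]
EOF
# Against a live cluster, this would be applied with:
#   kubectl -n kube-system patch deployment metrics-server \
#     --type=json --patch "$(cat metrics-server-patch.json)"
```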
1.19.3-do.2 was just released and should fix the problem. Please report back if that’s not the case.
Sorry for the inconveniences!
I just ran tests with all versions of DOKS currently supported (1.11.5-do.2, 1.12.3-do.2, and 1.13.1-do.2 as of this writing). In each case, I was able to read metrics properly.
Here’s what I did (mostly summarizing what was mentioned before in this issue):
1. kubectl run --image=nginx nginx --requests=cpu=200m && kubectl autoscale deploy nginx --min=1 --max=10 --cpu-percent=80 (please take note that you need to specify CPU requests for HPA to take action)
2. Wait for the TARGETS column of kubectl get hpa nginx to be populated (or run kubectl top node for a more basic check unrelated to HPA)
The waiting part in the last step is important: it takes 1-2 minutes for metrics to show up.
FWIW, I created my Kubernetes clusters in the FRA1 region.
Is anyone not able to reproduce a successful setup with my steps outlined above?
@timoreimann you’re welcome. I checked my cluster’s status page a few minutes before you posted and started the upgrade. Was kinda hoping to beat you to it, but I’m more glad that this is fixed 🎉
@WyriHaximus thanks for confirming! 💙
@timoreimann just started the upgrade and can confirm that this is now fixed
@Simwar and others: we just opened up a new repository to track more general feature requests and bug reports related to DOKS (but not specific to any other of our repos, like this one). I created digitalocean/DOKS#2 to address the issue around metrics-server not supported with TLS on DOKS.
Please continue discussions on the new issue. Thanks!
Sorry folks, we’re working through this; it’s been a bit busy with KubeCon coming up. Hoping to have more details soon (by next week maybe?) 😃