metrics-server: EKS Metrics Server can't scrape pod/node metrics - Unauthorized 401
What happened:
Metrics Server is not able to read metrics; consumers get the error `metrics not available yet`:
- HPAs can't read metrics
- `kubectl top pod` / `kubectl top node` return the error: `metrics not available yet`
What you expected to happen:
Metrics server to scrape all pods and nodes.
Anything else we need to know?:
Everything is in the details section.
Environment:
- Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.): EKS
- Container Network Setup (flannel, calico, etc.): EKS VPC CNI
- Kubernetes version (use `kubectl version`): 1.21.5-eks-bc4871b
- Metrics Server manifest
spoiler for Metrics Server manifest:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: metrics-server
      name: metrics-server
    spec:
      containers:
      - command:
        - /metrics-server
        - --v=2
        - --kubelet-preferred-address-types=InternalIP
        - --cert-dir=/tmp
        - --secure-port=4443
        image: private-repo:metrics-server-amd64-v0.3.6
        imagePullPolicy: IfNotPresent
        name: metrics-server
        ports:
        - containerPort: 4443
          name: main-port
          protocol: TCP
        resources: {}
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/arch: amd64
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: metrics-server
      serviceAccountName: metrics-server
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: Metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: main-port
  selector:
    k8s-app: metrics-server
  type: ClusterIP
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
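Note: the paste above defines the `system:metrics-server` ClusterRole but no ClusterRoleBinding tying it to the `metrics-server` service account. The upstream v0.3.6 manifests ship such a binding, so it may simply have been trimmed from the paste; for reference:

```yaml
# ClusterRoleBinding as shipped in the upstream metrics-server v0.3.6 manifests;
# without it the metrics-server service account cannot read pods/nodes/stats.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
```

A missing binding would surface as `403 Forbidden` from the kubelet rather than `401 Unauthorized`, so by itself it would not explain the errors below.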
- Metrics server logs:
spoiler for Metrics Server logs:
I0218 15:49:00.089765 1 manager.go:148] ScrapeMetrics: time: 30.029378925s, nodes: 0, pods: 0
E0218 15:49:00.089852 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-10-224-57-165.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-57-165.ec2.internal (10.224.57.165): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-57-153.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-57-153.ec2.internal (10.224.57.153): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-55-86.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-55-86.ec2.internal (10.224.55.86): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-51-140.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-51-140.ec2.internal (10.224.51.140): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-55-184.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-55-184.ec2.internal (10.224.55.184): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-57-40.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-57-40.ec2.internal (10.224.57.40): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-48-241.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-48-241.ec2.internal (10.224.48.241): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-53-70.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-53-70.ec2.internal (10.224.53.70): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-50-241.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-50-241.ec2.internal (10.224.50.241): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-57-184.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-57-184.ec2.internal (10.224.57.184): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-53-158.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-53-158.ec2.internal (10.224.53.158): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-224-51-42.ec2.internal: unable to fetch metrics from Kubelet ip-10-224-51-42.ec2.internal (10.224.51.42): Get https://10.224.51.42:10250/stats/summary?only_cpu_and_memory=true: dial tcp 10.224.51.42:10250: i/o timeout]
E0218 15:49:00.093086 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.093250 1 errors.go:77] Unauthorized
E0218 15:49:00.099175 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.099367 1 errors.go:77] Unauthorized
E0218 15:49:00.109152 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.109462 1 errors.go:77] Unauthorized
E0218 15:49:00.115605 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.115756 1 errors.go:77] Unauthorized
E0218 15:49:00.125091 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.125264 1 errors.go:77] Unauthorized
E0218 15:49:00.130884 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.130983 1 errors.go:77] Unauthorized
E0218 15:49:00.139747 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.139917 1 errors.go:77] Unauthorized
E0218 15:49:00.145794 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.145963 1 errors.go:77] Unauthorized
E0218 15:49:00.155290 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.155471 1 errors.go:77] Unauthorized
E0218 15:49:00.161147 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.161306 1 errors.go:77] Unauthorized
E0218 15:49:00.181555 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.181730 1 errors.go:77] Unauthorized
E0218 15:49:00.187571 1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0218 15:49:00.187749 1 errors.go:77] Unauthorized
- Status of Metrics API:
spoiler for Status of Metrics API:
kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:       metrics-server
    Namespace:  kube-system
    Port:       443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2021-09-20T12:27:29Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:  <none>
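Note that the APIService can report `Available: True` while scraping is broken: availability only reflects that the metrics-server endpoint is reachable and answering, not that it holds data. Probing the API directly with `kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes` would, if the request is authorized, return an empty list along these lines (illustrative output; the raw response is JSON, rendered here as YAML):

```yaml
# Illustrative: the API answers (hence "Available: True") but contains no
# items because no kubelet scrape has succeeded.
kind: NodeMetricsList
apiVersion: metrics.k8s.io/v1beta1
metadata: {}
items: []
```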
/kind bug
I experienced a similar problem on EKS v1.21:
`v1beta1.metrics.k8s.io` shown as unavailable via `kubectl get apiservice`, HPAs not being able to scale. CloudWatch kube-controller-manager logs showed corresponding errors.
In this case these were new clusters where some things were different compared to clusters we already have running, where everything works fine:
The Terraform module by default gives the security group attached to the nodes (among others) a set of rules that, as it turned out, did not cover the metrics-server container port.
After adding an SG rule that matches the container port configured in Metrics Server when using the latest Helm chart, everything worked (a sketch of such a rule below).
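A sketch of such a rule, written as CloudFormation YAML for illustration (the cluster in question used Terraform; `NodeSecurityGroup`, `ClusterControlPlaneSecurityGroup`, and port 4443 are assumptions to adapt to your setup):

```yaml
# Hypothetical ingress rule: let the EKS control plane / cluster security group
# reach the metrics-server container port on the worker nodes.
NodeIngressMetricsServer:
  Type: AWS::EC2::SecurityGroupIngress
  Properties:
    Description: API server to metrics-server secure port
    GroupId: !Ref NodeSecurityGroup                               # assumption: node SG
    SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup  # assumption: control-plane SG
    IpProtocol: tcp
    FromPort: 4443    # must match metrics-server's --secure-port / containerPort
    ToPort: 4443
```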
This does align with the endpoints of the metrics-server service.
Still wrapping my head around whether this makes sense; VPC CNI networking is not the easiest part of EKS.
Update: Reading the OP again, which really mentions a 401, my problem obviously was a different one. Comparing the ClusterRoles in the OP, I notice a subtle difference between those and the ones installed via the Helm chart on EKS 1.21: the chart's role also grants `nodes/metrics`.
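For reference, a sketch of the `system:metrics-server` rules that newer metrics-server releases install, with the `nodes/metrics` resource the OP's manifest does not grant (exact rules vary by chart version; this is illustrative, not the chart verbatim):

```yaml
# Illustrative ClusterRole rules as shipped by newer metrics-server releases.
# Note nodes/metrics, which the manifest in the OP lacks.
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
```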
@TBeijen your issue is/was separate and is specifically about the changes that were made in the v18 release of the EKS module dropping almost all SG rules. I'm not sure if the module docs have been updated, but it's covered in a number of issues. As an aside, and I'm sure you're aware of this, when using the AWS VPC CNI you don't need to use host network for MS as long as your SGs are configured correctly.
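For completeness, the host-network variant mentioned above is a single pod-spec toggle on the Deployment (sketch; note that port clashes with other host-network processes on the node then become your responsibility):

```yaml
# Sketch: run metrics-server in the node's network namespace, a common
# workaround when security groups block the API server -> pod path.
spec:
  template:
    spec:
      hostNetwork: true   # pod shares the node's network namespace
```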