metrics-server: unable to fetch metrics from node c2: request failed - "403 Forbidden"

What happened? I applied `kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml` with these args:

    - args:
      - --cert-dir=/tmp
      - --secure-port=4443
      - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      - --kubelet-use-node-status-port
      - --metric-resolution=15s
      - --kubelet-insecure-tls

The log shows: `unable to fetch metrics from node c2: request failed - "403 Forbidden"`
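Assuming the default deployment that components.yaml creates in kube-system, that log line can be pulled with:

    kubectl -n kube-system logs deployment/metrics-server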

What did you expect to happen? How do I resolve this?

Anything else we need to know? Output of `kubectl describe apiservice v1beta1.metrics.k8s.io`:

Status:
  Conditions:
    Last Transition Time:  2022-03-24T09:26:42Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>
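Even with the APIService reporting Available, it is worth checking what the metrics API actually returns; assuming the standard metrics.k8s.io/v1beta1 group shown above, something like this shows which nodes are missing metrics:

    kubectl top nodes
    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"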

I’ve been stuck with this problem for a day…

kubectl version Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.8", GitCommit:"4", GitTreeState:"clean", BuildDate:"2021", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.8", GitCommit:"4", GitTreeState:"clean", BuildDate:"2021", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

/kind support


Most upvoted comments

Ran into the 403 Forbidden issue in our EKS cluster. We were upgrading from 0.5.1 to 0.6.1 and noticed the breaking change in the release notes for 0.6.0 that removed a resource from the cluster role. I added `nodes/stats` back to the cluster role and things work for me now.

We have multiple clusters, most running 1.19 and one running 1.21, and this made it work on both versions.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - nodes/stats
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
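A rough sketch of rolling this change out, assuming the default metrics-server deployment in kube-system from components.yaml and a hypothetical local file name:

    # Apply the amended ClusterRole, then restart metrics-server (file name is illustrative)
    kubectl apply -f metrics-server-clusterrole.yaml
    kubectl -n kube-system rollout restart deployment metrics-server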

@mzaian: We had to roll back from v0.6.1 to v0.5.2 because of the "Failed to scrape node" error reported in #1031; that is fixed for v0.6.2, which is not available yet.
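If anyone needs the same rollback, pinning the manifest to the older release should work; this assumes the versioned release assets follow the same layout as the latest/download URL used above:

    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.2/components.yaml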

@stevehipwell so our security group for the cluster still has an inbound all/all rule, but I decided to add an explicit inbound rule for port 10250, then removed `nodes/stats` from the cluster role and restarted the metrics-server pod, and lo and behold, I still have metrics and no 403 Forbidden errors.
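For reference, an explicit inbound rule like that could be added with the AWS CLI along these lines; both security group IDs here are placeholders:

    # Allow kubelet port 10250 between cluster security groups (IDs are hypothetical)
    aws ec2 authorize-security-group-ingress \
      --group-id sg-0123456789abcdef0 \
      --protocol tcp --port 10250 \
      --source-group sg-0fedcba9876543210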

However, I then removed the `nodes/stats` line from the ClusterRole in another cluster which DOESN'T have the explicit port 10250 security group inbound rule, restarted the metrics-server pod, and it's fine. No 403, and metrics are still being collected.

In conclusion, I have no idea what happened to make this suddenly work without the security group rule and without the added line in the ClusterRole, but it does. If I encounter any other issues in the near term I'll reply here, but things suddenly seem to be working as intended.

@rmendal this might be unrelated, but for Metrics Server to function correctly the control plane needs to be able to reach the nodes on port 4443, and the nodes need to be able to communicate with each other on port 10250. You can reduce this to just port 10250 by changing the container port and the --secure-port arg to 10250 in the manifest you're applying.
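As a sketch, that change (assuming the container spec layout from the upstream components.yaml) would look roughly like:

    containers:
    - name: metrics-server
      args:
      - --cert-dir=/tmp
      - --secure-port=10250          # was 4443
      - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      - --kubelet-use-node-status-port
      - --metric-resolution=15s
      ports:
      - containerPort: 10250         # must match --secure-port
        name: https
        protocol: TCP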

As an aside, I'm not sure the AWS document you linked is correct, as I think the outbound 443 rule was for the legacy Metrics Server.