kubernetes: GKE 1.6 kubelet /metrics endpoint unauthorized over https

BUG REPORT

After upgrading a GKE cluster from 1.5.6 to 1.6.0, Prometheus stopped scraping the node /metrics endpoint due to a 401 Unauthorized error.

This is likely due to RBAC being enabled. In order to give Prometheus access to the node metrics I added the following ClusterRole and ClusterRoleBinding and created a dedicated service account that is used by the pod.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring

Although the mounted token is now the one for the prometheus service account - verified by decoding it at https://jwt.io/ - it still can't access the node metrics (they're served by the kubelet, right?).
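As an aside, the token's claims can be decoded locally instead of pasting a live cluster credential into jwt.io. A minimal sketch, assuming a POSIX shell with `base64` available (the function name is mine, not from any tool mentioned here):

```shell
# Decode a JWT's claims payload locally rather than sending the token
# to a third-party site.
decode_jwt_payload() {
  # $1: the JWT; prints the decoded claims JSON to stdout
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops padding; restore it so base64 -d accepts the input
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Inside the pod:
# decode_jwt_payload "$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
```

The decoded `sub` claim should read `system:serviceaccount:monitoring:prometheus` if the right token is mounted.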

If I execute the following command from inside the pod, it returns 401 Unauthorized:

KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://<node ip>:10250/metrics

Any tips on how to get to the bottom of this and figure out what's needed to make it work? I already discussed the issue with Prometheus contributors in https://github.com/prometheus/prometheus/issues/2606, but since the curl doesn't work either, it's probably not a Prometheus issue.

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:24:30Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

clusterIpv4Cidr: 10.248.0.0/14
createTime: '2016-11-14T19:26:49+00:00'
currentMasterVersion: 1.6.0
currentNodeCount: 14
currentNodeVersion: 1.6.0
endpoint: **REDACTED**
initialClusterVersion: 1.4.5
instanceGroupUrls:
- **REDACTED**
locations:
- europe-west1-c
loggingService: logging.googleapis.com
masterAuth:
  clientCertificate: **REDACTED**
  clientKey: **REDACTED**
  clusterCaCertificate: **REDACTED**
  password: **REDACTED**
  username: **REDACTED**
monitoringService: monitoring.googleapis.com
name: development-europe-west1-c
network: development
nodeConfig:
  diskSizeGb: 250
  imageType: COS
  machineType: n1-highmem-8
  oauthScopes:
  - https://www.googleapis.com/auth/compute
  - https://www.googleapis.com/auth/devstorage.read_only
  - https://www.googleapis.com/auth/service.management
  - https://www.googleapis.com/auth/servicecontrol
  - https://www.googleapis.com/auth/logging.write
  - https://www.googleapis.com/auth/monitoring
  serviceAccount: default
nodeIpv4CidrSize: 24

What happened:

With a ClusterRole configured I would expect to be able to scrape the /metrics endpoint on each node, but this fails with 401 Unauthorized.

What you expected to happen:

The service account token, combined with the appropriate ClusterRole, to give access to the /metrics endpoint.

How to reproduce it (as minimally and precisely as possible):

  • create namespace, serviceaccount, clusterrole, clusterrolebinding and deployment with linked serviceaccount
  • get the ip for one of the nodes
  • run KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) and curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://<node ip>:10250/metrics from the container in your deployment

Anything else we need to know:

This failed with the default service account as well, even though I initially assumed GKE would still be fairly liberal with its access control settings.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 8
  • Comments: 16 (9 by maintainers)

Most upvoted comments

Querying the same endpoint over http on port 10255 actually works. Any idea why there's a difference?

Could the cause be similar to https://github.com/coreos/coreos-kubernetes/issues/714 ?

Ahhhh, ya that’s not going to work. We don’t plan on enabling token review API in GKE. You can either configure prometheus to pull metrics by hitting the apiserver proxy directly or you can create a client certificate using the certificates API for prometheus to use when contacting kubelets.
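For the apiserver-proxy route mentioned here, a sketch of a Prometheus scrape config that discovers nodes and rewrites each scrape to go through the API server's node proxy (adapted from the standard Prometheus Kubernetes example config; the job name is illustrative):

```yaml
scrape_configs:
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Send every scrape to the API server instead of the node itself...
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # ...and rewrite the path to proxy through to the node's /metrics.
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
```

With this setup the service account token only needs to authenticate to the API server, which then proxies to the kubelet, sidestepping kubelet token authentication entirely.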

GKE doesn’t enable service account token authentication to the kubelet

cc @mikedanese @cjcullen

Take a look at the following parameter in the kubelet exporter chart: https://github.com/coreos/prometheus-operator/blob/master/helm/exporter-kubelets/values.yaml#L2 Hope it helps.

If GKE is using the GCE cluster up scripts, it isn’t enabling service account token authentication:

https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L699

to authenticate to the kubelet with API tokens, these steps would be needed (from https://kubernetes.io/docs/admin/kubelet-authentication-authorization/#kubelet-authentication):

  • ensure the authentication.k8s.io/v1beta1 API group is enabled in the API server
  • start the kubelet with the --authentication-token-webhook, --kubeconfig, and --require-kubeconfig flags
  • the kubelet calls the TokenReview API on the configured API server to determine user information from bearer tokens

GKE doesn’t enable service account token auth to the kubelet

I’m fairly certain we do…

In your ClusterRole I think

- nodes

should be

- nodes
- nodes/metrics

Like this https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/rbac/kubelet-api-admin-role.yaml#L16

Your nonResourceURLs rule doesn't make sense here: nonResourceURLs grant access to the API server's own non-resource paths (e.g. its /metrics), while kubelet authorization is expressed through the nodes/metrics subresource.
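Applying that suggestion, the ClusterRole from the report would become the following sketch (this only helps once the kubelet actually authenticates and authorizes the token, per the discussion above):

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
```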