kubernetes: Warning FailedGetResourceMetric horizontal-pod-autoscaler missing request for cpu

What happened:

HPA always has a target of <unknown>/70% and events that say:

Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Warning  FailedComputeMetricsReplicas  36m (x12 over 38m)     horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu
  Warning  FailedGetResourceMetric       3m50s (x136 over 38m)  horizontal-pod-autoscaler  missing request for cpu
  • There is a single container in the pods and it has resource requests and limits set.
  • The metrics-server is running
  • All pods show metrics in kubectl top pod
  • All pods have metrics in kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"

Here’s the HPA in YAML:

apiVersion: v1
items:
- apiVersion: autoscaling/v1
  kind: HorizontalPodAutoscaler
  metadata:
    annotations:
      autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-06-25T09:56:21Z","reason":"SucceededGetScale","message":"the
        HPA controller was able to get the target''s current scale"},{"type":"ScalingActive","status":"False","lastTransitionTime":"2019-06-25T09:56:21Z","reason":"FailedGetResourceMetric","message":"the
        HPA was unable to compute the replica count: missing request for cpu"}]'
    creationTimestamp: "2019-06-25T09:56:06Z"
    labels:
      app: restaurant-monitor
      env: prd01
      grafana: saFkkx6ik
      rps_region: eu01
      team: vendor
    name: myapp
    namespace: default
    resourceVersion: "56108423"
    selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/myapp
    uid: 7345f8fb-972f-11e9-935d-02a07544d854
  spec:
    maxReplicas: 25
    minReplicas: 14
    scaleTargetRef:
      apiVersion: extensions/v1beta1
      kind: Deployment
      name: myapp
    targetCPUUtilizationPercentage: 70
  status:
    currentReplicas: 15
    desiredReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

What you expected to happen:

No <unknown> in HPA target

How to reproduce it (as minimally and precisely as possible):

I can’t be sure. It’s only this one HPA in our cluster; 10 other HPAs are working OK.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.12.6
  • Cloud provider or hardware configuration: EKS

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 54
  • Comments: 69 (11 by maintainers)

Most upvoted comments

I ran into this as well and this fixed it for me:

I am running pods with more than one container. In my case, the other container is a linkerd sidecar. I was setting the resource requests and limits for my deployment but did not set resources for linkerd proxy.

You must set resources for all containers within a pod, otherwise you will get the error “failed to get cpu utilization”. Maybe this error message could be updated?

Hope this helps!
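For reference, here is a minimal sketch of what that looks like (container names, images, and resource values are illustrative, not taken from any real deployment): every container in the pod template, including the sidecar, gets its own resources block, otherwise the HPA cannot compute pod CPU utilization.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app                      # main application container
        image: myapp:1.0.0             # hypothetical image
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: "1"
            memory: 512Mi
      - name: proxy-sidecar            # the injected/sidecar container needs
        image: my-sidecar:1.0.0        # requests too (hypothetical image), or
        resources:                     # the HPA reports "missing request for cpu"
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 128Mi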

I removed the cluster and rebuilt it from scratch. The problem doesn’t appear anymore.

Had the same issue with a deployment that could not scale because of the

"failed to get cpu utilization: missing request for cpu"

error that the deployment’s HPA was showing.

Finally got it fixed now.

Here are the reasons & background:

My deployment consists of:

  • a Job that runs at the beginning
  • a regular pod with three containers: two “sidecar” containers and one with the main app

The “main app” container had “resources” set. Both “sidecar” containers did not.

So the first problem was the missing “resources” specs on both sidecar containers.

Such behavior with multiple containers in the POD is described in https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Please note that if some of the Pod’s containers do not have the relevant resource request set, CPU utilization for the Pod will not be defined and the autoscaler will not take any action for that metric. See the algorithm details section below for more information about how the autoscaling algorithm works.

The second problem was that the “Job” that ran before the actual app deployment ALSO has to have “resources” defined.

And THAT was really unexpected.

That is something @max-rocket-internet also stumbled upon and what I then tested. @max-rocket-internet - thanks for the hint 🍺

So, TIL:

  • set “resources” on ALL containers and Jobs related to the pod (a sketch follows below)
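To illustrate the Job part, here is a minimal sketch, assuming a hypothetical pre-deployment Job named myapp-pre-deploy (name, image, and values are illustrative): the Job’s pod template gets a resources block just like the app containers, so its Completed pods don’t break the HPA’s CPU calculation if they happen to match the target’s label selector.

apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-pre-deploy             # hypothetical pre-deployment Job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: myapp-migrate:1.0.0   # hypothetical image
        resources:                   # requests/limits set here as well,
          requests:                  # not only on the app's Deployment
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi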

@hex108 I am not working on this. 😃

Ahhh, I deleted those Completed pods and suddenly the HPA is back in action:

myapp                Deployment/myapp                12%/70%         14        25        14         25h

I had the same case with sidecars; setting requests/limits for all containers fixed the problem. Thanks!

@hyprnick This worked for me. I had to add resource requests/limits to a sidecar container and remove “Completed” jobs from the namespace.

But these Completed pods are not from the deployment that is specified in the HPA; they are created by a Job. Sure, they don’t have resources set, but they should be ignored by the HPA, right?

Here’s the pod JSON from one:

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "annotations": {
      "checksum/config": "31e32a934d7d95c9399fc8ca8250ca6e6974c543e4ee16397b5dcd04b4399679"
    },
    "creationTimestamp": "2019-06-26T00:00:03Z",
    "generateName": "myapp-issue-detection-job-1561507200-",
    "labels": {
      "controller-uid": "59709b7a-97a5-11e9-b7c2-06c556123efe",
      "env": "prd01",
      "job-name": "myapp-issue-detection-job-1561507200",
      "team": "vendor"
    },
    "name": "myapp-issue-detection-job-1561507268cnr",
    "namespace": "default",
    "ownerReferences": [
      {
        "apiVersion": "batch/v1",
        "blockOwnerDeletion": true,
        "controller": true,
        "kind": "Job",
        "name": "myapp-issue-detection-job-1561507200",
        "uid": "59709b7a-97a5-11e9-b7c2-06c556123efe"
      }
    ],
    "resourceVersion": "56293646",
    "selfLink": "/api/v1/namespaces/default/pods/myapp-issue-detection-job-1561507268cnr",
    "uid": "59733023-97a5-11e9-b7c2-06c556123efe"
  }
}

I will test creating more Completed pods WITHOUT resources set and see if the issue returns. And then test creating more Completed pods WITH resources and see if it’s OK.

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

I suspect the reason most of us are here is that our beloved nginx-ingress-controller fails to autoscale, leading to many tears and much frustration. It seems the chart gives the defaultBackend labels that overlap with the controller deployment’s, causing this issue. Set this in your values YAML to get around it:

defaultBackend:
  useComponentLabel: true
  componentLabelKeyOverride: defaultBackend
  deploymentLabels:
    defaultBackendDeployment: "true"
  podLabels:
    defaultBackendPod: "true"

Feel free to change the labels to whatever you want if you want to be more creative. 😄

The other option is to set:

controller:
  useComponentLabel: true
defaultBackend:
  useComponentLabel: true

This is the cleaner option, but if you are currently deployed you cannot switch to it without a maintenance window, because the label is immutable and the controller has to be redeployed.

I had a linkerd proxy container injected into the pods with no requests or limits defined. Once I defined them and re-initiated the deployments, the HPA was happily working.

Reading the discussion it seems to me the error message missing request for cpu can have multiple causes, which adds to the confusion. IMO a good action item would be to make the message more detailed, e.g. pointing to which pod and container didn’t have the requests set.

I ran into this as well and this fixed it for me: I am running pods with more than one container. In my case, the other container is a linkerd sidecar. I was setting the resource requests and limits for my deployment but did not set resources for linkerd proxy. You must set resources for all containers within a pod otherwise you will get the error “failed to get cpu utilization”. Maybe this error message could be updated? Hope this helps!

Hi,

I want to do the same. Can you help me with how to set resources for the linkerd proxy containers?

I did the injection at the Namespace level:

---
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    linkerd.io/inject: enabled
    config.linkerd.io/proxy-cpu-limit: "1"
    config.linkerd.io/proxy-cpu-request: "0.1"
    config.linkerd.io/proxy-memory-limit: 1Gi
    config.linkerd.io/proxy-memory-request: 128Mi
  name: common
  labels:
    name: common

Alternatively, you could apply it at the Deployment level:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  template:
    metadata:
      annotations:
        config.linkerd.io/proxy-cpu-limit: "1"
        config.linkerd.io/proxy-cpu-request: "0.1"
        config.linkerd.io/proxy-memory-limit: 1Gi
        config.linkerd.io/proxy-memory-request: 128Mi
    spec:
      containers:
      - name: app
        image: my-service:1.0.1
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: 128M
            cpu: 125m
          limits:
            memory: 1024M
            cpu: 1000m

I had to add resource requests/limits to a sidecar container; setting requests/limits for all containers fixed the problem.

That’s great but it’s still a bug for people who only have a single container in their pods.

I’ve asked for reviews on https://github.com/kubernetes/kubernetes/pull/86044

@max-rocket-internet I submitted a tentative PR #86044

I found the same issue, and in my case the reason the pod or pods are failing with the metrics is that the pod is not 100% ready… Check the health checks, security groups, etc.

Here more info: https://docs.aws.amazon.com/eks/latest/userguide/horizontal-pod-autoscaler.html

@alexvaque In my case, I had to add the resource requests to the deployment to fix the issue.

Interesting. I was able to create an HPA for a simple sample deployment but Nginx is still unable to retrieve metrics.

Looks like this is related to how HPA uses labels https://github.com/helm/charts/issues/20315#issuecomment-595324778

Encountered the same issue and was able to resolve it by making sure that the cronjob pod labels are different from the deployment’s matchLabels; a sketch follows below.

In @max-rocket-internet’s case, the deployment’s pod matchLabels are:

matchLabels:
  app: app01
  env: prd01
  rps_region: eu01
  team: vendor

and the cronjob pod labels are:

labels:
  app: app01
  env: prd01
  rps_region: eu01
  team: vendor
  controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
  grafana: grafana_dashboard_link
  job-name: myapp-runner-job-1561646220
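A sketch of that fix, assuming a hypothetical CronJob named myapp-runner-job on the batch/v1beta1 API of that era (name, schedule, and image are illustrative): give the Job pods at least one label value that differs from the Deployment’s matchLabels, so the HPA’s selector no longer matches the Completed pods.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: myapp-runner-job
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: app01-runner        # differs from the Deployment's
            env: prd01               # matchLabels value app: app01,
            rps_region: eu01         # so these pods fall outside the
            team: vendor             # HPA's label selector
        spec:
          restartPolicy: OnFailure
          containers:
          - name: runner
            image: myapp-runner:1.0.0   # hypothetical image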

Maybe worth noting that even though I got these events and the HPA is saying failed to get cpu utilization: missing request for cpu, it still didn’t transition to <unknown> / 70% after 10 minutes. But if I create the HPA while these Completed pods exist, then it stays in the <unknown> state.

I’m getting the same issue. I have enabled metrics-server in minikube, but when I create an HPA it always says FailedGetResourceMetric 4m15s (x21 over 9m16s) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API

My deployment is able to scale up, but it does not scale down even after hours.

--------Edited------

I have tried the same deployment on a kind cluster and it’s working fine; there is some issue with minikube.

For the stable/nginx-ingress chart you can either set the same limits on the defaultBackend:

@@ -80,3 +80,10 @@ defaultBackend:
   - operator: "Exists"
   service:
     omitClusterIP: true
+  resources:
+    requests:
+      cpu: 100m
+      memory: "256Mi"
+    limits:
+      cpu: 200m
+      memory: "512Mi"

Or use @oba11’s idea and set defaultBackend.deploymentLabels and/or defaultBackend.podLabels, specifically if the requests/limits are much higher on the controller pods and it would be a waste of resources.

@hyprnick This worked for me. I had to add resource requests/limits to a sidecar container and remove “Completed” jobs from the namespace.

Worked for me!! Thanks.

Using the new beta API autoscaling/v2beta2 seems to solve it

---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

@hyprnick It worked. Thanks!

Experiencing the same issue.

Configured the HPA as:

root@k8master:~/metrics-server/deploy/1.8+# kubectl describe horizontalpodautoscaler.autoscaling/nginx
Name:                                                  nginx
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Fri, 27 Dec 2019 21:29:07 +0000
Reference:                                             Deployment/nginx
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 80%
Min replicas:                                          2
Max replicas:                                          5
Deployment pods:                                       2 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: missing request for cpu
Events:
  Type     Reason                        Age   From                       Message
  ----     ------                        ----  ----                       -------
  Warning  FailedGetResourceMetric       2s    horizontal-pod-autoscaler  missing request for cpu
  Warning  FailedComputeMetricsReplicas  1s    horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: missing request for cpu

and

root@k8master:~/metrics-server/deploy/1.8+# kubectl get hpa
NAME    REFERENCE          TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   <unknown>/80%   2         5         2          4m3s
root@k8master:~/metrics-server/deploy/1.8+# 

Deployment CPU resources:

root@k8master:~/metrics-server/deploy/1.8+# kubectl get -o json deployment nginx | jq '.spec.template.spec.containers[].resources'
{
  "limits": {
    "cpu": "2"
  },
  "requests": {
    "cpu": "200m",
    "memory": "50Mi"
  }
}

Cluster details

root@k8master:~/metrics-server/deploy/1.8+# kubectl get node -o wide
NAME       STATUS   ROLES    AGE    VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8master   Ready    master   6d9h   v1.16.3   172.168.57.5   <none>        Ubuntu 16.04.6 LTS   4.15.0-72-generic   docker://18.9.7
k8node     Ready    <none>   6d9h   v1.16.3   172.168.57.3   <none>        Ubuntu 16.04.6 LTS   4.15.0-72-generic   docker://18.9.7
k8node2    Ready    <none>   6d9h   v1.16.3   172.168.57.7   <none>        Ubuntu 16.04.6 LTS   4.15.0-72-generic   docker://18.9.7

@hex108

Sure. Here’s the output from the deployment (kubectl get -o json deployment myapp | jq '.spec.template.spec.containers[].resources'):

{
  "limits": {
    "cpu": "2",
    "memory": "2Gi"
  },
  "requests": {
    "cpu": "1",
    "memory": "2Gi"
  }
}

This shows there’s only a single container in these pods.

Here’s a list of pods:

$ kubectl get -l app=restaurant-monitor pod
NAME                                                              READY   STATUS      RESTARTS   AGE
myapp-67d9c5849d-24qbs                    1/1     Running     0          18h
myapp-67d9c5849d-9l8z4                    1/1     Running     0          18h
myapp-67d9c5849d-bv6sf                    1/1     Running     0          18h
myapp-67d9c5849d-hgqw9                    1/1     Running     0          18h
myapp-67d9c5849d-j5n2r                    1/1     Running     0          18h
myapp-67d9c5849d-kctgn                    1/1     Running     0          18h
myapp-67d9c5849d-ldhmq                    1/1     Running     0          18h
myapp-67d9c5849d-mfrd5                    1/1     Running     0          18h
myapp-67d9c5849d-p8cz4                    1/1     Running     0          18h
myapp-67d9c5849d-rm9nl                    1/1     Running     0          18h
myapp-67d9c5849d-shlj6                    1/1     Running     0          18h
myapp-67d9c5849d-sxs8f                    1/1     Running     0          18h
myapp-67d9c5849d-tpfp8                    1/1     Running     0          17h
myapp-67d9c5849d-vsz78                    1/1     Running     0          18h
myapp-issue-detection-job-15613344fl42z   0/1     Completed   0          2d11h
myapp-issue-detection-job-15614208rmdkj   0/1     Completed   0          35h
myapp-issue-detection-job-1561507268cnr   0/1     Completed   0          11h

And resources from all pods ($ kubectl get -o json -l app=myapp pod | jq '.items[].spec.containers[].resources'):

{
  "limits": {
    "cpu": "2",
    "memory": "2Gi"
  },
  "requests": {
    "cpu": "1",
    "memory": "2Gi"
  }
}
{
  "limits": {
    "cpu": "2",
    "memory": "2Gi"
  },
  "requests": {
    "cpu": "1",
    "memory": "2Gi"
  }
}

That just repeats 14 times, once for each pod. And then the 3 completed pods show as:

{}
{}
{}