kubernetes: Nil pointer dereference in KCM after v1 HPA patch request

What happened?

If we create a v2beta2 HPA object with fields specified for the scaleUp or scaleDown behavior, these values can be modified through the v1 version of the same object by changing the autoscaling.alpha.kubernetes.io/behavior annotation. If we patch the scaleUp or scaleDown field in the annotation to null, the patch request is accepted without the default values for scaleUp or scaleDown being applied. This results in a panic in KCM when it tries to dereference a nil pointer:

E1207 02:43:43.082791      10 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 2238 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x41c0960, 0x747b350)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x41c0960, 0x747b350)
    /usr/local/go/src/runtime/panic.go:965 +0x1b9
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).stabilizeRecommendationWithBehaviors(0xc000a700b0, 0xc0004bd760, 0x1a, 0x0, 0xc001a4c570, 0x500000004, 0x100000004, 0x0, 0x7fe0a1a18988, 0x7fe0a1a26ef8, ...)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:885 +0x3c
...

This panic is occurring here: https://github.com/kubernetes/kubernetes/blob/v1.21.2/pkg/controller/podautoscaler/horizontal.go#L885
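
For illustration only (a simplified sketch, not the actual controller code), the failure mode is the usual nil-pointer pattern: the scaleUp rules are assumed to be non-nil because API defaulting normally guarantees that, so dereferencing them panics once the annotation path leaves the field nil:

package main

import "fmt"

// Simplified stand-ins for the v2beta2 HPA behavior types (illustration only;
// the real types live in k8s.io/api/autoscaling/v2beta2).
type ScalingRules struct {
	StabilizationWindowSeconds *int32
}

type Behavior struct {
	ScaleUp   *ScalingRules // normally guaranteed non-nil by API defaulting
	ScaleDown *ScalingRules
}

// stabilize mimics the controller's assumption that ScaleUp is always set.
func stabilize(b *Behavior) int32 {
	// Panics with "invalid memory address or nil pointer dereference" when
	// b.ScaleUp is nil, which is the state produced by the annotation edit.
	return *b.ScaleUp.StabilizationWindowSeconds
}

func main() {
	b := &Behavior{ScaleUp: nil, ScaleDown: &ScalingRules{}}
	fmt.Println(stabilize(b))
}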

What did you expect to happen?

The patch request to the v1 HPA object should have applied the default values for the scaleUp or scaleDown fields when the annotation edit set them to null. When the v2beta2 object is updated, omitting the scaleUp or scaleDown properties results in default values being applied, so the controller assumes that no fields in this struct are nil: https://github.com/kubernetes/kubernetes/blob/6ac2d8edc8606ab387924b8b865b4a69630080e0/pkg/apis/autoscaling/v2/defaults.go#L104
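
For reference, these are the scaleUp defaults that would normally be applied (the same values that appear in the original autoscaling.alpha.kubernetes.io/behavior annotation in the reproduction steps below), shown here in v2beta2 form for illustration:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    selectPolicy: Max
    policies:
    - type: Pods
      value: 4
      periodSeconds: 15
    - type: Percent
      value: 100
      periodSeconds: 15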

How can we reproduce it (as minimally and precisely as possible)?

Using v1.21.2:

  1. Create an example HPA object using autoscaling/v2beta2
cat <<EOF | kubectl apply -n kube-system -f -
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 4
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown: 
      policies: 
      - type: Pods 
        value: 4 
        periodSeconds: 60 
      - type: Percent
        value: 10 
        periodSeconds: 60
      selectPolicy: Min 
      stabilizationWindowSeconds: 300 
EOF
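As a sanity check (not part of the original report), the apiserver should have filled in the scaleUp defaults at this point, which can be confirmed with something like:

kubectl get hpa.v2beta2.autoscaling coredns-scaler -n kube-system -o jsonpath='{.spec.behavior.scaleUp}'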
  2. Edit the v1 version of the HPA object to set scaleUp to null in the annotation:
kubectl edit hpa.v1.autoscaling coredns-scaler -n kube-system

Change this annotation:

autoscaling.alpha.kubernetes.io/behavior: '{"ScaleUp":{"StabilizationWindowSeconds":0,"SelectPolicy":"Max","Policies":[{"Type":"Pods","Value":4,"PeriodSeconds":15},{"Type":"Percent","Value":100,"PeriodSeconds":15}]},"ScaleDown":{"StabilizationWindowSeconds":300,"SelectPolicy":"Min","Policies":[{"Type":"Pods","Value":4,"PeriodSeconds":60},{"Type":"Percent","Value":10,"PeriodSeconds":60}]}}'

To this:

autoscaling.alpha.kubernetes.io/behavior: '{"ScaleUp":null,"ScaleDown":{"StabilizationWindowSeconds":300,"SelectPolicy":"Min","Policies":[{"Type":"Pods","Value":4,"PeriodSeconds":60},{"Type":"Percent","Value":10,"PeriodSeconds":60}]}}'
  3. Now when we fetch the v2beta2 object, the scaleUp spec is missing:
kubectl get hpa.v2beta2.autoscaling coredns-scaler -n kube-system -o yaml
...
spec:
  behavior:
    scaleDown:
      policies:
      - periodSeconds: 60
        type: Pods
        value: 4
      - periodSeconds: 60
        type: Percent
        value: 10
      selectPolicy: Min
      stabilizationWindowSeconds: 300
  maxReplicas: 5
  metrics:
...
  4. KCM intermittently becomes unhealthy:
sh-4.2$ kubectl get cs

NAME                 STATUS      MESSAGE                                                                                       ERROR
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
scheduler            Healthy     ok
etcd-0               Healthy     {"health":"true"}

Checking KCM logs:

E1215 00:11:52.291834       9 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 2339 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x41c0960, 0x747b350)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x41c0960, 0x747b350)
        /usr/local/go/src/runtime/panic.go:965 +0x1b9
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).stabilizeRecommendationWithBehaviors(0xc000d911e0, 0xc0012a7440, 0x1a, 0x0, 0xc001605bf0, 0x500000004, 0x10000
0004, 0x0, 0x7f4d5e8af368, 0x7f4d5e8b1fc0, ...)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:885 +0x3c
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).normalizeDesiredReplicasWithBehaviors(0xc000d911e0, 0xc0000fc700, 0xc0012a7440, 0x1a, 0x100000004, 0x4, 0x1)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:781 +0x127
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).reconcileAutoscaler(0xc000d911e0, 0xc00096be00, 0xc0012a7440, 0x1a, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:679 +0x2070
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).reconcileKey(0xc000d911e0, 0xc0012a7440, 0x1a, 0x3eca300, 0xc000d8ae48, 0x17113ec)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:371 +0x1b7
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).processNextWorkItem(0xc000d911e0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:225 +0xd8
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).worker(0xc000d911e0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:213 +0x2f
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc001dd1100)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001dd1100, 0x512cde0, 0xc001dd90b0, 0x512df01, 0xc000e22d80)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001dd1100, 0x3b9aca00, 0x0, 0xc001dd2701, 0xc000e22d80)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc001dd1100, 0x3b9aca00, 0xc000e22d80)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).Run
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:177 +0x245
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x17b41bc]

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:23:18Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

AWS

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, …) and versions (if applicable)

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 15 (10 by maintainers)

Most upvoted comments

@natherz97 @pacoxu

  • Since func GenerateHPAScaleUpRules sets default values for nil values, why is this error showing?
  • Does this comment tell us something about this behavior?

Guidance needed. Could you elaborate on the cause of this bug?