kubernetes: kube-scheduler panics on malformed label

What happened?

A deployment was inadvertently created with a malformed label in requiredAffinityTerms which appears to have caused the kube-scheduler to panic and crash.

Error from kube-scheduler pods:

E0107 05:56:44.099432       1 scheduling_queue.go:547] Error getting label selectors for pod: REDACTED.
E0107 05:56:44.099573       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 353 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1b0f760, 0x2d9c290)
            /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
            /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x89
panic(0x1b0f760, 0x2d9c290)
            /usr/local/go/src/runtime/panic.go:969 +0x1b9

Events from REDACTED pod:

Events:
  Type     Reason             Age    From                Message
  ----     ------             ----   ----                -------
  Warning  FailedScheduling   4m44s  default-scheduler   0/40 nodes are available: 40 parsing pod: requiredAffinityTerms: invalid label value: "-api-gateway-mockservice": at key: "release": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?').
  Warning  FailedScheduling   4m44s  default-scheduler   0/40 nodes are available: 40 parsing pod: requiredAffinityTerms: invalid label value: "-api-gateway-mockservice": at key: "release": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?').
  Warning  FailedScheduling   110s   default-scheduler   0/40 nodes are available: 40 parsing pod: requiredAffinityTerms: invalid label value: "-api-gateway-mockservice": at key: "release": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?').
  Warning  FailedScheduling   51s    default-scheduler   0/40 nodes are available: 40 parsing pod: requiredAffinityTerms: invalid label value: "-api-gateway-mockservice": at key: "release": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?').
  Normal   NotTriggerScaleUp  4m40s  cluster-autoscaler  pod didn't trigger scale-up: 51 parsing pod: requiredAffinityTerms: invalid label value: "-api-gateway-mockservice": at key: "release": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')

What did you expect to happen?

kube-scheduler should not panic.

How can we reproduce it (as minimally and precisely as possible)?

Submit a deployment with a malformed label value.

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:33:37Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.13", GitCommit:"2444b3347a2c45eb965b182fb836e1f51dc61b70", GitTreeState:"clean", BuildDate:"2021-11-17T13:00:29Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.23) and server (1.20) exceeds the supported minor version skew of +/-1

Cloud provider

AWS

OS version

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
$ uname -a
Linux ip-10-117-7-107.eu-west-1.compute.internal 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Install tools

Container runtime (CRI) and and version (if applicable)

cri-o

Related plugins (CNI, CSI, …) and versions (if applicable)

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 26 (17 by maintainers)

Most upvoted comments

I agree, the pod should’ve not been admitted in the first place.

I think the issue is here: https://github.com/kubernetes/kubernetes/blob/8c69e5d25bfdbd266b384db29493bc9bad60f092/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/validation/validation.go#L57

we validate the label key but not value for MatchExpressions; we should add the following to validate the values:

for v := range sr.Values {
  for _, msg := range validation.IsValidLabelValue(v) {
    allErrs = append(allErrs, field.Invalid(fldPath.Child("values"), v, msg))
  }
}

we do that already with MatchLabels, but not for MatchExpressions:

https://github.com/kubernetes/kubernetes/blob/8c69e5d25bfdbd266b384db29493bc9bad60f092/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/validation/validation.go#L75

this error is emitted by the api-server.

It appears that error (event) was emitted by the kube-scheduler (see initial post above). If the kube-apiserver detected the error, then the manifest should have been rejected and the kube-scheduler would never have tried to schedule a pod in that deployment.

So it would seem that the apiserver needs an update to catch such malformed label values. It does that in metadata.labels, but not in labelSelector sections.

With malformed metadata.labels: { release: -api-gateway-mockservice }:

The Deployment "label-test" is invalid: metadata.labels: Invalid value: "-api-gateway-mockservice": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')

With the above malformed affinity, the apiserver accepted the Deployment.

Yes, I suppose we still need a bug fix.

Yes you are right, I think this issue no longer exists since https://github.com/kubernetes/kubernetes/commit/c7fef196b60856420ce3f0470acd1093ab1d9b5f So I think we can close this issue