kubernetes: kubelet: memorymanager static policy startup error

What happened?

In our scenario, the Memory Manager is enabled. The host has two NUMA nodes: node0 and node1. The relevant parameter value is: memoryManagerPolicy: Static

Initially, no pods are running on this node, which has two NUMA nodes with 220G of memory each. Follow the steps below to create and delete pods:

  1. Create guaranteed Pod1 with one container; memory request and limit: 240G.

  2. Create guaranteed Pod2 with one container; memory request and limit: 20G. At this point, machineState is as follows: [image]

  3. Delete Pod2. At this point, machineState is as follows: [image]

  4. Create guaranteed Pod3 with one container; memory request and limit: 10G. At this point, the actual machineState is as follows: [image]

Restarting kubelet now fails. On restart, the expected machineState is as follows, which does not match the actual machineState above: [image]
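The steps above can be replayed in a small toy model to show how such a mismatch could arise. This is only a sketch, not the kubelet's code: it assumes that both allocation and release walk the NUMA nodes in ascending order, and that a restart rebuilds the expected state by re-allocating the surviving pods from scratch. The function names (`new_state`, `allocate`, `release`, `replay`) are hypothetical, and the numbers it produces are not taken from the images in the report.

```python
# Toy model of per-NUMA-node memory accounting. NOT the kubelet implementation.
# Assumption: allocation and release both visit NUMA nodes in ascending order.

NODE_CAPACITY_G = 220  # memory per NUMA node, as stated in the report


def new_state(nodes=2):
    """Return a fresh state in which every node is fully free."""
    return [{"free": NODE_CAPACITY_G, "reserved": 0} for _ in range(nodes)]


def allocate(state, size):
    """Reserve `size` G, taking free memory from node0 first, then node1."""
    if sum(node["free"] for node in state) < size:
        raise MemoryError("not enough free memory")
    for node in state:
        taken = min(node["free"], size)
        node["free"] -= taken
        node["reserved"] += taken
        size -= taken


def release(state, size):
    """Release `size` G, returning reserved memory from node0 first."""
    for node in state:
        returned = min(node["reserved"], size)
        node["reserved"] -= returned
        node["free"] += returned
        size -= returned


def replay(sizes):
    """Rebuild a state from scratch using only the surviving pods' sizes."""
    state = new_state()
    for size in sizes:
        allocate(state, size)
    return state


# Live history: create Pod1, create Pod2, delete Pod2, create Pod3.
actual = new_state()
allocate(actual, 240)  # Pod1 spans node0 and node1
allocate(actual, 20)   # Pod2 lands on node1 because node0 is full
release(actual, 20)    # deleting Pod2 frees memory on node0 in this model
allocate(actual, 10)   # Pod3 now fits into the gap on node0

# Restart: only Pod1 and Pod3 exist, so the state is rebuilt from them.
expected = replay([240, 10])

print("actual:  ", actual)
print("expected:", expected)
print("match:   ", actual == expected)
```

In this model the live state ends with 10G free on node0 and 180G free on node1, while the rebuilt state has 0G free on node0 and 190G free on node1. The totals agree, but the per-node values do not, so a strict per-node comparison fails. The divergence comes from releasing Pod2's memory from a different node than the one it was taken from.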

What did you expect to happen?

Pod creation and deletion order should not cause kubelet restart to fail.

How can we reproduce it (as minimally and precisely as possible)?

See analysis above.

Anything else we need to know?

No response

Kubernetes version

v1.21 and later versions have this problem.

Cloud provider

None

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 16 (4 by maintainers)

Most upvoted comments

/remove-lifecycle rotten

Since we have an active PR, I will move this to triaged.

/triage accepted

I see, it will probably not be a problem when both pods are pinned to multiple NUMA nodes. And it looks like you are correct in the comment https://github.com/kubernetes/kubernetes/issues/113130#issuecomment-1301607606: we should validate the state in descending order. BTW, kudos for the report 😃