amazon-eks-ami: Containers fail to create and probe exec errors related to seccomp on recent kernel-5.10 versions

What happened:

After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS clusters, within a few days a number of the nodes had containers stuck in the ContainerCreating state or liveness/readiness probes reporting the following error:

Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "4a11039f730203ffc003b7e64d5e682113437c8c07b8301771e53c710a6ca6ee": OCI runtime exec failed: exec failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524: unknown

This issue is very similar to https://github.com/awslabs/amazon-eks-ami/issues/1179. However, we had not seen this issue on previous AMIs; it only started to occur on v20230217 (following the upgrade from kernel 5.4 to 5.10), with no other changes to the underlying cluster or workloads.

We tried the suggestion from that issue (sysctl net.core.bpf_jit_limit=452534528), which immediately allowed containers to be created and probes to execute, but after approximately a day the issue returned, and the value reported by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}' was steadily increasing.
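
For reference, the workaround we applied on an affected node looked roughly like this (a sketch; the limit value is simply the one suggested in issue #1179, not an official recommendation):

# Temporary workaround sketch: raise the BPF JIT limit, then watch the leak grow.
sudo sysctl -w net.core.bpf_jit_limit=452534528

# Total memory currently allocated to BPF JIT (climbs steadily on affected kernels):
sudo grep bpf_jit /proc/vmallocinfo | awk '{s+=$2} END {print s " bytes"}'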

What you expected to happen:

  • Containers to launch successfully and become Ready
  • Liveness and readiness probes to execute successfully

How to reproduce it (as minimally and precisely as possible):

I don’t currently have a reproduction I can share, as the one I have relies on some internal code (I can hopefully produce a more generic one if required when I get a chance).

As a starting point, we only noticed this happening on nodes running pods that have an exec liveness & readiness probe, executed every 10 seconds, which performs a health check against a gRPC service using grpcurl. In addition, we also have a default Pod Security Policy (yes, we know they are deprecated 😄) with the annotation seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default.

These two conditions seem to be enough to trigger the issue: the value reported by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}' steadily increases over time until containers can no longer be created on the node.
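
A rough approximation of those conditions, in case it helps anyone investigating (a sketch only; the deployment name, port, and the assumption that grpcurl is available in the container stand in for our internal setup):

# Sketch: mimic a 10-second exec probe doing a grpcurl health check against a pod
# that has the docker/default seccomp profile applied. "grpc-service" and port 8080
# are placeholders.
while true; do
  kubectl exec deploy/grpc-service -- grpcurl -plaintext localhost:8080 grpc.health.v1.Health/Check
  sleep 10
done &

# On the node hosting the pod, sample the bpf_jit allocation total periodically;
# it climbs steadily on affected kernels:
while true; do
  sudo grep bpf_jit /proc/vmallocinfo | awk '{s+=$2} END {print s " bytes"}'
  sleep 60
done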

Anything else we need to know?:

Environment:

  • AWS Region: Multiple
  • Instance Type(s): Mix of x86_64 and arm64 instances of varying sizes
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): "eks.4"
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): "1.24"
  • AMI Version: v20230217
  • Kernel (e.g. uname -a): 5.10.165-143.735.amzn2.x86_64 #1 SMP Wed Jan 25 03:13:54 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-09bffa74b1e396075"
BUILD_TIME="Fri Feb 17 21:59:10 UTC 2023"
BUILD_KERNEL="5.10.165-143.735.amzn2.x86_64"
ARCH="x86_64"

Official Guidance

Kubernetes pods using SECCOMP filtering on EKS optimized AMIs based on Linux Kernel version 5.10.x may get stuck in ContainerCreating state or their liveness/readiness probes fail with the following error:

unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524

When a process with SECCOMP filters creates a child process, the same filters are inherited and applied to the new process. The Amazon Linux 5.10.x kernel versions are affected by a memory leak that occurs when a parent process is terminated while creating a child process. When the total amount of memory allocated for SECCOMP filters exceeds the limit, a process cannot create a new SECCOMP filter. As a result, the parent process fails to create a new child process and the above error message is logged.

This issue is more likely to be encountered with kernel versions kernel-5.10.176-157.645.amzn2 and kernel-5.10.177-158.645.amzn2 where the rate of the memory leak is higher.

Amazon Linux will be releasing the fixed kernel by May 1st, 2023. We will be releasing a new set of EKS AMIs with the updated kernel no later than May 3rd, 2023.
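
One way to see how close a node is to the failure condition described above is to compare the BPF JIT limit with the amount currently allocated (a minimal sketch, to be run on the node):

# The kernel's BPF JIT memory limit, in bytes:
cat /proc/sys/net/core/bpf_jit_limit

# Total BPF JIT memory currently allocated; errno 524 errors start once this approaches the limit:
sudo grep bpf_jit /proc/vmallocinfo | awk '{s+=$2} END {print s}'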

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 37
  • Comments: 54 (19 by maintainers)

Most upvoted comments

The v20230501 release has started now, and it includes 5.10.178-162.673.amzn2.x86_64 for all AMIs that use 5.10 kernels. We have tested the kernel and expect it to resolve this issue for customers. New AMIs should be available in all regions late tonight (PDT).
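
As a quick sanity check after rolling to the new AMIs (a sketch using kubectl; expect 5.10.178-162.673.amzn2 or newer on 5.10-based AMIs):

# Sketch: list the kernel version reported by each node
kubectl get nodes -o custom-columns='NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion'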

We created a new EKS cluster on version 1.24. After that, the error below started to appear while containers were starting up.

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524: unknown

Any plans to revert to the last stable version until AWS finds a fix?

This ☝️

Why is a broken AMI still the default for Amazon’s managed node groups?

Can’t that be backed out or the release pulled?

Yes, it’s available. Folks that manage custom AMIs can start using the kernel and we’re preparing AMIs for release on Wednesday that will include the latest kernel.

Hey guys, appreciate you’re all subbing but think of the people that are already subbed getting all these pointless messages.

If you’re not gonna add any information that’s relevant to the resolution of the issue please refrain from sending another message and just click the subscribe button.

We’re following up on this with our kernel folks; we believe we’ve identified the necessary patches. I’ll update here once we’ve verified and have a kernel build in the pipeline.

FWIW, with eksctl it is possible to pin a previous version with:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: k8s
managedNodeGroups:
  - name: nodegroup
    releaseVersion: 1.24.11-20230406  # or any other from https://github.com/awslabs/amazon-eks-ami/releases
    ...

Of course, this is a very unfortunate bug that renders our nodes unusable within a day or two, even with an increased bpf_jit_limit, and we’re hoping for a quick fix.
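
A hedged sketch of applying a config like the one above (the file name cluster.yaml is an assumption):

# Sketch: create the pinned nodegroup from the ClusterConfig above
eksctl create nodegroup --config-file=cluster.yaml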

@dougbaber thanks! I’ll make sure the ECS team is aware of this issue; any users of recent 5.10 kernel builds would be impacted.

After backporting a1140cb215fa (“seccomp: Move copy_seccomp() to no failure path.”) to our 5.10 kernel, I didn’t see any memleak with @essh 's repro. I will release a new kernel with the commit and post the backport patch to the upstream 5.10 tree as well.

v20230501 is available in all regions now! Update to the latest EKS Optimized AMIs and this issue should be resolved.

The same problem occurs with this setup:

Kernel version: 5.10.176-157.645.amzn2.x86_64
Kubelet version: v1.24.11-eks-a59e1f0

On:

Kernel version: 5.4.226-129.415.amzn2.x86_64
Kubelet version: v1.24.7-eks-fb459a0

works great.

It’s non-trivial to downgrade the kernel downstream when building an AMI based on this upstream EKS node AMI, which is on kernel 5.10.

@stevo-f3 This should do it:

yum versionlock delete kernel           # remove the lock pinning the current kernel package
amazon-linux-extras disable kernel-5.10 # drop the 5.10 extras topic
amazon-linux-extras enable kernel-5.4   # switch back to the 5.4 kernel topic
yum install -y kernel                   # installs 5.4; takes effect after a reboot / AMI rebuild

At present, we have more users needing 5.10 who are not experiencing this leak than those who are; downgrading the official build to 5.4 would be a last resort if we can’t put a fix together.

@borkmann ACK on behalf of @cartermckinnon. Please give us some time to do things…

The 5.4 kernel would not be affected, as it does not seem to have the offending commit 3a15fb6ed92c (“seccomp: release filter after task is fully dead”) which a1140cb215fa (“seccomp: Move copy_seccomp() to no failure path.”) fixes.

Looks like a potentially missing kernel commit in seccomp is causing this issue: a1140cb215fa (“seccomp: Move copy_seccomp() to no failure path.”) (via https://lore.kernel.org/bpf/20230321170925.74358-1-kuniyu@amazon.com/)

Is the kernel fix actually fixing the bug for good, or is it just bumping the default BPF JIT memory limit? Can you provide a link to the patch?

The kernel fix seems to have been released now, as 5.10.178-162.673.amzn2.x86_64.
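
If you need the fix on existing nodes before replacing them with the new AMIs, something along these lines should work, since the EKS AMI version-locks the kernel package (a sketch; rolling to the v20230501 AMIs is the cleaner option):

# Sketch: update an existing AL2 node to the fixed kernel; the node must be rebooted afterwards
sudo yum versionlock delete kernel
sudo yum update -y kernel
sudo reboot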

I’ve switched to Bottlerocket