amazon-eks-ami: Containers fail to create and probe exec errors related to seccomp on recent kernel-5.10 versions
What happened:
After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS clusters, a few days later a number of the nodes have containers stuck in the ContainerCreating state, or liveness/readiness probes report the following error:
Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "4a11039f730203ffc003b7e64d5e682113437c8c07b8301771e53c710a6ca6ee": OCI runtime exec failed: exec failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524: unknown
This issue is very similar to https://github.com/awslabs/amazon-eks-ami/issues/1179. However, we had not seen this issue on previous AMIs; it only started to occur on v20230217 (following the upgrade from kernel 5.4 to 5.10), with no other changes to the underlying cluster or workloads.
We tried the suggestion from that issue (sysctl net.core.bpf_jit_limit=452534528), which immediately allowed containers to be created and probes to execute, but after approximately a day the issue returned, and the value returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}' was steadily increasing.
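To watch the leak on an affected node, the BPF JIT memory currently allocated can be compared against the configured limit. A minimal sketch, assuming root access on the node (the interval is arbitrary):

```bash
#!/usr/bin/env bash
# Compare BPF JIT memory currently allocated (summed from /proc/vmallocinfo, which
# requires root) against the kernel's net.core.bpf_jit_limit, once a minute.
while true; do
  used=$(grep bpf_jit /proc/vmallocinfo | awk '{s+=$2} END {print s+0}')
  limit=$(sysctl -n net.core.bpf_jit_limit)
  echo "$(date -Is) bpf_jit_bytes_used=${used} bpf_jit_limit=${limit}"
  sleep 60
done
```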
What you expected to happen:
- Containers to launch successfully and become Ready
- Liveness and readiness probes to execute successfully
How to reproduce it (as minimally and precisely as possible):
I don’t currently have a reproduction that I can share, because my current one relies on some internal code (I can hopefully produce a more generic one if required when I get a chance).
As a starting point, we only noticed this happening on nodes that had pods scheduled on them with an exec liveness & readiness probe running every 10 seconds that performs a health check against a gRPC service using grpcurl (a sketch of this probe shape follows below). In addition to this, we also have a default Pod Security Policy (yes, we know they are deprecated 😄) that has the following annotation: seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default.
These two conditions seem to be enough to trigger the issue, and the value reported by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}' will steadily increase over time until containers can no longer be created on the node.
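For illustration, a minimal sketch of the probe shape involved; the image, port, and health-check target are placeholders rather than our actual workload, and the seccomp profile is set directly on the pod here instead of via the PSP annotation:

```bash
# Hypothetical pod approximating the trigger conditions: exec readiness/liveness probes
# that run grpcurl every 10 seconds while the container runs under the runtime's
# default seccomp profile.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-probe-repro
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault   # comparable effect to the docker/default PSP annotation
  containers:
    - name: app
      image: example.com/grpc-service:latest   # placeholder image that serves gRPC health checks
      readinessProbe:
        exec:
          command: ["grpcurl", "-plaintext", "localhost:50051", "grpc.health.v1.Health/Check"]
        periodSeconds: 10
      livenessProbe:
        exec:
          command: ["grpcurl", "-plaintext", "localhost:50051", "grpc.health.v1.Health/Check"]
        periodSeconds: 10
EOF
```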
Anything else we need to know?:
Environment:
- AWS Region: Multiple
- Instance Type(s): Mix of x86_64 and arm64 instances of varying sizes
- EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): "eks.4"
- Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): "1.24"
- AMI Version: v20230217
- Kernel (e.g. uname -a): 5.10.165-143.735.amzn2.x86_64 #1 SMP Wed Jan 25 03:13:54 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-09bffa74b1e396075"
BUILD_TIME="Fri Feb 17 21:59:10 UTC 2023"
BUILD_KERNEL="5.10.165-143.735.amzn2.x86_64"
ARCH="x86_64"
Official Guidance
Kubernetes pods using SECCOMP filtering on EKS optimized AMIs based on Linux Kernel version 5.10.x may get stuck in ContainerCreating state or their liveness/readiness probes fail with the following error:
unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524
When a process with SECCOMP filters creates a child process, the same filters are inherited and applied to the new process. The Amazon Linux kernel versions 5.10.x are affected by a memory leak that occurs when a parent process is terminated while creating a child process. When the total amount of memory allocated for SECCOMP filters exceeds the limit, a process cannot create a new SECCOMP filter. As a result, the parent process fails to create a new child process and the above error message is logged.
This issue is more likely to be encountered with kernel versions kernel-5.10.176-157.645.amzn2 and kernel-5.10.177-158.645.amzn2, where the rate of the memory leak is higher.
Amazon Linux will be releasing the fixed kernel by May 1st, 2023. We will release a new set of EKS AMIs with the updated kernel no later than May 3rd, 2023.
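As a quick way to see the inheritance described above, the seccomp mode and filter count of a containerized process (and of any child it spawns, such as an exec probe command) can be read from /proc. A minimal sketch; the PID is a placeholder, and the Seccomp_filters field only exists on kernels 5.9 and newer:

```bash
# Show the seccomp mode (2 = filter) and the number of attached filters for a process;
# children it creates, e.g. exec probe commands, inherit the same filters.
pid=12345   # placeholder: PID of a containerized process on the node
grep -E '^Seccomp' /proc/"$pid"/status
```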
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 37
- Comments: 54 (19 by maintainers)
Commits related to this issue
- bpf: Adjust insufficient default bpf_jit_limit We've seen recent AWS EKS (Kubernetes) user reports like the following: After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS clust... — committed to intel-lab-lkp/linux by borkmann a year ago
- bpf: Adjust insufficient default bpf_jit_limit We've seen recent AWS EKS (Kubernetes) user reports like the following: After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS clust... — committed to kernel-patches/bpf by borkmann a year ago
- bpf: Adjust insufficient default bpf_jit_limit We've seen recent AWS EKS (Kubernetes) user reports like the following: After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS clust... — committed to chantra/kernel-patches-bpf by borkmann a year ago
- bpf: Adjust insufficient default bpf_jit_limit We've seen recent AWS EKS (Kubernetes) user reports like the following: After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS clust... — committed to kernel-patches/bpf-rc by borkmann a year ago
- bpf: Adjust insufficient default bpf_jit_limit We've seen recent AWS EKS (Kubernetes) user reports like the following: After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS clust... — committed to ammarfaizi2/linux-block by borkmann a year ago
- bpf: Adjust insufficient default bpf_jit_limit [ Upstream commit 10ec8ca8ec1a2f04c4ed90897225231c58c124a7 ] We've seen recent AWS EKS (Kubernetes) user reports like the following: After upgrading... — committed to ammarfaizi2/linux-block by borkmann a year ago
The v20230501 release has started now, and it includes 5.10.178-162.673.amzn2.x86_64 for all AMIs that use 5.10 kernels. We have tested the kernel and expect it to resolve this issue for customers. New AMIs should be available in all regions late tonight (PDT).

Official guidance:
Kubernetes pods using SECCOMP filtering on EKS optimized AMIs based on Linux Kernel version 5.10.x may get stuck in ContainerCreating state or their liveness/readiness probes fail with the following error:
unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524
When a process with SECCOMP filters creates a child process, the same filters are inherited and applied to the new process. The Amazon Linux kernel versions 5.10.x are affected by a memory leak that occurs when a parent process is terminated while creating a child process. When the total amount of memory allocated for SECCOMP filters exceeds the limit, a process cannot create a new SECCOMP filter. As a result, the parent process fails to create a new child process and the above error message is logged.
This issue is more likely to be encountered with kernel versions kernel-5.10.176-157.645.amzn2 and kernel-5.10.177-158.645.amzn2, where the rate of the memory leak is higher.
Amazon Linux will be releasing the fixed kernel by May 1st, 2023. We will release a new set of EKS AMIs with the updated kernel no later than May 3rd, 2023.
This ☝️
Why is a broken AMI still the default for Amazon’s managed node groups?
Can’t that be backed out or the release pulled?
Yes, it’s available. Folks that manage custom AMIs can start using the kernel and we’re preparing AMIs for release on Wednesday that will include the latest kernel.
We created a new EKS cluster on version 1.24. After that, the below error started to show while containers were starting up:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524: unknown

Any plans of reverting this to the last stable version until AWS finds a fix?
Hey guys, appreciate you’re all subbing but think of the people that are already subbed getting all these pointless messages.
If you’re not gonna add any information that’s relevant to the resolution of the issue please refrain from sending another message and just click the subscribe button.
We’re following up on this with our kernel folks; we believe we’ve identified the necessary patches. I’ll update here once we’ve verified and have a kernel build in the pipeline.
FWIW, with eksctl it is possible to pin the previous AMI release version (see the sketch below). Of course, this is a very unfortunate bug that renders our nodes unusable within a day or two even with an increased bpf_jit_limit, and we're hoping for a quick fix.

@dougbaber thanks! I'll make sure the ECS team is aware of this issue; any users of recent 5.10 kernel builds would be impacted.
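The exact configuration from the comment above was not preserved; the following is a minimal sketch, assuming an eksctl managed node group pinned via the releaseVersion field (cluster name, region, instance type, and the release string are illustrative):

```bash
# Hypothetical eksctl config pinning a managed node group to an earlier AMI release.
cat > pinned-nodegroup.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster          # placeholder cluster name
  region: us-east-1         # placeholder region
managedNodeGroups:
  - name: pinned-ng
    instanceType: m5.large
    desiredCapacity: 3
    releaseVersion: "1.24.10-20230203"   # pin to the AMI release that predates the regression
EOF
eksctl create nodegroup --config-file=pinned-nodegroup.yaml
```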
After backporting a1140cb215fa (“seccomp: Move copy_seccomp() to no failure path.”) to our 5.10 kernel, I didn’t see any memleak with @essh 's repro. I will release a new kernel with the commit and post the backport patch to the upstream 5.10 tree as well.
v20230501 is available in all regions now! Update to the latest EKS Optimized AMIs and this issue should be resolved.

The same problem for this setup:
On:
works great.
@stevo-f3 This should do it:
At present, we have more users needing 5.10 who are not experiencing this leak than those who are; downgrading the official build to 5.4 would be a last resort if we can’t put a fix together.
@borkmann ACK on behalf of @cartermckinnon. Please give us some time to do things…
5.4 kernel would not be affected as it does not seem to have the offending commit 3a15fb6ed92c (“seccomp: release filter after task is fully dead”) which a1140cb215fa (“seccomp: Move copy_seccomp() to no failure path.”) fixes.
Looks like potentially missing kernel commit in seccomp causing this issue: a1140cb215fa (“seccomp: Move copy_seccomp() to no failure path.”) (via https://lore.kernel.org/bpf/20230321170925.74358-1-kuniyu@amazon.com/)
Is the kernel fix actually fixing the bug for good, or is it just bumping the default BPF JIT memory limit? Can you provide a link to the patch?
The kernel fix seems to have been released now, as 5.10.178-162.673.amzn2.x86_64.
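A minimal sketch for confirming the rollout: look up the current recommended EKS optimized AL2 AMI for 1.24 via the standard SSM parameter, and check that nodes report the fixed kernel (5.10.178-162.673.amzn2 or newer):

```bash
# Current recommended EKS optimized Amazon Linux 2 AMI for Kubernetes 1.24 in this region.
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.24/amazon-linux-2/recommended/image_id \
  --query 'Parameter.Value' --output text

# Kernel version reported by each node; should be 5.10.178-162.673.amzn2 or newer.
kubectl get nodes -o custom-columns='NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion'
```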
I’ve switched to Bottlerocket