containerd: Containers slow to start after init container finishes: `failed to handle container TaskExit event`
Description
Note that this is only a problem when the node has just come up. After running for a few minutes or so, everything is fine.
I noticed that our aws-cni pods were taking a long time to start. The init container runs and completes within a second, but the main container sometimes takes over a minute before it begins execution.
Looking at the kubelet logs, I saw the Relisting method (which usually runs once per second) take well over 30 seconds:
kubelet_init[1680]: I0602 05:48:06.350695 1679 generic.go:191] "GenericPLEG: Relisting"
kubelet_init[1680]: I0602 05:48:07.352436 1679 generic.go:191] "GenericPLEG: Relisting"
kubelet_init[1680]: I0602 05:48:49.669827 1679 generic.go:191] "GenericPLEG: Relisting"
kubelet_init[1680]: I0602 05:48:50.672822 1679 generic.go:191] "GenericPLEG: Relisting"
kubelet_init[1680]: I0602 05:48:51.676857 1679 generic.go:191] "GenericPLEG: Relisting"
This, to the best of my knowledge, calls containerd's CRI for a list of all the running pods.
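For reference, a rough sketch (not the kubelet's actual code) of what that relist boils down to: timing a ListContainers call against containerd's CRI socket. The socket path matches the config further down; the timeout is arbitrary.

package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Dial containerd's CRI socket (path taken from the config shown below).
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := runtimeapi.NewRuntimeServiceClient(conn)

	// Time a single list, roughly what one PLEG relist drives.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()
	start := time.Now()
	resp, err := client.ListContainers(ctx, &runtimeapi.ListContainersRequest{})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("ListContainers returned %d containers in %s", len(resp.Containers), time.Since(start))
}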
Looking into containerd's logs, I saw signs that containerd or the shim was having trouble stopping the container:
containerd[2206]: {"error":"failed to stop container: failed to delete task: context deadline exceeded:
unknown","level":"error","msg":"failed to handle container TaskExit event \u0026TaskExit{ContainerID:8f71eadd772ffde7f9537998844aa3f7a8a4f15ec182a5943ef0d60f242b1929,ID:8f71eadd772ffde7f9537998844aa3f7a8a4f15ec182a5943ef0d60f242b1929,Pid:3658,ExitStatus:0,ExitedAt:2023-06-15 06:39:36.566238047 +0000 UTC,XXX_unrecognized:[],}","time":"2023-06-15T06:39:46.568049942Z"}
In the goroutine dump, I can see (L677) that a goroutine is waiting on the Delete call to dispatch.
In the containerd-shim-runc-v2 goroutines, it appears to be hanging on the umount syscall, though I'm not sure.
goroutine 104 [syscall]:
syscall.Syscall(0x7c?, 0xc000161800?, 0xc0000c2200?, 0x7d?)
/usr/local/go/src/syscall/syscall_linux.go:68 +0x27
golang.org/x/sys/unix.Unmount({0xc0000c2200?, 0x0?}, 0xc00029c448?)
/go/src/github.com/containerd/containerd/vendor/golang.org/x/sys/unix/zsyscall_linux.go:1680 +0x7d
github.com/containerd/containerd/mount.unmount({0xc0000c2200, 0x7c}, 0x9ed280?)
/go/src/github.com/containerd/containerd/mount/mount_linux.go:155 +0x8c
github.com/containerd/containerd/mount.UnmountAll({0xc0000c2200, 0x7c}, 0xc000065080?)
/go/src/github.com/containerd/containerd/mount/mount_linux.go:185 +0x8c
github.com/containerd/containerd/pkg/process.(*Init).delete(0xc0000d2c60, {0x9f1460, 0xc000065080})
/go/src/github.com/containerd/containerd/pkg/process/init.go:312 +0x13a
github.com/containerd/containerd/pkg/process.(*stoppedState).Delete(0xc000012ab8, {0x9f1460?, 0xc000065080?})
/go/src/github.com/containerd/containerd/pkg/process/init_state.go:396 +0x2e
github.com/containerd/containerd/pkg/process.(*Init).Delete(0xc00002d180?, {0x9f1460?, 0xc000065080?})
/go/src/github.com/containerd/containerd/pkg/process/init.go:288 +0xb6
github.com/containerd/containerd/runtime/v2/runc.(*Container).Delete(0xc000132380?, {0x9f1460, 0xc000065080}, 0xc0003071c0)
/go/src/github.com/containerd/containerd/runtime/v2/runc/container.go:390 +0x62
github.com/containerd/containerd/runtime/v2/runc/task.(*service).Delete(0xc000132380, {0x9f1460, 0xc000065080}, 0xc0003071c0)
/go/src/github.com/containerd/containerd/runtime/v2/runc/task/service.go:216 +0x75
github.com/containerd/containerd/runtime/v2/task.RegisterTaskService.func4({0x9f1460, 0xc000065080}, 0xc00002ee00)
/go/src/github.com/containerd/containerd/runtime/v2/task/shim.pb.go:3477 +0x98
github.com/containerd/ttrpc.defaultServerInterceptor({0x9f1460?, 0xc000065080?}, 0x17?, 0xc000134760?, 0x6?)
/go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/interceptor.go:45 +0x26
github.com/containerd/ttrpc.(*serviceSet).dispatch(0xc000118650, {0x9f1460, 0xc000065080}, {0xc00002aba0, 0x17}, {0xc00027e22a, 0x6}, {0xc000180280, 0x42, 0x50})
/go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:95 +0x1be
github.com/containerd/ttrpc.(*serviceSet).call(0xd0e330?, {0x9f1460?, 0xc000065080?}, {0xc00002aba0?, 0x84eac0?}, {0xc00027e22a?, 0xc000308000?}, {0xc000180280, 0x42, 0x50})
/go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:64 +0x71
github.com/containerd/ttrpc.(*serverConn).run.func2(0x1b)
/go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:439 +0xe5
created by github.com/containerd/ttrpc.(*serverConn).run
/go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:435 +0x6b1
shim-goroutines (each shim goroutine is separated by a BREAK)
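For context, the frame above sits in golang.org/x/sys/unix.Unmount. A plain unmount makes the kernel sync the filesystem (for overlayfs, a syncfs of the upper filesystem) before detaching it, which can block for tens of seconds under IO pressure; a lazy MNT_DETACH unmount returns immediately by comparison. A small standalone sketch with a placeholder path:

package main

import (
	"log"
	"time"

	"golang.org/x/sys/unix"
)

func main() {
	// Placeholder rootfs path, only for illustration.
	target := "/run/containerd/io.containerd.runtime.v2.task/k8s.io/<id>/rootfs"

	// A plain unmount: the kernel flushes the filesystem before detaching it,
	// so this call can block for a long time under IO pressure.
	start := time.Now()
	if err := unix.Unmount(target, 0); err != nil {
		log.Printf("unmount: %v", err)
	}
	log.Printf("blocking unmount took %s", time.Since(start))

	// By contrast, a lazy unmount detaches immediately and lets cleanup happen
	// asynchronously (shown only for comparison, not suggested as the fix).
	_ = unix.Unmount(target, unix.MNT_DETACH)
}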
Steps to reproduce the issue
Unfortunately, I don't know how to reproduce it, but this occurs across all nodes. I've tried different versions of runc and switching from cgroupfs to systemd cgroups; same issue.
We were also restarting containerd during boot (as we reconfigure some things at boot time), but even after removing that restart, the behaviour is the same.
Describe the results you received and expected
The main container should start pretty much straight after the init container finishes.
Kubelet PLEG relisting should not block for this long, nor should I get:
E0602 05:48:48.669179 1679 kubelet.go:2190] "Housekeeping took longer than 15s" err="housekeeping took too long" seconds=41.340653962
What version of containerd are you using?
containerd github.com/containerd/containerd v1.6.21 3dce8eb055cbb6872793272b4f20ed16117344f8
Any other relevant information
runc version 1.1.7
commit: v1.1.7-0-g860f061b
spec: 1.0.2-dev
go: go1.20.3
libseccomp: 2.5.4
5.15.0-1037-aws #41~20.04.1-Ubuntu
Let me know what details you want from crictl info and I can provide them (I don't want to paste the entire thing).
Show configuration if it is related to CRI plugin.
(I have used multiple configurations; below is the EKS config.)
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
[grpc]
address = "/run/containerd/containerd.sock"
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"
Woo, I can confirm that the patch + configuring with volatile improves node boot up by quite a bit! Looking forward to seeing this in an upcoming release 😃
Chatting offline with @fuweid, we are considering different locking/caching mechanisms to reduce (or eliminate) contention between read/list operations and the update/change operations that hold the bbolt (etcd) lock, which can be delayed under storage pressure.
In Kubernetes, if a container fails, kubelet always recreates a new one instead of restarting it, so the rootfs of a container in the Kubernetes world is temporary. It's safe to use the volatile option for pods since we never get a chance to reuse the old rootfs. For read-only containers, the rootfs is mounted read-only, so the kernel won't force a syncfs during umount.
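For illustration, this is roughly what an overlay rootfs mount looks like once volatile is added (volatile, available since Linux 5.10, tells overlayfs to skip syncs, so umount no longer forces a syncfs of the upperdir). This is a hand-written sketch using containerd's mount package with placeholder snapshot and target paths, not the snapshotter's actual code:

package main

import (
	"log"

	"github.com/containerd/containerd/mount"
)

func main() {
	// Placeholder snapshot directories under the overlayfs snapshotter root.
	m := mount.Mount{
		Type:   "overlay",
		Source: "overlay",
		Options: []string{
			"index=off",
			"workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2/work",
			"upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2/fs",
			"lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs",
			// volatile skips syncs for this overlay instance, so the final
			// umount does not have to flush the upperdir to disk.
			"volatile",
		},
	}
	// Placeholder mount target.
	if err := m.Mount("/tmp/rootfs"); err != nil {
		log.Fatal(err)
	}
}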
Checked the containerd.log and found the following.
There is an UpdateContainerResources request. It was waiting for the bbolt lock: https://github.com/containerd/containerd/blob/f9a7ae6da2fe97f4f5b1444bed8054103783c587/pkg/cri/server/container_update_resources.go#L40-L53
But UpdateContainerResources holds the lock on the container's status, which is also required by ListContainers. That is what makes PLEG take so long. Meanwhile the bbolt db, which holds the lock required by UpdateContainerResources, is trying to sync its data to storage. I think the host is under IO pressure.
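To make the chain concrete, here is a minimal standalone sketch of the contention pattern (hypothetical types, not containerd's actual code): a per-container status mutex is held across a bbolt-style write that must sync to disk, and the list path needs the same mutex, so a slow disk turns one resource update into a stalled PLEG relist.

package main

import (
	"fmt"
	"sync"
	"time"
)

// containerStatus stands in for the CRI container status store: one mutex
// per container, taken by both UpdateContainerResources and ListContainers.
type containerStatus struct {
	mu    sync.Mutex
	state string
}

// slowBoltUpdate stands in for a bbolt transaction that must fsync its write
// to disk; under IO pressure this can take tens of seconds.
func slowBoltUpdate() {
	time.Sleep(30 * time.Second)
}

func main() {
	c := &containerStatus{state: "running"}

	// UpdateContainerResources: holds the status lock across the slow write.
	go func() {
		c.mu.Lock()
		defer c.mu.Unlock()
		slowBoltUpdate()
	}()

	time.Sleep(100 * time.Millisecond)

	// ListContainers (what PLEG relisting drives): needs the same lock, so
	// the relist stalls for as long as the update's disk sync takes.
	start := time.Now()
	c.mu.Lock()
	fmt.Printf("list saw state %q after waiting %s\n", c.state, time.Since(start))
	c.mu.Unlock()
}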