containerd: Containers slow to start after init container finishes: `failed to handle container TaskExit event`

Description

Note that this is only a problem when the node has just come up. After it has been running for a few minutes or so, everything is fine.

I noticed that our aws-cni pods were taking a long time to run after starting. The init container runs and completes within a second, but the main container sometimes takes over a minute before it begins execution.

Looking at the kubelet logs, I saw the Relisting method (which usually runs once per second) take well over 30 seconds:

kubelet_init[1680]: I0602 05:48:06.350695    1679 generic.go:191] "GenericPLEG: Relisting"
kubelet_init[1680]: I0602 05:48:07.352436    1679 generic.go:191] "GenericPLEG: Relisting"
kubelet_init[1680]: I0602 05:48:49.669827    1679 generic.go:191] "GenericPLEG: Relisting"
kubelet_init[1680]: I0602 05:48:50.672822    1679 generic.go:191] "GenericPLEG: Relisting"
kubelet_init[1680]: I0602 05:48:51.676857    1679 generic.go:191] "GenericPLEG: Relisting"

This, to the best of my knowledge, calls containerd's CRI to list all running pods and containers.
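To rule out the kubelet itself, the same CRI list calls can be timed directly against the containerd socket with crictl (the endpoint below is the default socket path from my config further down; adjust if yours differs):

# time the CRI sandbox/container list calls that PLEG relies on
time crictl --runtime-endpoint unix:///run/containerd/containerd.sock pods
time crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a

If these are also slow on a freshly booted node, the delay is on the containerd side rather than in the kubelet.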

Looking into containerd's logs, I saw signs that containerd or the shim was having trouble stopping the container:

containerd[2206]: {"error":"failed to stop container: failed to delete task: context deadline exceeded:
 unknown","level":"error","msg":"failed to handle container TaskExit event \u0026TaskExit{ContainerID:8f71eadd772ffde7f9537998844aa3f7a8a4f15ec182a5943ef0d60f242b1929,ID:8f71eadd772ffde7f9537998844aa3f7a8a4f15ec182a5943ef0d60f242b1929,Pid:3658,ExitStatus:0,ExitedAt:2023-06-15 06:39:36.566238047 +0000 UTC,XXX_unrecognized:[],}","time":"2023-06-15T06:39:46.568049942Z"}

In the goroutine dump, I can see (L677) that a goroutine is waiting on the Delete call to dispatch.

containerd-goroutines
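(For anyone wanting to collect the same data: as far as I know the containerd daemon writes a goroutine stack dump to a file and logs its location when it receives SIGUSR1, which is how a dump like the one attached above can be produced.)

# ask containerd to dump its goroutine stacks (the daemon log reports where the dump was written)
kill -USR1 $(pidof containerd)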

In the containerd-shim-runc-v2 goroutines, it appears to be hanging on the umount syscall, though I'm not sure.

goroutine 104 [syscall]:
syscall.Syscall(0x7c?, 0xc000161800?, 0xc0000c2200?, 0x7d?)
        /usr/local/go/src/syscall/syscall_linux.go:68 +0x27
golang.org/x/sys/unix.Unmount({0xc0000c2200?, 0x0?}, 0xc00029c448?)
        /go/src/github.com/containerd/containerd/vendor/golang.org/x/sys/unix/zsyscall_linux.go:1680 +0x7d
github.com/containerd/containerd/mount.unmount({0xc0000c2200, 0x7c}, 0x9ed280?)
        /go/src/github.com/containerd/containerd/mount/mount_linux.go:155 +0x8c
github.com/containerd/containerd/mount.UnmountAll({0xc0000c2200, 0x7c}, 0xc000065080?)
        /go/src/github.com/containerd/containerd/mount/mount_linux.go:185 +0x8c
github.com/containerd/containerd/pkg/process.(*Init).delete(0xc0000d2c60, {0x9f1460, 0xc000065080})
        /go/src/github.com/containerd/containerd/pkg/process/init.go:312 +0x13a
github.com/containerd/containerd/pkg/process.(*stoppedState).Delete(0xc000012ab8, {0x9f1460?, 0xc000065080?})
        /go/src/github.com/containerd/containerd/pkg/process/init_state.go:396 +0x2e
github.com/containerd/containerd/pkg/process.(*Init).Delete(0xc00002d180?, {0x9f1460?, 0xc000065080?})
        /go/src/github.com/containerd/containerd/pkg/process/init.go:288 +0xb6
github.com/containerd/containerd/runtime/v2/runc.(*Container).Delete(0xc000132380?, {0x9f1460, 0xc000065080}, 0xc0003071c0)
        /go/src/github.com/containerd/containerd/runtime/v2/runc/container.go:390 +0x62
github.com/containerd/containerd/runtime/v2/runc/task.(*service).Delete(0xc000132380, {0x9f1460, 0xc000065080}, 0xc0003071c0)
        /go/src/github.com/containerd/containerd/runtime/v2/runc/task/service.go:216 +0x75
github.com/containerd/containerd/runtime/v2/task.RegisterTaskService.func4({0x9f1460, 0xc000065080}, 0xc00002ee00)
        /go/src/github.com/containerd/containerd/runtime/v2/task/shim.pb.go:3477 +0x98
github.com/containerd/ttrpc.defaultServerInterceptor({0x9f1460?, 0xc000065080?}, 0x17?, 0xc000134760?, 0x6?)
        /go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/interceptor.go:45 +0x26
github.com/containerd/ttrpc.(*serviceSet).dispatch(0xc000118650, {0x9f1460, 0xc000065080}, {0xc00002aba0, 0x17}, {0xc00027e22a, 0x6}, {0xc000180280, 0x42, 0x50})
        /go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:95 +0x1be
github.com/containerd/ttrpc.(*serviceSet).call(0xd0e330?, {0x9f1460?, 0xc000065080?}, {0xc00002aba0?, 0x84eac0?}, {0xc00027e22a?, 0xc000308000?}, {0xc000180280, 0x42, 0x50})
        /go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:64 +0x71
github.com/containerd/ttrpc.(*serverConn).run.func2(0x1b)
        /go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:439 +0xe5
created by github.com/containerd/ttrpc.(*serverConn).run
        /go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:435 +0x6b1

shim-goroutines (each shim goroutine is separated by a BREAK)
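To check from the host side whether the umount is genuinely stuck in the kernel rather than just slow, some generic (non-containerd-specific) checks:

# list tasks in uninterruptible sleep (state D), i.e. blocked inside the kernel
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

# kernel stack of a shim process (requires root); pick the right PID if several shims are running
cat /proc/$(pidof containerd-shim-runc-v2 | awk '{print $1}')/stack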

Steps to reproduce the issue

Unfortunately, I don't know how to reproduce it, but it occurs across all nodes. I've tried different versions of runc and switching between cgroupfs and the systemd cgroup driver; same issue.

We were also restarting containerd during boot (as we reconfigure some things at boot time), but even after removing that restart, the behaviour is the same.

Describe the results you received and expected

The main container should start pretty much straight after the init container finishes.

Kubelet PLEG relisting should not block for this long, nor should I get errors like:

E0602 05:48:48.669179    1679 kubelet.go:2190] "Housekeeping took longer than 15s" err="housekeeping took too long" seconds=41.340653962

What version of containerd are you using?

containerd github.com/containerd/containerd v1.6.21 3dce8eb055cbb6872793272b4f20ed16117344f8

Any other relevant information

runc version 1.1.7
commit: v1.1.7-0-g860f061b
spec: 1.0.2-dev
go: go1.20.3
libseccomp: 2.5.4
5.15.0-1037-aws #41~20.04.1-Ubuntu

Let me know what details you want from crictl info and I can provide them (I don't want to paste the entire thing).

Show configuration if it is related to CRI plugin.

(I have used multiple configurations; below is the EKS config)

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"

[grpc]
address = "/run/containerd/containerd.sock"

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"

[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"

About this issue

  • State: open
  • Created a year ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

Woo, I can confirm that the patch + configuring with volatile improves node boot up by quite a bit! Looking forward to seeing this in an upcoming release 😃

Chatting offline with @fuweid, we are considering different locking/cache mechanisms to reduce (or eliminate) contention between read/list operations and the update/change operations that can block them via the bbolt (etcd) lock, which gets delayed under storage pressure.

Would it make sense to always enable volatile on overlay if the pod has a read-only rootfs?

In Kubernetes, if a container fails, the kubelet always recreates a new one instead of restarting it, so in the Kubernetes world a container's rootfs is temporary. It's safe to use the volatile option for pods since we never get a chance to reuse the old rootfs. For a read-only container, the rootfs is mounted read-only, so the kernel won't force a syncfs during umount.
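For reference, volatile is a plain overlayfs mount option (kernel 5.10+) that skips syncing the upper layer, so an unmount does not have to flush it to disk. The mount below is only an illustration with placeholder paths, not the exact mount line the snapshotter generates:

# illustrative overlayfs mount using the volatile option; paths are placeholders
mount -t overlay overlay \
  -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work,volatile \
  /mnt/merged

The trade-off is that the upper layer's contents are not guaranteed to survive a crash, which is acceptable here precisely because Kubernetes never reuses the old rootfs.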

I checked the containerd.log and found the following.

There is an UpdateContainerResources request; it was waiting for the bbolt lock.

https://github.com/containerd/containerd/blob/f9a7ae6da2fe97f4f5b1444bed8054103783c587/pkg/cri/server/container_update_resources.go#L40-L53

goroutine 1723 [semacquire]:
sync.runtime_SemacquireMutex(0xc001a78300?, 0x9?, 0xc001500c00?)
	/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc0002f9178)
	/usr/local/go/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
	/usr/local/go/src/sync/mutex.go:90
go.etcd.io/bbolt.(*DB).beginRWTx(0xc0002f8fc0)
	/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:640 +0x77
go.etcd.io/bbolt.(*DB).Begin(0xc001739a70?, 0x87?)
	/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:589 +0x1d
go.etcd.io/bbolt.(*DB).Update(0x20?, 0xc000b44be0)
	/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:725 +0x3e
github.com/containerd/containerd/metadata.(*DB).Update(0xc0001e92d0, 0x56190174ac40?)
	/go/src/github.com/containerd/containerd/metadata/db.go:246 +0x9e
github.com/containerd/containerd/services/containers.(*local).withStoreUpdate(0xc0003e8840, {0x56190174ac40?, 0xc0017390b0?}, 0x0?)
	/go/src/github.com/containerd/containerd/services/containers/local.go:212 +0x3c
github.com/containerd/containerd/services/containers.(*local).Update(0xc0003e8840, {0x56190174ac40, 0xc0017390b0}, 0xc000eb0540, {0xedc1ca928?, 0x0?, 0xc0017399b0?})
	/go/src/github.com/containerd/containerd/services/containers/local.go:156 +0x24e
github.com/containerd/containerd.(*remoteContainers).Update(_, {_, _}, {{0xc00136b280, 0x40}, 0xc0017398f0, {0xc0016cc280, 0x96}, {{0xc00145a9f0, 0x15}, ...}, ...}, ...)
	/go/src/github.com/containerd/containerd/containerstore.go:129 +0x2ce
github.com/containerd/containerd.(*container).Update(0xc0006e4a80, {0x56190174ac40, 0xc0017390b0}, {0xc00040a0f0, 0x1, 0xc0016f0101?})
	/go/src/github.com/containerd/containerd/container.go:318 +0x214
github.com/containerd/containerd/pkg/cri/server.updateContainerSpec({0x56190174ac40, 0xc0017390b0}, {0x561901754a58, 0xc0006e4a80}, 0xc0016f4b80)

But UpdateContainerResources holds the lock on the container's status, which ListContainers needs. That is what makes PLEG take a long time.

goroutine 1665 [semacquire]:
sync.runtime_SemacquireMutex(0xe69?, 0xae?, 0x1768c1d3f53d5f79?)
	/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
	/usr/local/go/src/sync/rwmutex.go:71
github.com/containerd/containerd/pkg/cri/store/container.(*statusStorage).Get(0xc00058a880?)
	/go/src/github.com/containerd/containerd/pkg/cri/store/container/status.go:205 +0x9c
github.com/containerd/containerd/pkg/cri/server.toCRIContainer({{{0xc00136b280, 0x40}, {0xc00186b6d0, 0x4d}, {0xc001a8e6c0, 0x40}, 0xc0007129a0, {0xc00050eb40, 0x47}, {0xc000dfcd80, ...}, ...}, ...})
	/go/src/github.com/containerd/containerd/pkg/cri/server/container_list.go:48 +0x3f
github.com/containerd/containerd/pkg/cri/server.(*criService).ListContainers(0xc000548400, {0xc0006027b0?, 0x561900e89376?}, 0xc000ef6130)
	/go/src/github.com/containerd/containerd/pkg/cri/server/container_list.go:37 +0x205
github.com/containerd/containerd/pkg/cri/server.(*instrumentedService).ListContainers(0xc000708168, {0x56190174ac40, 0xc0006027b0}, 0xc000ef6130)
	/go/src/github.com/containerd/containerd/pkg/cri/server/instrumented_service.go:515 +0x1d3

And the bbolt DB, which holds the lock required by UpdateContainerResources, is trying to sync its data to storage. I think the host is under I/O pressure.

goroutine 1268 [syscall]:
syscall.Syscall(0x0?, 0x0?, 0xc000984000?, 0x686?)
	/usr/local/go/src/syscall/syscall_linux.go:68 +0x27
syscall.Fsync(0x0?)
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:423 +0x2a
internal/poll.(*FD).Fsync.func1(...)
	/usr/local/go/src/internal/poll/fd_fsync_posix.go:18
internal/poll.ignoringEINTR(...)
	/usr/local/go/src/internal/poll/fd_posix.go:74
internal/poll.(*FD).Fsync(0x0?)
	/usr/local/go/src/internal/poll/fd_fsync_posix.go:17 +0xf4
os.(*File).Sync(0xc00011a6b8)
	/usr/local/go/src/os/file_posix.go:168 +0x4e
github.com/containerd/containerd/content/local.(*writer).Commit(0xc0008c7880, {0x56190174ac40, 0xc0006028a0}, 0x181ebe95, {0xc001476780, 0x47}, {0x0, 0x0, 0x5619023eb940?})
	/go/src/github.com/containerd/containerd/content/local/writer.go:95 +0x24e
github.com/containerd/containerd/metadata.(*namespacedWriter).commit(0xc0006f5320, {0x56190174ac40, 0xc0006028a0}, 0x0?, 0x181ebe95, {0xc001476780, 0x47}, {0x0, 0x0, 0x0})
	/go/src/github.com/containerd/containerd/metadata/content.go:641 +0x5bf
github.com/containerd/containerd/metadata.(*namespacedWriter).Commit.func1(0x0?)
	/go/src/github.com/containerd/containerd/metadata/content.go:578 +0x8f
go.etcd.io/bbolt.(*DB).Update(0x0?, 0xc000bb01e0)
	/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:741 +0x82
github.com/containerd/containerd/metadata.(*DB).Update(0xc0001e92d0, 0x5619014f3000?)
	/go/src/github.com/containerd/containerd/metadata/db.go:246 +0x9e
github.com/containerd/containerd/metadata.update({0x56190174ac40?, 0xc0006028a0?}, {0x561901738b90, 0xc0001e92d0}, 0xc000bb01e0)
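A quick way to test the I/O-pressure theory on an affected node is to watch pressure stall information and per-device utilisation while it boots (generic Linux tooling; iostat comes from the sysstat package, and /proc/pressure requires PSI to be enabled in the kernel):

# I/O pressure stall information, if PSI is enabled
cat /proc/pressure/io

# per-device utilisation and wait times; sustained high %util/await during boot would support the theory
iostat -x 1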