linuxkit: boot hang on kernels >= 4.14.36
Starting with kernel 4.14.36, I can no longer reach the login prompt when booting an image with docker and getty. The kernel reaches init, but gets stuck trying to start containerd.
I’ve been testing this using the kernel+initrd build target and booting the resulting images under qemu, both on a local KVM instance and on one in the cloud. It fails to reach login in both cases.
After putting getty into the init section, I’m able to look at containerd. It has a thread blocked in the kernel with this stack:
[<0000000000000000>] wait_for_random_bytes+0x8c/0xbb
[<0000000000000000>] SyS_getrandom+0x68/0x83
[<0000000000000000>] do_syscall_64+0x74/0x87
[<0000000000000000>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[<0000000000000000>] 0xffffffffffffffff
Which corresponds to the following goroutine:
goroutine 29 [syscall, 1 minutes]:
syscall.Syscall(0x13e, 0xc4201e20e9, 0x3, 0x0, 0x3, 0x3, 0x0)
/usr/lib/go/src/syscall/asm_linux_amd64.s:18 +0x5
internal/syscall/unix.GetRandom(0xc4201e20e9, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0)
/usr/lib/go/src/internal/syscall/unix/getrandom_linux.go:38 +0x6f
crypto/rand.getRandomLinux(0xc4201e20e9, 0x3, 0x3, 0x7fa9190cd9e3)
/usr/lib/go/src/crypto/rand/rand_linux.go:23 +0x4a
crypto/rand.(*devReader).Read(0xc420074180, 0xc4201e20e9, 0x3, 0x3, 0x0, 0x0, 0x0)
/usr/lib/go/src/crypto/rand/rand_unix.go:48 +0x3d3
io.ReadAtLeast(0x7fa91a072fe0, 0xc420074180, 0xc4201e20e9, 0x3, 0x3, 0x3, 0xc42020c380, 0x7fa919a3dd90, 0x0)
/usr/lib/go/src/io/io.go:309 +0x88
io.ReadFull(0x7fa91a072fe0, 0xc420074180, 0xc4201e20e9, 0x3, 0x3, 0x7fa919baa5c0, 0xf8c2d01, 0xc4201e20e9)
/usr/lib/go/src/io/io.go:327 +0x5a
crypto/rand.Read(0xc4201e20e9, 0x3, 0x3, 0xffc5e8bfa08ef1b9, 0xc42004d7f8, 0x3)
/usr/lib/go/src/crypto/rand/rand.go:23 +0x59
github.com/containerd/containerd/services/leases.generateLeaseID(0xc42004d818, 0xc42004d728)
/go/src/github.com/containerd/containerd/services/leases/service.go:113 +0x88
github.com/containerd/containerd/services/leases.(*service).Create(0xc42019e090, 0x7fa919941200, 0xc4201d2a80, 0xc4201e03a0, 0xc42019e090, 0x7fa9197f6b80, 0x6)
/go/src/github.com/containerd/containerd/services/leases/service.go:55 +0x298
github.com/containerd/containerd/api/services/leases/v1._Leases_Create_Handler.func1(0x7fa919941200, 0xc4201d2a80, 0x7fa919c581e0, 0xc4201e03a0, 0xc4201d07d0, 0x7fa91a0de618, 0x7fa919b9d300, 0xc4201dc1c0)
/go/src/github.com/containerd/containerd/api/services/leases/v1/leases.pb.go:208 +0x88
github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor(0x7fa919941200, 0xc4201d2a80, 0x7fa919c581e0, 0xc4201e03a0, 0xc4201e03c0, 0xc4201e03e0, 0x50, 0x48, 0xc4201e0380, 0xc42004d968)
/go/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29 +0xd4
github.com/containerd/containerd/server.interceptor(0x7fa919941200, 0xc4201d29c0, 0x7fa919c581e0, 0xc4201e03a0, 0xc4201e03c0, 0xc4201e03e0, 0x0, 0xc42004d9e0, 0x7fa9190ce14a, 0x50)
/go/src/github.com/containerd/containerd/server/server.go:267 +0x1d2
github.com/containerd/containerd/api/services/leases/v1._Leases_Create_Handler(0x7fa919c0dd60, 0xc42019e090, 0x7fa919941200, 0xc4203a1980, 0xc4201d06e0, 0x7fa919cc5e40, 0x0, 0x0, 0x0, 0x0)
/go/src/github.com/containerd/containerd/api/services/leases/v1/leases.pb.go:210 +0x16f
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc4201f8500, 0x7fa91a0837e0, 0xc4203fe160, 0xc4201ae800, 0xc420297110, 0x7fa91a064c60, 0x0, 0x0, 0x0)
/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:826 +0xab6
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream(0xc4201f8500, 0x7fa91a0837e0, 0xc4203fe160, 0xc4201ae800, 0x0)
/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:1023 +0x152a
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc4201e2130, 0xc4201f8500, 0x7fa91a0837e0, 0xc4203fe160, 0xc4201ae800)
/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:572 +0xa1
created by github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:570 +0xa3
The problem is this bit of code:
SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
                unsigned int, flags)
{
        int ret;
        if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
                return -EINVAL;
        if (count > INT_MAX)
                count = INT_MAX;
        if (flags & GRND_RANDOM)
                return _random_read(flags & GRND_NONBLOCK, buf, count);
        if (!crng_ready()) {
                if (flags & GRND_NONBLOCK)
                        return -EAGAIN;
                ret = wait_for_random_bytes();  /* <=== blocked here */
                if (unlikely(ret))
                        return ret;
        }
        return urandom_read(NULL, buf, count, NULL);
}
The crng_ready() check is returning false, and since the getrandom syscall was not invoked with GRND_NONBLOCK, we’re waiting for the crng to become ready. Bummer.
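To confirm this from userspace without hanging, a minimal C sketch can issue the same syscall with GRND_NONBLOCK set, so the kernel returns EAGAIN instead of sleeping in wait_for_random_bytes(). The helper name try_getrandom_nonblock is made up for illustration and is not part of containerd or the kernel; it assumes a libc that provides getrandom() in <sys/random.h> (glibc 2.25 or later).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from containerd or the kernel): try to fetch
 * `len` random bytes without blocking.
 * Returns 1 if the buffer was filled, 0 if the kernel reported EAGAIN
 * (the CRNG is not ready yet), and -1 on any other error (errno is set).
 */
static int try_getrandom_nonblock(unsigned char *buf, size_t len)
{
    size_t filled = 0;

    while (filled < len) {
        ssize_t n = getrandom(buf + filled, len - filled, GRND_NONBLOCK);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            if (errno == EAGAIN)
                return 0;       /* a blocking call would have waited here */
            return -1;
        }
        filled += (size_t)n;
    }
    return 1;
}
```

Calling it with a 3-byte buffer mirrors the request in the goroutine above. A return value of 0 early in boot would be consistent with the blocking call made through crypto/rand getting stuck.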
The problem is caused by this patch, which arrived as a security fix in 4.14.36: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.14.40&id=6e513bc20ca63f594632eca4e1968791240b8f18
It changes the crng_ready() check not to return true until the CRNG is fully initialized, which means that in our low-entropy early boot stage we’re blocking until we have enough entropy to proceed.
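One rough way to see how starved the pool is at that point is to read the kernel's entropy estimate from /proc/sys/kernel/random/entropy_avail. The sketch below assumes /proc is mounted, and the helper name read_entropy_avail is made up for illustration. The value is only the input pool estimate, not a direct readout of crng_ready(), so treat it as a hint rather than a definitive test.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read the kernel's entropy estimate (in bits) from
 * /proc/sys/kernel/random/entropy_avail.
 * Returns the estimate, or -1 if the file cannot be opened or parsed.
 */
static int read_entropy_avail(void)
{
    int bits = -1;
    FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &bits) != 1)
        bits = -1;              /* unexpected file contents */
    fclose(f);
    return bits;
}
```

Comparing the value before and after an entropy source is started gives a quick indication of whether that source is actually feeding the pool.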
As a workaround, I added a oneshot call to rngd in the onboot section:
- name: rng-oneshot
  image: linuxkit/rngd:429e1308b8cad9dbe04b7a91fcebec17ee6f7591
  command: ["/sbin/rngd", "-1"]
This adds enough entropy to the pool during startup such that containerd no longer gets stuck trying to generateLeaseID(). This is enough to get me to login.
About this issue
- State: open
- Created 6 years ago
- Comments: 25 (19 by maintainers)
rngd has a oneshot mode, rngd -1, which may help in onboot if you have a source of entropy, eg amd64 with RDRAND; this may allow the machine to boot sanely. You can try adding this to the start of onboot: