kubernetes: Kernel panic when having a privileged container with docker >= 1.10

Hi,

I’m using a privileged container in a kubernetes pod to build images. The container runs docker 1.10.3. I’m using kubernetes 1.2.4 on AWS (set up with kube-up).

From time to time, a node crashes. The kernel output from the last crash is included at the end of this report.

This looks like the same bug, possibly related to running docker >= 1.10 on the Debian jessie kernel (not yet confirmed), as reported here: https://github.com/docker/docker/issues/21081

If this is the case, this probably affects kubernetes 1.3, which is due to be released.

cc @justinsb

[   82.728265] aufs au_opts_verify:1570:docker[1654]: dirperm1 breaks the protection by the permission bits on the lower branch
[   82.760820] aufs au_opts_verify:1570:docker[1635]: dirperm1 breaks the protection by the permission bits on the lower branch
[   82.896108] aufs au_opts_verify:1570:docker[1654]: dirperm1 breaks the protection by the permission bits on the lower branch
[   82.928699] aufs au_opts_verify:1570:docker[1654]: dirperm1 breaks the protection by the permission bits on the lower branch
[   82.992993] aufs au_opts_verify:1570:docker[1673]: dirperm1 breaks the protection by the permission bits on the lower branch
[   83.385415] aufs au_opts_verify:1570:docker[1691]: dirperm1 breaks the protection by the permission bits on the lower branch
[   83.480134] aufs au_opts_verify:1570:docker[1691]: dirperm1 breaks the protection by the permission bits on the lower branch
[   83.592429] aufs au_opts_verify:1570:docker[1744]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.002341] aufs au_opts_verify:1570:docker[1689]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.083000] aufs au_opts_verify:1570:docker[1516]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.140267] aufs au_opts_verify:1570:docker[1516]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.219145] aufs au_opts_verify:1570:docker[1801]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.252038] aufs au_opts_verify:1570:docker[1801]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.293019] aufs au_opts_verify:1570:docker[1805]: dirperm1 breaks the protection by the permission bits on the lower branch
[   84.581778] aufs au_warn_loopback:122:loop1[1857]: you may want to try another patch for loopback file on ext4(0xef53) branch
[   84.603270] divide error: 0000 [#1] SMP 
[   84.604057] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c xt_statistic xt_nat xt_mark ipt_REJECT xt_tcpudp xt_comment loop veth binfmt_misc sch_htb ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp llc aufs(C) nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd evdev psmouse serio_raw parport_pc ttm parport drm_kms_helper drm i2c_piix4 i2c_core processor button thermal_sys autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq dm_mod ata_generic crct10dif_pclmul crct10dif_common xen_netfront xen_blkfront crc32c_intel ata_piix libata scsi_mod floppy
[   84.609355] CPU: 1 PID: 1853 Comm: docker Tainted: G         C    3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u4
[   84.609355] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/12/2016
[   84.609355] task: ffff8801e3657470 ti: ffff8801e47a8000 task.ti: ffff8801e47a8000
[   84.609355] RIP: 0010:[<ffffffffa0577200>]  [<ffffffffa0577200>] pool_io_hints+0xf0/0x1a0 [dm_thin_pool]
[   84.609355] RSP: 0018:ffff8801e47abbc8  EFLAGS: 00010246
[   84.609355] RAX: 0000000000010000 RBX: ffff8801e4736840 RCX: ffff8801c2662000
[   84.609355] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8801e48c4080
[   84.609355] RBP: ffff8801e47abc10 R08: 0000000000000000 R09: 0000000000000000
[   84.609355] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffa057f5c8
[   84.609355] R13: 0000000000000001 R14: ffff8801e47abc90 R15: 0000000000000131
[   84.609355] FS:  00007ff465daf700(0000) GS:ffff8801efc20000(0000) knlGS:0000000000000000
[   84.609355] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.609355] CR2: 000000c207f1c3fb CR3: 00000001e2a5a000 CR4: 00000000001406e0
[   84.609355] Stack:
[   84.609355]  ffffffff810a7c71 0000000043e06d70 ffffc9000115f040 0000000000000000
[   84.609355]  0000000043e06d70 ffffc9000115f040 0000000000000000 ffff8800e9da3800
[   84.609355]  ffffffffa00ba615 000fffffffffffff 00000000ffffffff 00000000000000ff
[   84.609355] Call Trace:
[   84.609355]  [<ffffffff810a7c71>] ? complete+0x31/0x40
[   84.609355]  [<ffffffffa00ba615>] ? dm_calculate_queue_limits+0x95/0x130 [dm_mod]
[   84.609355]  [<ffffffffa00b7ec3>] ? dm_swap_table+0x73/0x320 [dm_mod]
[   84.609355]  [<ffffffffa00b0101>] ? crc_t10dif_generic+0x101/0x1000 [crct10dif_common]
[   84.609355]  [<ffffffffa00bd0d0>] ? table_load+0x330/0x330 [dm_mod]
[   84.609355]  [<ffffffffa00bd165>] ? dev_suspend+0x95/0x220 [dm_mod]
[   84.609355]  [<ffffffffa00bda55>] ? ctl_ioctl+0x205/0x430 [dm_mod]
[   84.609355]  [<ffffffffa00bdc8f>] ? dm_ctl_ioctl+0xf/0x20 [dm_mod]
[   84.609355]  [<ffffffff811ba99f>] ? do_vfs_ioctl+0x2cf/0x4b0
[   84.609355]  [<ffffffff810d485e>] ? SyS_futex+0x6e/0x150
[   84.609355]  [<ffffffff811bac01>] ? SyS_ioctl+0x81/0xa0
[   84.609355]  [<ffffffff81513ecd>] ? system_call_fast_compare_end+0x10/0x15
[   84.609355] Code: 0f 84 a5 00 00 00 3b 96 10 06 00 00 49 c7 c4 c8 f5 57 a0 77 26 8b b6 18 06 00 00 89 d0 c1 e0 09 48 39 f0 0f 82 92 00 00 00 31 d2 <48> f7 f6 85 d2 74 2d 49 c7 c4 70 f5 57 a0 66 90 48 89 e6 e8 28 
[   84.609355] RIP  [<ffffffffa0577200>] pool_io_hints+0xf0/0x1a0 [dm_thin_pool]
[   84.609355]  RSP <ffff8801e47abbc8>
[   84.770467] ---[ end trace fcce781faebae9ce ]---
[   84.773018] Kernel panic - not syncing: Fatal exception
[   84.775963] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[    6.096097] xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...
[   17.402123] reboot: Failed to start orderly shutdown: forcing the issue
[   17.407629] xenbus: xenbus_dev_shutdown: device/vif/0: Initialising != Connected, skipping
[   17.412875] xenbus: xenbus_dev_shutdown: device/vbd/51744: Initialising != Connected, skipping
[   17.417585] xenbus: xenbus_dev_shutdown: device/vbd/51712: Initialising != Connected, skipping
[   17.421263] xenbus: xenbus_dev_shutdown: device/vfb/0: Initialised != Connected, skipping
[   17.424839] ACPI: Preparing to enter system sleep state S5
[   17.427112] reboot: Power down

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 28 (23 by maintainers)

Most upvoted comments

On Fri, Aug 05, 2016 at 04:15:11AM -0700, Nugroho Herucahyono wrote:

we downgraded kubernetes to 1.2.6 but kept using docker 1.12, and the problem disappeared.

so it’s a kubernetes 1.3 issue.

A kernel bug seems more like a kernel issue 😃

What kernel version? Can you upgrade your kernel and see if the issue persists with k8s 1.3?

@rata @dchen1107 @girishkalele FYI, the node problem detector v0.2 should be able to report the kernel panic to the control plane as an event. This will at least surface the problem to the user.

See https://github.com/kubernetes/node-problem-detector/pull/22