kubernetes: Test failures caused by kernel NULL pointer dereference on debian-based CVM
Initially observed this in PR tests: https://github.com/kubernetes/kubernetes/pull/44326#issuecomment-299251676 Then I checked ci-kubernetes-e2e-gce-etcd3 and found that the suite failed ~3 times a day, and many of those failures were caused by the kernel panic.
A few examples: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-etcd3/9104 https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-etcd3/9104/artifacts/bootstrap-e2e-minion-group-qcn2/serial-1.log
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-etcd3/9078 https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-etcd3/9078/artifacts/bootstrap-e2e-minion-group-wlm4/serial-1.log
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-etcd3/9068 https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-etcd3/9068/artifacts/bootstrap-e2e-minion-group-ml5v/serial-1.log
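For anyone who wants to check other runs, the serial log links above all follow the same layout, so the URL can be built from the job name, build number, and node name. A minimal sketch, assuming that layout holds for other builds; the helper name `serial_log_url` is hypothetical and not part of any Kubernetes tooling:

```python
# URL layout copied from the example links above. The helper name and the
# assumption that every build follows this layout are illustrative only.
BASE = "https://storage.googleapis.com/kubernetes-jenkins/logs"


def serial_log_url(job, build, node, serial=1):
    """Build the serial console log URL for one node of one CI run."""
    return f"{BASE}/{job}/{build}/artifacts/{node}/serial-{serial}.log"


if __name__ == "__main__":
    # Reproduces the first serial log link listed above.
    print(serial_log_url("ci-kubernetes-e2e-gce-etcd3", 9104,
                         "bootstrap-e2e-minion-group-qcn2"))
```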
[ 780.070508] aufs au_opts_verify:1570:docker[19344]: dirperm1 breaks the protection by the permission bits on the lower branch
May 4 02:50:24 bootstrap-e2e-minion-group-wlm4 kernel: [ 780.070508] aufs au_opts_verify:1570:docker[19344]: dirperm1 breaks the protection by the permission bits on the lower branch
[ 780.111216] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[ 780.119476] IP: [<ffffffff810a1100>] check_preempt_wakeup+0xd0/0x1d0
[ 780.126101] PGD 214722067 PUD 20b376067 PMD 0
[ 780.131089] Oops: 0000 [#1] SMP
[ 780.134721] Modules linked in: sg nf_conntrack_netlink nfnetlink xt_statistic sch_htb ebt_ip ebtable_filter ebtables veth ipt_REJECT xt_nat xt_recent xt_mark xt_comment xt_tcpudp ipt_MASQUERADE iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs(C) nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel psmouse processor parport_pc i2c_piix4 pvpanic parport thermal_sys pcspkr serio_raw evdev i2c_core aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd virtio_net button ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common virtio_scsi scsi_mod virtio_pci virtio virtio_ring
[ 780.210840] CPU: 0 PID: 31766 Comm: exe Tainted: G C 3.16.0-4-amd64 #1 Debian 3.16.39-1
[ 780.219923] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 780.229261] task: ffff880037bc2b60 ti: ffff880214f5c000 task.ti: ffff880214f5c000
[ 780.236875] RIP: 0010:[<ffffffff810a1100>] [<ffffffff810a1100>] check_preempt_wakeup+0xd0/0x1d0
[ 780.245919] RSP: 0018:ffff880214f5fe60 EFLAGS: 00010006
[ 780.251375] RAX: 0000000000000001 RBX: ffff88010053c040 RCX: 0000000000000008
[ 780.258632] RDX: 0000000000000001 RSI: ffff880212d20050 RDI: ffff88021fc12fb8
[ 780.265891] RBP: 0000000000000000 R08: ffffffff81610640 R09: 0000000000000001
[ 780.273149] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880037bc2b60
[ 780.280403] R13: ffff88021fc12f40 R14: 0000000000000000 R15: 0000000000000000
[ 780.287657] FS: 0000000002826880(0063) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[ 780.295866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 780.301732] CR2: 0000000000000078 CR3: 00000001a03f8000 CR4: 00000000001406f0
[ 780.308996] Stack:
[ 780.311128] 0000000000012f40 ffff88021fc12f40 0000000000012f40 ffff88021fc12f40
[ 780.319185] ffff880212d206d4 0000000000000246 ffff88020a8839c0 ffffffff81095bb5
[ 780.327242] ffff880212d20050 ffffffff8109869a 00007fffffffeffd 0000000000000000
[ 780.335351] Call Trace:
[ 780.337923] [<ffffffff81095bb5>] ? check_preempt_curr+0x85/0xa0
[ 780.344058] [<ffffffff8109869a>] ? wake_up_new_task+0xda/0x190
[ 780.350105] [<ffffffff81067a39>] ? do_fork+0x139/0x3d0
[ 780.355463] [<ffffffff8151b139>] ? stub_clone+0x69/0x90
[ 780.360910] [<ffffffff8151adcd>] ? system_call_fast_compare_end+0x10/0x15
[ 780.367910] Code: 39 c2 7d 27 0f 1f 80 00 00 00 00 83 e8 01 48 8b 5b 70 39 d0 75 f5 48 8b 7d 78 48 3b 7b 78 74 15 0f 1f 00 48 8b 6d 70 48 8b 5b 70 <48> 8b 7d 78 48 3b 7b 78 75 ee 48 85 ff 74 e9 e8 8c cb ff ff 48
[ 780.395815] RIP [<ffffffff810a1100>] check_preempt_wakeup+0xd0/0x1d0
[ 780.402514] RSP <ffff880214f5fe60>
[ 780.406122] CR2: 0000000000000078
[ 780.410418] ---[ end trace 8dfc3fa423bb7378 ]---
[ 780.415157] Kernel panic - not syncing: Fatal exception
[ 781.471619] Shutting down cpus with NMI
[ 781.476527] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 781.486828] Rebooting in 10 seconds..
[ 791.465975] ACPI MEMORY or I/O RESET_REG.
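Reading the trace: the faulting instruction marked in the `Code:` line (`<48> 8b 7d 78`) is a load from `RBP + 0x78`, and `RBP` is 0, which matches the fault address in `CR2` (0x78). To tell whether other failed runs hit the same crash, a serial log can be scanned for the same signature. The following is a rough sketch, not an existing tool; the regular expressions are derived only from the log lines above, and the function name `summarize_oops` is hypothetical:

```python
import re

# Patterns derived from the log lines above; names here are illustrative.
FAULT_RE = re.compile(
    r"BUG: unable to handle kernel NULL pointer dereference at ([0-9a-f]+)")
RIP_RE = re.compile(
    r"RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)")
PANIC_RE = re.compile(r"Kernel panic - not syncing: (.+)")


def summarize_oops(log_text):
    """Describe the first NULL-dereference oops in log_text, or return None."""
    fault = FAULT_RE.search(log_text)
    if fault is None:
        return None
    # Only look at lines after the BUG line so earlier output is ignored.
    rip = RIP_RE.search(log_text, fault.end())
    panic = PANIC_RE.search(log_text, fault.end())
    return {
        "fault_address": int(fault.group(1), 16),
        "function": rip.group(1) if rip else None,
        "offset": int(rip.group(2), 16) if rip else None,
        "panicked": panic is not None,
    }


# Three lines copied from the trace above, used as sample input.
SAMPLE = """\
[ 780.111216] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[ 780.395815] RIP [<ffffffff810a1100>] check_preempt_wakeup+0xd0/0x1d0
[ 780.415157] Kernel panic - not syncing: Fatal exception
"""

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

Runs whose summary reports `check_preempt_wakeup` as the faulting function are likely the same crash; any other function would need to be looked at separately.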
The nodes are Debian-based CVM instances running Docker 1.11.2. I am not sure whether this is a known issue.
About this issue
- State: closed
- Created 7 years ago
- Comments: 29 (24 by maintainers)
It will be available in the next m59 and m60 releases this week. Stay tuned; the release notes can be found at: https://cloud.google.com/container-optimized-os/docs/release-notes