linuxkit: potential kernel hang on hyper-v

Several Docker for Windows users (10-15) have reported that the MobyLinux VM does not start (https://github.com/docker/for-win/issues/54). All of them are on Windows build 14393 or newer (Anniversary Update), and the problem occurred on Beta24-26 as well as on 1.12.0 and 1.12.1 stable. The symptoms are the same in every report: the VM starts and is reported as running, but the host integration services (KVP store, heartbeat, etc.) are reported as not active. The heartbeat is handled in the kernel by the hv_util driver.

I worked with one user who had the issue on both 1.12.1 stable and Beta26 and managed to get a (partial) log from the serial console:

cpuidle: using governor menu
ACPI: bus type PCI registered
PCI: Fatal: No config space access function found
ACPI: Added _OSI(Module Device)
ACPI: Added _OSI(Processor Device)
ACPI: Added _OSI(3.0 _SCP Extensions)
ACPI: Added _OSI(Processor Aggregator Device)
ACPI: Executed 1 blocks of module-level executable AML code
ACPI: Dynamic OEM Table Load:
ACPI: OEM1 0x0000000000000000 00009E (v02 MSFTVM UARTS    00000001 MSFT 04000000)
ACPI: Interpreter enabled
ACPI: (supports S0 S5)
ACPI: Using IOAPIC for interrupt routing
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
ACPI: Enabled 1 GPEs in block 00 to 0F
vgaarb: loaded
SCSI subsystem initialized
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
PTP clock support registered
wmi: Mapper loaded
clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
hv_vmbus: Hyper-V Host Build:14393-10.0-0-0.187; Vmbus version:4.0
PCI: Using ACPI for IRQ routing
PCI: System does not support PCI
amd_nb: Cannot enumerate AMD northbridges
clocksource: Switched to clocksource hyperv_clocksource_tsc_page
FS-Cache: Loaded
CacheFiles: Loaded
pnp: PnP ACPI init
pnp: PnP ACPI: found 3 devices
clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
NET: Registered protocol family 2
TCP established hash table entries: 16384 (order: 5, 131072 bytes)
TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 16384 bind 16384)
UDP hash table entries: 1024 (order: 3, 32768 bytes)
UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
Unpacking initramfs...
Freeing initrd memory: 56540K (ffff880077b34000 - ffff88007b26b000)
clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2828959bc9f, max_idle_ns: 440795271775 ns
futex hash table entries: 65536 (order: 10, 4194304 bytes)
HugeTLB registered 2 MB page size, pre-allocated 0 pages
FS-Cache: Netfs 'nfs' registered for caching
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
nfs4filelayout_init: NFSv4 File Layout Driver Registering...
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
FS-Cache: Netfs 'cifs' registered for caching
ntfs: driver 2.1.32 [Flags: R/O].
fuse init (API version 7.23)
9p: Installing v9fs 9p2000 file system support
FS-Cache: Netfs '9p' registered for caching
aufs 4.4-20160905
Key type big_key registered
NET: Registered protocol family 38
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
io scheduler noop registered
io scheduler deadline registered (default)
io scheduler cfq registered
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
hv_vmbus: registering driver hyperv_fb
hyperv_fb: Screen resolution: 1152x864, Color depth: 32
Console: switching to colour frame buffer device 144x54
efifb: probing for efifb
efifb: framebuffer at 0xf8000000, mapped to 0xffffc90000400000, using 3072k, total 3072k
efifb: mode is 1024x768x32, linelength=4096, pages=1
efifb: scrolling: redraw
efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
fb1: EFI VGA frame buffer device
GHES: HEST is not enabled!
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
00:01: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
00:02: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
Initializing Nozomi driver 2.1d
Non-volatile memory driver v1.3
Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
[drm] Initialized drm 1.1.0 20060810
loop: module loaded
VMware PVSCSI driver - version 1.0.5.0-k
hv_vmbus: registering driver hv_storvsc
random: nonblocking pool is initialized

After this it just hangs.

A normal boot looks like this:

[...]
hv_vmbus: Hyper-V Host Build:14393-10.0-0-0.187; Vmbus version:4.0
[...]
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
00:01: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
00:02: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
Initializing Nozomi driver 2.1d
Non-volatile memory driver v1.3
Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
[drm] Initialized drm 1.1.0 20060810
loop: module loaded
VMware PVSCSI driver - version 1.0.5.0-k
hv_vmbus: registering driver hv_storvsc
scsi host0: storvsc_host_t
scsi 0:0:0:0: Direct-Access     Msft     Virtual Disk     1.0  PQ: 0 ANSI: 5
scsi 0:0:0:1: CD-ROM            Msft     Virtual DVD-ROM  1.0  PQ: 0 ANSI: 0
sd 0:0:0:0: [sda] 125829120 512-byte logical blocks: (64.4 GB/60.0 GiB)
sd 0:0:0:0: [sda] 4096-byte physical blocks
sr 0:0:0:1: [sr0] scsi3-mmc drive: 0x/0x caddy
cdrom: Uniform CD-ROM driver Revision: 3.20
sd 0:0:0:0: Attached scsi generic sg0 type 0
sr 0:0:0:1: Attached scsi generic sg1 type 5
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
e1000: Copyright (c) 1999-2006 Intel Corporation.
e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver - version 2.12.1-k
ixgbevf: Copyright (c) 2009 - 2012 Intel Corporation.
[...]
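Comparing the two excerpts above by hand works, but the same check can be scripted when more logs come in from other users. The following is a minimal sketch, not part of the original investigation: the function name `first_missing_message` and the two sample strings are illustrative, and the sample lines are copied from the logs above. It finds the last line of the hung log that also appears in a normal boot and reports the message a normal boot prints next. Lines that only appear in the hung log (such as the asynchronous `random:` message) are skipped.

```python
# Tail of the hung boot and the matching part of a normal boot,
# copied from the serial console excerpts in this report.
HUNG_TAIL = """\
loop: module loaded
VMware PVSCSI driver - version 1.0.5.0-k
hv_vmbus: registering driver hv_storvsc
random: nonblocking pool is initialized
"""

NORMAL_TAIL = """\
loop: module loaded
VMware PVSCSI driver - version 1.0.5.0-k
hv_vmbus: registering driver hv_storvsc
scsi host0: storvsc_host_t
scsi 0:0:0:0: Direct-Access     Msft     Virtual Disk     1.0  PQ: 0 ANSI: 5
"""


def _clean(log_text):
    """Split a log into stripped lines, dropping blanks and '[...]' elisions."""
    lines = (line.strip() for line in log_text.splitlines())
    return [line for line in lines if line and line != "[...]"]


def first_missing_message(hung_log, normal_log):
    """Return (last_shared_line, next_line_in_normal_boot).

    Walks the hung log backwards until it finds a line that also occurs
    in the normal log, then returns that line together with the line that
    follows it in the normal log. Either element is None if not found.
    """
    normal = _clean(normal_log)
    position = {line: index for index, line in enumerate(normal)}
    for line in reversed(_clean(hung_log)):
        if line in position:
            nxt = position[line] + 1
            return line, normal[nxt] if nxt < len(normal) else None
    return None, None


last_shared, next_expected = first_missing_message(HUNG_TAIL, NORMAL_TAIL)
print("last shared message:", last_shared)
print("first missing message:", next_expected)
```

For the excerpts above, the last shared message is the registration of hv_storvsc and the first missing one is `scsi host0: storvsc_host_t`, which is consistent with the hang occurring before the storvsc host shows up. It does not by itself say which driver is at fault.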

A couple of options:

  • ask a user to start the VM with a single CPU; this may help rule out a concurrency/locking issue
  • ask a user to run a kernel with lockdep enabled
  • look for upstream Hyper-V patches that may be related to this (though we already have the bug fixes that were added in 4.4.22).
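Before handing a debug kernel to a user for the lockdep option, it may be worth confirming that the build actually has lock debugging switched on. The sketch below is only an illustration: the helper names and the sample config text are hypothetical, and the choice of options to check (`CONFIG_PROVE_LOCKING`, `CONFIG_DEBUG_LOCK_ALLOC`, `CONFIG_LOCKDEP`) is an assumption about what "lockdep enabled" should mean here. It parses kernel `.config` text and lists the options that are not set to `y`.

```python
# Options assumed to matter for a lockdep-enabled build (an assumption,
# not something stated in this report).
LOCKDEP_OPTIONS = (
    "CONFIG_PROVE_LOCKING",
    "CONFIG_DEBUG_LOCK_ALLOC",
    "CONFIG_LOCKDEP",
)


def enabled_options(config_text):
    """Return the set of option names set to 'y' in kernel .config text."""
    enabled = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks and comments such as '# CONFIG_FOO is not set'.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and value == "y":
            enabled.add(name)
    return enabled


def missing_lockdep_options(config_text):
    """Return the lockdep-related options that are not built in."""
    enabled = enabled_options(config_text)
    return [opt for opt in LOCKDEP_OPTIONS if opt not in enabled]


# Hypothetical config fragment, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_SMP=y
# CONFIG_PROVE_LOCKING is not set
CONFIG_DEBUG_LOCK_ALLOC=y
"""

missing = missing_lockdep_options(SAMPLE_CONFIG)
print("missing lockdep options:", missing)
```

An empty result would suggest the kernel was built with the checked options; any names listed would need to be enabled before the test kernel is useful for spotting locking problems.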

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 20 (17 by maintainers)

Most upvoted comments

I gave the ISO to a user with some simple instructions; it worked for him and he’s happy 😃 I will update the other affected users on docker/for-win#54 after Beta29 ships.

Still need to ping MSFT about the issue in 4.4/