moby: Problem with linux kernel 4.8


BUG REPORT INFORMATION

Description: When I launch bash in a Docker container from the debian:wheezy image on a host running Linux kernel 4.8, it fails with exit status 139. Everything works with Linux kernel 4.7.

docker run -it debian:wheezy bash
vagrant@debian-testing:~$ echo $?
139
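For anyone wondering what 139 means: shells conventionally report "128 + signal number" when a process is killed by a signal, and signal 11 on Linux is SIGSEGV, so this is a segfault rather than an ordinary error exit. A minimal sketch of decoding such a status follows; the function name describe_exit_status is just an illustrative helper, not part of Docker or any tool mentioned here.

```shell
# Decode a shell exit status. By convention, values above 128 mean
# "terminated by signal (status - 128)".
describe_exit_status() (
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # kill -l N prints the name of signal number N
        signame=$(kill -l "$signum" 2>/dev/null || echo unknown)
        echo "killed by signal $signum ($signame)"
    else
        echo "exited with status $status"
    fi
)

describe_exit_status 139
```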

Steps to reproduce the issue:

  1. vagrant up # installs Debian testing with Linux 4.8; I've uploaded Vagrantfile.txt and bootstrap.sh.txt to set up the Vagrant box
  2. vagrant ssh # enter the Vagrant box while it is still running kernel 4.7
     a. docker run -it debian:wheezy bash # all is OK on Linux kernel 4.7
     b. sudo reboot
  3. vagrant ssh # enter the Vagrant box, now running kernel 4.8
     a. docker run -it debian:wheezy bash
     b. echo $? # prints 139

Additional information you deem important (e.g. issue happens only occasionally): see the attached bootstrap.sh.txt and Vagrantfile.txt.

Output of docker version:

Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 21:45:16 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 21:45:16 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 2
 Running: 0
 Paused: 0
 Stopped: 2
Images: 1
Server Version: 1.12.3
Storage Driver: devicemapper
 Pool Name: docker-8:1-262977-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 352.4 MB
 Data Space Total: 107.4 GB
 Data Space Available: 7.905 GB
 Metadata Space Used: 860.2 kB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.147 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.133 (2016-08-15)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay host null bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.8.0-1-amd64
Operating System: Debian GNU/Linux stretch/sid
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 492.3 MiB
Name: debian-testing
ID: BVRA:DBPA:GCAW:Z6LO:BIEE:I64Q:DFSB:53O2:VQFA:OVCH:CZOB:T2PX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.): vagrant with fujimakishouten/debian-stretch64 image

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

Something similar was reported to Debian in Debian #845085 which also points to a forum post and https://github.com/tianon/docker-brew-debian/issues/55 (/cc @tianon).

Comparing my local /boot/config-4.7.0-1-amd64 and /boot/config-4.8.0-1-amd64 (I’m still running 4.7, haven’t had a chance to reboot yet) the most interesting thing I see is:

 # CONFIG_LEGACY_VSYSCALL_NATIVE is not set
-CONFIG_LEGACY_VSYSCALL_EMULATE=y
-# CONFIG_LEGACY_VSYSCALL_NONE is not set
+# CONFIG_LEGACY_VSYSCALL_EMULATE is not set
+CONFIG_LEGACY_VSYSCALL_NONE=y

Those are described in linux/arch/x86/Kconfig

In particular:

This setting can be changed at boot time via the kernel command line parameter vsyscall=[native|emulate|none].

So it would be worth trying to boot with each of vsyscall=emulate and vsyscall=native (in two independent tests).
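Before and after each test boot it helps to confirm which mode is actually in effect. The sketch below is only a rough check under two assumptions of mine: that the running kernel's command line is readable at /proc/cmdline, and that the legacy page shows up as a [vsyscall] entry in /proc/self/maps when it is mapped (I would not expect that entry with vsyscall=none). The helper names vsyscall_param and has_vsyscall_page are made up for illustration.

```shell
# Print the vsyscall= value found in a kernel command line string, or
# "unset" when the parameter is absent (the kernel then falls back to
# its compiled-in default). If the parameter is repeated, the last
# occurrence is reported.
vsyscall_param() (
    set -f    # no glob expansion while word-splitting the command line
    value=unset
    for word in $1; do
        case $word in
            vsyscall=*) value=${word#vsyscall=} ;;
        esac
    done
    echo "$value"
)

# Succeed if a maps file (default: the calling process's own) lists
# the legacy [vsyscall] page.
has_vsyscall_page() {
    grep -q '\[vsyscall\]' "${1:-/proc/self/maps}" 2>/dev/null
}

if [ -r /proc/cmdline ]; then
    echo "vsyscall boot parameter: $(vsyscall_param "$(cat /proc/cmdline)")"
fi
if has_vsyscall_page; then
    echo "[vsyscall] page is mapped"
else
    echo "[vsyscall] page is not mapped"
fi
```

An "unset" result only says the parameter was not passed at boot; the effective mode then depends on the CONFIG_LEGACY_VSYSCALL_* choice the kernel was built with.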

CONFIG_LEGACY_VSYSCALL_NATIVE should be considered a dangerous setting: it provides an ASLR-bypassing target with usable ROP gadgets.

CONFIG_LEGACY_VSYSCALL_NONE is the safest, but it sounds like you have to deal with pre-2.13 glibcs. In that case, the remaining option is fine:

CONFIG_LEGACY_VSYSCALL_EMULATE contains some risk for ASLR-bypassing, even just for having a known-good place to read a known-value from memory.

I would strongly recommend that CONFIG_LEGACY_VSYSCALL_NONE be used, and that systems requiring emulation be booted with “vsyscall=emulate”.

@ijc25 I need a way to react to a GitHub comment with more than one heart – thanks so much for chasing this down and dropping info about it in all the places I’ve seen it reported before I was even awake! 😄 ❤️ ❤️

IMO, trying to convince Ben to delay this change until stretch+1 is just delaying the inevitable – I think our efforts would probably be better spent documenting this change and how to override the behavior. 😅

Here’s how I fixed Alpine. I hope this helps anyone else struggling with this issue.

Edit /boot/grub/grub.cfg. Add vsyscall=emulate at the end of the linux line in the first menuentry. Then reboot.

Example:

set timeout=2
insmod all_video
menuentry "Alpine Linux" {
        linux /boot/vmlinuz-hardened ...vsyscall=emulate
        ...                                  👆👆👆👆
}

Note that on GRUB2 the grub.cfg file is meant to be generated automatically by the update-grub scripts so your changes will be overwritten if/when these run.

Instead, you should edit /etc/default/grub and add the option vsyscall=emulate to the end of GRUB_CMDLINE_LINUX_DEFAULT. It should look something like this:

GRUB_CMDLINE_LINUX_DEFAULT="quiet vsyscall=emulate"

Then run sudo update-grub and reboot your computer.
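If you need to apply this on several machines, the edit to /etc/default/grub can be scripted. The following is a rough sketch, not a tested provisioning script: it assumes the GRUB_CMDLINE_LINUX_DEFAULT value is wrapped in double quotes on a single line, and the function name add_vsyscall_emulate is my own. It leaves the file alone if a vsyscall= parameter is already present and keeps a .bak copy of the original.

```shell
# Append vsyscall=emulate to GRUB_CMDLINE_LINUX_DEFAULT in the given
# file, unless that line already carries a vsyscall= parameter.
# Assumes the value is double-quoted on one line.
add_vsyscall_emulate() {
    if grep -Eq '^GRUB_CMDLINE_LINUX_DEFAULT=.*vsyscall=' "$1"; then
        return 0    # already configured, nothing to do
    fi
    # Insert the parameter just before the closing quote; the original
    # file is saved next to it with a .bak suffix.
    sed -i.bak -E \
        's/^(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*)"/\1 vsyscall=emulate"/' "$1"
}

# Usage, from a root shell:
#   add_vsyscall_emulate /etc/default/grub && update-grub
```

Running it a second time is a no-op, so it should be safe to include in a provisioning script such as the bootstrap.sh used for the Vagrant box.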

Should we add a warning to the check-config script?
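To make the suggestion concrete, here is roughly what such a warning could look like. This is a standalone sketch and does not follow the structure or helper functions of the real check-config script; the function names, the warning text, and the /boot/config-$(uname -r) location are all assumptions on my part.

```shell
# Print which legacy vsyscall default a kernel config file selects:
# NATIVE, EMULATE, NONE, or UNKNOWN if none of the three is set.
vsyscall_config_default() (
    for mode in NATIVE EMULATE NONE; do
        if grep -q "^CONFIG_LEGACY_VSYSCALL_${mode}=y" "$1"; then
            echo "$mode"
            exit 0
        fi
    done
    echo UNKNOWN
)

# Emit a warning when the config disables the legacy vsyscall page.
warn_if_vsyscall_disabled() {
    if [ "$(vsyscall_config_default "$1")" = "NONE" ]; then
        echo "warning: CONFIG_LEGACY_VSYSCALL_NONE=y: containers with an old glibc may segfault unless the host is booted with vsyscall=emulate"
    fi
}

# Assumed location of the running kernel's config (true on Debian).
config="/boot/config-$(uname -r)"
if [ -r "$config" ]; then
    warn_if_vsyscall_disabled "$config"
fi
```

A real check would probably also want to look at the kernel command line, since vsyscall=emulate at boot overrides the compiled-in default.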

AIUI (mainly based on the Kconfig help) it’s a security “related” thing because the old setting involves some non-ASLR code in every process address space (vsyscall used to be at a fixed address), so disabling it improves things by getting rid of that.

Older (e)glibc (<=2.13 according to the Debian kernel changelog) is not compatible since it doesn’t know about the new dynamic vsyscall address mechanisms and only knows the static one. Looks like CentOS 6 and Debian Wheezy both have old enough libc to be affected.
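If you already know which glibc version an image ships, a quick comparison against that cutoff tells you whether it is likely to be affected. This is only a sketch: the 2.13 threshold is taken from the changelog remark above rather than verified independently, the helper name is made up, and it relies on sort -V (version sort) being available.

```shell
# Succeed when the given glibc version is at or below the threshold
# (default 2.13, per the Debian kernel changelog remark above).
glibc_may_need_vsyscall() (
    version=$1
    threshold=${2:-2.13}
    # sort -V orders version strings numerically per component; if the
    # given version sorts first (or ties), it is <= the threshold.
    lowest=$(printf '%s\n%s\n' "$version" "$threshold" | sort -V | head -n 1)
    [ "$lowest" = "$version" ]
)

for v in 2.12 2.13 2.19; do
    if glibc_may_need_vsyscall "$v"; then
        echo "glibc $v: may need vsyscall=emulate on the host"
    else
        echo "glibc $v: should not depend on the legacy vsyscall page"
    fi
done
```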

Since Wheezy is now oldstable I suppose that was deemed a reasonable cut off point, especially since there is a command line escape hatch. I wasn’t involved/paying attention when this change was made though, so I don’t know what the probability of deferring the change for another Debian release would be.