moby: devicemapper: After the docker service has run for a long time, it cannot be restarted.
Description
We use devicemapper as the graph driver and run a stability test as follows:
- create/delete containers continuously.
- restart docker service randomly.
After one night, docker could not start up.
The errors are:
messages:2017-06-07T10:41:43.932549+00:00 V2R1C00B052-GUESTOS-FS-KVM-X64 docker: time="2017-06-07T10:41:43.747631527Z" level=debug msg="devmapper: Error device setupBaseImage: devmapper: Base Device UUID and Filesystem verification failed: devicemapper: Can't set cookie dm_task_set_cookie failed"
messages:2017-06-07T10:41:43.932765+00:00 V2R1C00B052-GUESTOS-FS-KVM-X64 docker: time="2017-06-07T10:41:43.747827726Z" level=fatal msg="Error starting daemon: error initializing graphdriver: devmapper: Base Device UUID and Filesystem verification failed: devicemapper: Can't set cookie dm_task_set_cookie failed"
In addition, I tried to use dmsetup remove to remove a device:
V2R1C00B052-GUESTOS-FS-KVM-X64:~ # dmsetup remove docker-8:2-402190-9623ed2972fa4f700eec99e1404959a5b7e64eac65d3ff541b22f6271c2ee38a
Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
Command failed
V2R1C00B052-GUESTOS-FS-KVM-X64:~ #
So I used ipcs to check the IPC resources:
V2R1C00B052-GUESTOS-FS-KVM-X64:~ # ipcs
------ Message Queues --------
key msqid owner perms used-bytes messages
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
------ Semaphore Arrays --------
key semid owner perms nsems
0x0d4d3358 238977024 root 600 1
0x0d4d0ec9 270172161 root 600 1
0x0d4dc02e 281640962 root 600 1
0x0d4db8d2 291045379 root 600 1
0x0d4d4e76 291864580 root 600 1
0x0d4d825a 292388869 root 600 1
0x0d4d93ee 294256646 root 600 1
0x0d4da4a1 294879239 root 600 1
0x0d4d4125 295305224 root 600 1
......
--> 128 leaked cookies, not all listed here.
......
dmsetup udevcookies shows the same semaphores as ipcs. I then checked the limit with cat /proc/sys/kernel/sem:
V2R1C00B052-GUESTOS-FS-KVM-X64:~ # cat /proc/sys/kernel/sem
250 32000 32 128
V2R1C00B052-GUESTOS-FS-KVM-X64:~ #
The fourth value, the maximum number of semaphore arrays, is 128, so I wrote a larger limit, which works:
echo 250 32000 32 1024 > /proc/sys/kernel/sem
After that, Docker could start up.
So I suspect there is a semaphore leak in DM (device mapper), but I am not sure how it happens.
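A minimal sketch for checking how close a host is to the limit. It assumes the leaked cookies are the semaphore arrays whose key starts with 0x0d4d, as in the ipcs output above. The helper names count_dm_cookies and semmni_from are hypothetical and are not part of docker or dmsetup.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of docker or dmsetup.

# Count semaphore arrays whose key starts with 0x0d4d (the prefix seen in the
# ipcs output above). Reads `ipcs -s` output on stdin and prints the count.
count_dm_cookies() {
    # grep -c exits non-zero when nothing matches, but still prints 0.
    grep -ci '^0x0d4d' || true
}

# Print the fourth field of a /proc/sys/kernel/sem line, which is the
# maximum number of semaphore arrays allowed system-wide.
semmni_from() {
    echo "$1" | awk '{ print $4 }'
}

# Usage on a live host:
#   used=$(ipcs -s | count_dm_cookies)
#   limit=$(semmni_from "$(cat /proc/sys/kernel/sem)")
#   echo "device-mapper cookies: $used of $limit semaphore arrays"
```

If the reported count keeps growing across daemon restarts and approaches the limit, the host is heading for the startup failure described above.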
By the way, I could use dmsetup udevcomplete_all to clean up all the leaked cookies and recover the environment.
But I think we should work out a solution for this situation.
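One possible stopgap until the leak itself is fixed is to run the cleanup before the daemon starts, but only when the semaphore table is nearly full. The sketch below is an assumption about how that could be wired up, not something docker does. The near_limit helper and the 80 percent threshold are hypothetical choices.

```shell
#!/bin/sh
# Sketch only: near_limit and the threshold are hypothetical choices.

# Succeed when used is at least percent% of limit (integer arithmetic).
near_limit() {
    used=$1
    limit=$2
    percent=$3
    [ $((used * 100)) -ge $((limit * percent)) ]
}

# Usage before starting docker (needs root):
#   used=$(ipcs -s | grep -c '^0x0d4d')
#   limit=$(awk '{ print $4 }' /proc/sys/kernel/sem)
#   if near_limit "$used" "$limit" 80; then
#       echo 'y' | dmsetup udevcomplete_all
#   fi
```

udevcomplete_all removes every cookie, so this is only reasonable while docker is stopped and nothing else on the host is waiting on device-mapper udev events.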
Steps to reproduce the issue:
- create/delete containers continuously.
- kill docker randomly.
After some time, use dmsetup udevcookies to check whether any semaphores have leaked.
In other environments we found leaks too, but only a few (fewer than 10).
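The reproduction steps could be scripted roughly as follows. This is a sketch under assumptions, not the test we actually ran: the busybox image, the systemctl restart command, the batch size of 10 and the one-in-five restart chance are all placeholders, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: DOCKER, IMAGE and RESTART_DAEMON are assumptions, adjust them.
DOCKER=${DOCKER:-docker}
IMAGE=${IMAGE:-busybox}
RESTART_DAEMON=${RESTART_DAEMON:-"systemctl restart docker"}

# Create and delete `count` short-lived containers.
churn_containers() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        $DOCKER run --rm "$IMAGE" true || return 1
        i=$((i + 1))
    done
}

# Succeed roughly once in every $1 calls, using two random bytes.
one_in() {
    rand=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    [ $((rand % $1)) -eq 0 ]
}

# Run `rounds` batches, restarting the daemon at random between batches.
stress() {
    rounds=$1
    round=0
    while [ "$round" -lt "$rounds" ]; do
        # A batch may fail while the daemon is coming back; keep going.
        churn_containers 10 || true
        if one_in 5; then
            $RESTART_DAEMON
        fi
        round=$((round + 1))
    done
}

# Usage (needs root): stress 1000, then inspect `dmsetup udevcookies`.
```

Killing the daemon instead of restarting it, as in the steps above, only requires pointing RESTART_DAEMON at a different command.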
Describe the results you received:
A semaphore leak was found; once it reached the limit, docker could not start up.
Describe the results you expected:
No semaphore leak, or the leaked semaphores are cleaned up at docker startup.
docker works fine.
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
root@localhost:~/workspace/huawei/docker# docker version
Client:
Version: 1.11.2
API version: 1.23
Go version: go1.7.1
Git commit: ff25c8a
Built: Thu Jun 8 15:37:09 2017
OS/Arch: linux/amd64
Server:
Version: 1.11.2
API version: 1.23
Go version: go1.7.1
Git commit: 3515a27-unsupported
Built: Thu Jun 8 16:28:37 2017
OS/Arch: linux/amd64
Output of docker info:
root@localhost:~/workspace/huawei/docker# docker info
Containers: 210
Running: 0
Paused: 0
Stopped: 210
Images: 11
Server Version: 1.11.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Hugetlb Pagesize: 2MB
Plugins:
Volume: local
Network: bridge null host
Kernel Version: 4.6.0
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67 GiB
Name: localhost
ID: LB2C:RVJO:DK5F:GVNI:QFYC:DTII:C3UB:6QHS:754W:LE3G:BKPP:EUIY
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): true
File Descriptors: 12
Goroutines: 25
System Time: 2017-06-09T01:43:24.994823746+08:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
About this issue
- State: closed
- Created 7 years ago
- Reactions: 12
- Comments: 40 (20 by maintainers)
We see it in 17.06.0-ce as well.
It looks like the backported fix for the semaphore leak issue just missed the cutoff for 17.06.1 and was moved to 17.06.2, based on this pull request.
@thaJeztah @vieux @cpuguy83 would it be possible to get a release of 17.06.2 or some other kind of patch with the pull request so this issue can finally be put to bed? This is a real pain to deal with on a daily basis and it looks like it’s starting to affect more and more people, based on the duplicate issues that keep getting created.
Having the same problem on CentOS 7.3.1611. Really annoying. Can’t start new containers anymore. I always have to run this command with sudo
echo 'y' | sudo dmsetup udevcomplete_all
to get them up. So a solution would help us greatly as well.
Anyone who can’t upgrade docker right now can just increase the semaphore limit to postpone the problem for a while 😀
e.g. like this
printf '250\t32000\t32\t200' >/proc/sys/kernel/sem
Same here, RHEL 7.3, docker v17.06.0-ce
Almost identical setup as above: docker v17.06.0-ce on RHEL 7.2. I’d like to add that this problem has occurred recently when using docker stack and repeatedly starting and stopping stacks. In fact, the problem occurs on all my manager and worker nodes.
Hi,
I encountered the same problem on docker-ce 17.06/CentOS 7.3 after setting up devicemapper on a separate disk. It starts appearing after some containers have been run. I managed to restart some containers after running
echo 'y' | sudo dmsetup udevcomplete_all
But now it fails each time I want to start a new container (sad for a GitLab runner!). Is there a long-term workaround, or should I switch to another FS? The Docker docs explain that docker-ce on CentOS should use DM: https://docs.docker.com/engine/userguide/storagedriver/selectadriver/#docker-ce
Thanks for your help