rancher: Rancher-secrets created 2500 Docker volumes, which causes lengthy delays when starting Docker
I rebooted a Rancher node. The system came up fine.
Docker tried to start, but doesn’t seem to start completely. It seems to be waiting for the rancher-secrets plugin.
I’m including logs for Docker immediately after a reboot. The logs show that systemd begins starting Docker at 17:55:32. This is followed by numerous messages mentioning “rancher-secrets”. Docker is finally finished starting at 19:16, 81 minutes later. I rebooted twice, and this 81 minute delay happened both times. I provided proof of this 81 minute delay below.
[root@docker6 ~]# journalctl -b -u docker -u systemd |less
-- Logs begin at Mon 2017-06-05 17:54:23 PDT, end at Tue 2017-06-06 11:35:32 PDT. --
Jun 05 17:55:32 docker6.example.org systemd[1]: Starting Docker Application Container Engine...
Jun 05 17:55:32 docker6.example.org docker[1268]: Command "daemon" is deprecated, and will be removed in Docker 1.16. Please run `dockerd` directly.
Jun 05 17:55:34 docker6.example.org docker[1268]: time="2017-06-05T17:55:34.770197274-07:00" level=info msg="libcontainerd: new containerd process, pid: 2589"
Jun 05 17:55:37 docker6.example.org docker[1268]: time="2017-06-05T17:55:37.092945996-07:00" level=warning msg="devmapper: Base device already exists and has filesystem xfs on it. User specified filesystem will be ignored."
Jun 05 17:55:40 docker6.example.org docker[1268]: time="2017-06-05T17:55:40.737732433-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 1s"
Jun 05 17:55:41 docker6.example.org docker[1268]: time="2017-06-05T17:55:41.738065606-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 2s"
Jun 05 17:55:43 docker6.example.org docker[1268]: time="2017-06-05T17:55:43.738373190-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 4s"
Jun 05 17:55:47 docker6.example.org docker[1268]: time="2017-06-05T17:55:47.738705385-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 8s"
Jun 05 17:55:55 docker6.example.org docker[1268]: time="2017-06-05T17:55:55.739135017-07:00" level=warning msg="Error restoring volume" driver=rancher-secrets error="error while checking if volume \"AAAABBBBCCCCDDDD\" exists in driver \"rancher-secrets\": Error looking up volume plugin rancher-secrets: legacy plugin: plugin not found" volume=AAAABBBBCCCCDDDD
Jun 05 17:55:55 docker6.example.org docker[1268]: time="2017-06-05T17:55:55.739325867-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 1s"
Jun 05 17:55:56 docker6.example.org docker[1268]: time="2017-06-05T17:55:56.739636974-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 2s"
Jun 05 17:55:58 docker6.example.org docker[1268]: time="2017-06-05T17:55:58.739982055-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 4s"
Jun 05 17:56:02 docker6.example.org docker[1268]: time="2017-06-05T17:56:02.740296222-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 8s"
Jun 05 17:56:10 docker6.example.org docker[1268]: time="2017-06-05T17:56:10.740660119-07:00" level=warning msg="Error restoring volume" driver=rancher-secrets error="error while checking if volume \"EEEEFFFFGGGGGHHHHH\" exists in driver \"rancher-secrets\": Error looking up volume plugin rancher-secrets: legacy plugin: plugin not found" volume=EEEEFFFFGGGGGHHHHH
And then, 81 minutes later and numerous errors about rancher-secrets, Docker starts, and eventually Rancher-Agent starts and everything starts behaving normally again.
Jun 05 19:15:55 docker6.example.org docker[1268]: time="2017-06-05T19:15:55.742382318-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 1s"
Jun 05 19:15:56 docker6.example.org docker[1268]: time="2017-06-05T19:15:56.742713812-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 2s"
Jun 05 19:15:58 docker6.example.org docker[1268]: time="2017-06-05T19:15:58.743052281-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 4s"
Jun 05 19:16:02 docker6.example.org docker[1268]: time="2017-06-05T19:16:02.743395731-07:00" level=warning msg="Unable to locate plugin: rancher-secrets, retrying in 8s"
Jun 05 19:16:10 docker6.example.org docker[1268]: time="2017-06-05T19:16:10.743667834-07:00" level=warning msg="Error restoring volume" driver=rancher-secrets error="error while checking if volume \"IIIIJJJJKKKKLLLL\" exists in driver \"rancher-secrets\": Error looking up volume plugin rancher-secrets: legacy plugin: plugin not found" volume=IIIIJJJJKKKKLLLL
Jun 05 19:16:10 docker6.example.org docker[1268]: time="2017-06-05T19:16:10.769703597-07:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Jun 05 19:16:10 docker6.example.org docker[1268]: time="2017-06-05T19:16:10.782673367-07:00" level=info msg="Loading containers: start."
# NOTE: There are 322 messages like this:
Jun 05 19:16:10 docker6.example.org docker[1268]: time="2017-06-05T19:16:10.850554743-07:00" level=error msg="Failed to load container MMMMNNNNOOOOPPPP: open /var/lib/docker/containers/MMMMNNNNOOOOPPPP/config.v2.json: no such file or directory"
Jun 05 19:16:10 docker6.example.org docker[1268]: time="2017-06-05T19:16:10.852911996-07:00" level=error msg="Failed to load container QQQQRRRRSSSSTTTTUUUU: open /var/lib/docker/containers/QQQQRRRRSSSSTTTTUUUU/config.v2.json: no such file or directory"
... SNIP ....
# And finally Docker starts
Jun 05 19:16:10 docker6.example.org docker[1268]: time="2017-06-05T19:16:10.905994030-07:00" level=info msg="Firewalld running: false"
Jun 05 19:16:11 docker6.example.org docker[1268]: time="2017-06-05T19:16:11.157291757-07:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jun 05 19:16:11 docker6.example.org docker[1268]: time="2017-06-05T19:16:11.783323629-07:00" level=info msg="Loading containers: done."
Jun 05 19:16:11 docker6.example.org docker[1268]: time="2017-06-05T19:16:11.849633466-07:00" level=info msg="Daemon has completed initialization"
Jun 05 19:16:11 docker6.example.org docker[1268]: time="2017-06-05T19:16:11.849694399-07:00" level=info msg="Docker daemon" commit=c6d412e graphdriver=devicemapper version=17.03.1-ce
Jun 05 19:16:11 docker6.example.org systemd[1]: Started Docker Application Container Engine.
Jun 05 19:16:11 docker6.example.org docker[1268]: time="2017-06-05T19:16:11.861151166-07:00" level=info msg="API listen on /var/run/docker.sock"
Jun 05 19:16:13 docker6.example.org docker[1268]: time="2017-06-05T19:16:13.143427172-07:00" level=error msg="Handler for GET /v1.22/containers/rancher-agent-upgrade/json returned error: No such container: rancher-agent-upgrade"
Jun 05 19:16:13 docker6.example.org docker[1268]: time="2017-06-05T19:16:13.144064048-07:00" level=error msg="Handler for GET /v1.22/images/rancher-agent-upgrade/json returned error: No such image: rancher-agent-upgrade"
After this, Rancher-agent starts and everything works well.
According to the system logs, 81 minutes elapsed between the “Linux version” message and when Docker starts loading containers.
[root@docker6 ~]# egrep 'Linux version|msg="Loading containers: start."' /var/log/messages
# 15:54 to 17:15 is 81 minutes
Jun 5 15:54:38 docker6 kernel: Linux version 3.10.0-514.10.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 3 00:04:05 UTC 2017
Jun 5 17:15:07 docker6 docker: time="2017-06-05T17:15:07.465969414-07:00" level=info msg="Loading containers: start."
# 17:54 to 19:16 is 81 minutes
Jun 5 17:54:23 docker6 kernel: Linux version 3.10.0-514.10.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 3 00:04:05 UTC 2017
Jun 5 19:16:10 docker6 docker: time="2017-06-05T19:16:10.782673367-07:00" level=info msg="Loading containers: start."
| Useful | Info |
|---|---|
| Versions | Rancher v1.5.9 Cattle: v0.178.12 UI: v1.5.9 |
| Access | openldap admin |
| Orchestration | Cattle |
| Route | hosts.index |
OS: CentOS 7.3 Docker version: Docker 17.03.1-ce (docker-engine-17.03.1.ce-1.el7.centos.x86_64) Docker storage driver: devicemapper with direct-lvm
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 7
- Comments: 27 (4 by maintainers)
priority this please!! this not only for secrets volume, this problem for all docker plugins
worked with @stefanlasiewski and @cjellick on this. we noticed the docker volumes metadata.db was ~2mb. Which is large for a system with a handful of volumes. We backed up and removed the metadata file then restarted docker. Docker restarted fairly quickly and rebuilt its volumes metadata db. Ensure you have proper data backups/persistence in place in the case that docker doesnt recover. This could potentially void warranties and or cause excommunication from your place of worship. You’ve been warned. The long term goal here for rancher-secrets is to move away from docker volume drivers. Flapping containers can cause orphaned volumes, and with the right constellation alignment docker wont clean up after itself.
@Here is some more information. It looks like metadata.db may have stale data.
metadata.db was larger than expected:
There are no rancher-secrets volumes on this node:
However despite the fact that there are no actually rancher-secrets volumes, searching metadata.db shows many occurrences of Rancher secrets.
To reset the Metadata file, we did
mv /var/lib/docker/volumes/metadata.db ~/tmp/metadata.db.bak && systemctl restart docker. Docker re-populates metadata.db and the file is now a reasonable size, at 32K:And Docker now restarts in < 1 minute: ``` [root@docker05 ]# time systemctl restart docker
real 0m45.688s user 0m0.017s sys 0m0.019s [root@docker05 ]#
@stefanlasiewski It’s well-known issue and happens with all rancher’s storage plugins. For example, related issue #8433.