kubernetes: Inconsistent docker root causing fluentd failures

There seem to be two separate issues here:

  • An inconsistent docker root, due to a potential race between the kubelet and docker.
  • Fluentd being configured (on AWS) to mount /var/lib/docker instead of /mnt/ephemeral/docker.

For the first issue, perhaps the kubelet should fail if it is not able to retrieve the docker root from docker info?

An alternative would be to mount our AWS volume at /var/lib/docker instead of /mnt/ephemeral/docker, which would solve both issues. But the kubelet presumably supports dynamically locating the docker root for a reason.
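
For reference, a quick way to see the mismatch from a node is to compare the root that docker reports with the targets of the kubelet-created symlinks (a diagnostic sketch, assuming shell access to the node):

 # Root dir reported by docker itself (shown under the storage driver on this docker version):
 sudo docker info | grep -i 'root dir'
 # Root dir the kubelet assumed when it created the symlinks in /var/log/containers:
 readlink /var/log/containers/*.log | head -n 3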

Details:

I’m having intermittent issues with fluentd pushing logs to Elasticsearch. It seems that fluentd is unable to see the log files, even though it is able to list the containers directory. This is happening on an AWS cluster.

From inside the fluentd pod I see the following:

 root@fluentd-elasticsearch-ip-172-20-0-194:/# cat /varlog/containers/vapor-web-v1-x894w_vapor-alpha_nginx-521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491.log
 cat: /varlog/containers/vapor-web-v1-x894w_vapor-alpha_nginx-521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491.log: No such file or directory

On AWS the log file symlinks look like this:

 root@fluentd-elasticsearch-ip-172-20-0-194:/# ls -al /varlog/containers/vapor-web-v1-x894w_*
 lrwxrwxrwx 1 root root 171 Aug 27 18:03 /varlog/containers/vapor-web-v1-x894w_vapor-alpha_POD-99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0.log -> /mnt/ephemeral/docker/containers/99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0/99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0-json.log
 lrwxrwxrwx 1 root root 171 Aug 27 18:03 /varlog/containers/vapor-web-v1-x894w_vapor-alpha_nginx-521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491.log -> /mnt/ephemeral/docker/containers/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491-json.log
 lrwxrwxrwx 1 root root 171 Aug 27 18:03 /varlog/containers/vapor-web-v1-x894w_vapor-alpha_vapor-unicorn-1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a.log -> /mnt/ephemeral/docker/containers/1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a/1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a-json.log

Fluentd expects the symlinks to point into /var/lib/docker/containers, but on AWS we use /mnt/ephemeral/docker/containers, so the targets do not resolve inside the fluentd pod.
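
This can be confirmed from inside the fluentd container (a diagnostic sketch, assuming the pod only mounts /var/lib/docker from the host):

 # List symlinks whose targets cannot be resolved inside the pod,
 # i.e. those pointing at /mnt/ephemeral/docker, which is not mounted here.
 for f in /varlog/containers/*.log; do
   target=$(readlink "$f")
   [ -e "$target" ] || echo "dangling: $f -> $target"
 done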

From the node I see:

root@ip-172-20-0-194:/home/ubuntu# ls -al /var/log/containers/vapor-web-v1-x894w_vapor-alpha_*
lrwxrwxrwx 1 root root 171 Aug 27 18:03 /var/log/containers/vapor-web-v1-x894w_vapor-alpha_nginx-521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491.log -> /mnt/ephemeral/docker/containers/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491-json.log
lrwxrwxrwx 1 root root 171 Aug 27 18:03 /var/log/containers/vapor-web-v1-x894w_vapor-alpha_POD-99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0.log -> /mnt/ephemeral/docker/containers/99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0/99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0-json.log
lrwxrwxrwx 1 root root 171 Aug 27 18:03 /var/log/containers/vapor-web-v1-x894w_vapor-alpha_vapor-unicorn-1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a.log -> /mnt/ephemeral/docker/containers/1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a/1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a-json.log
 root@ip-172-20-0-194:/home/ubuntu# cat /mnt/ephemeral/docker/containers/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491-json.log | tail -n 5
 {"log":"172.20.0.70 - - [28/Aug/2015:17:29:26 +0000] \"GET / HTTP/1.1\" 302 116 \"-\" \"Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)\"\n","stream":"stdout","time":"2015-08-28T17:29:26.242591558Z"}
 {"log":"172.20.0.70 - - [28/Aug/2015:17:29:26 +0000] \"GET /users/sign_in HTTP/1.1\" 200 1954 \"-\" \"Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)\"\n","stream":"stdout","time":"2015-08-28T17:29:26.265429648Z"}
 {"log":"2015/08/28 17:29:26 [info] 5#5: *58423 client 172.20.0.70 closed keepalive connection\n","stream":"stdout","time":"2015-08-28T17:29:26.271201715Z"}
 {"log":"2015/08/28 17:29:38 [info] 5#5: *58426 client closed connection while waiting for request, client: 172.20.0.171, server: 0.0.0.0:80\n","stream":"stdout","time":"2015-08-28T17:29:38.549273837Z"}
 {"log":"2015/08/28 17:29:46 [info] 5#5: *58427 client closed connection while waiting for request, client: 172.20.0.33, server: 0.0.0.0:80\n","stream":"stdout","time":"2015-08-28T17:29:46.549263618Z"}

Most upvoted comments

@so0k

Thanks for the info. I did not find the yamls at /etc/kubernetes/manifests/.... In the end we just brought up a new cluster with KUBE_ENABLE_NODE_LOGGING=false, created our own Fluentd yamls as a DaemonSet, and added a volume mount for /mnt/ephemeral/docker ourselves.
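
Roughly, that workaround looked like this (a sketch of our setup; the manifest name is just illustrative):

 # Bring the cluster up without the bundled fluentd pods:
 KUBE_ENABLE_NODE_LOGGING=false ./cluster/kube-up.sh
 # Then deploy our own fluentd DaemonSet whose pod spec also mounts
 # /mnt/ephemeral/docker from the host:
 kubectl create -f fluentd-es-daemonset.yaml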

This issue is really nasty; I hope a fix will arrive soon.

The inconsistency seems to be tied to nodes that defaulted to /var/lib/docker as their docker root.

ubuntu@ip-172-20-0-171:~$ ls -al /var/log/containers/ | tail -n 5
lrwxrwxrwx  1 root root     165 Aug 21 21:37 worker-cleaner-rbpw0_vapor-alpha_worker-cleaner-72ec7b56758d5f35d9deda061a010f0c3c46af1c19247e132d1cb1ec456c8023.log -> /var/lib/docker/containers/72ec7b56758d5f35d9deda061a010f0c3c46af1c19247e132d1cb1ec456c8023/72ec7b56758d5f35d9deda061a010f0c3c46af1c19247e132d1cb1ec456c8023-json.log
lrwxrwxrwx  1 root root     165 Aug 21 21:48 worker-cleaner-sjvbk_vapor-alpha_POD-a8f48c3fee3dfaf89db00900435b4803d7709d84fa565503039c6553825eeb2d.log -> /var/lib/docker/containers/a8f48c3fee3dfaf89db00900435b4803d7709d84fa565503039c6553825eeb2d/a8f48c3fee3dfaf89db00900435b4803d7709d84fa565503039c6553825eeb2d-json.log
lrwxrwxrwx  1 root root     165 Aug 21 21:48 worker-cleaner-sjvbk_vapor-alpha_worker-cleaner-6208af9f07a5642db5bbc5804dce9aa0172211cb2b1c4c5d535730dc0ef1c15f.log -> /var/lib/docker/containers/6208af9f07a5642db5bbc5804dce9aa0172211cb2b1c4c5d535730dc0ef1c15f/6208af9f07a5642db5bbc5804dce9aa0172211cb2b1c4c5d535730dc0ef1c15f-json.log

Interestingly, on that node docker info still reports the docker root as /mnt/ephemeral/docker:

ubuntu@ip-172-20-0-171:~$ sudo docker info
Containers: 61
Images: 178
Storage Driver: aufs
 Root Dir: /mnt/ephemeral/docker/aufs
 Backing Filesystem: extfs
 Dirs: 300
 Dirperm1 Supported: true
Execution Driver: native-0.2
Kernel Version: 3.19.0-20-generic
Operating System: Ubuntu 15.04 (containerized)
CPUs: 8
Total Memory: 31.42 GiB
Name: ip-172-20-0-171
ID: 5ARV:N526:VXCL:K7IC:V4SW:LI4X:P6CO:6LEU:DSTN:SF47:HKGF:E5EN
WARNING: No swap limit support

Update:

Restarting the kubelet on the above box caused it to pick up the correct path for dockerRoot. So it seems this is a possible race between docker and the kubelet?
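
The remediation we used on an affected node, for anyone hitting this (a sketch; how the kubelet is supervised may differ on your install):

 # Restart the kubelet so it re-reads the docker root (use whatever supervises it on your nodes):
 sudo service kubelet restart
 # Symlinks created for new containers should now point under the root docker reports:
 ls -al /var/log/containers/ | tail -n 3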