kubernetes: Inconsistent docker root causing fluentd failures
There seem to be two separate issues here:
- An inconsistent docker root, due to a possible race between the kubelet and docker.
- A misconfiguration (on AWS) of fluentd, which mounts /var/lib/docker instead of /mnt/ephemeral/docker.
For the first issue, perhaps the kubelet should fail if it is not able to retrieve the docker root from docker info?
An alternative would be to mount our volume on AWS at /var/lib/docker instead of /mnt/ephemeral/docker, which would solve both issues. But it seems the kubelet provides the ability to dynamically locate the docker root for a reason.
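As a rough illustration of the first suggestion (a sketch only, not the kubelet's actual startup code; the error handling shown is assumed), the idea is to refuse to fall back to /var/lib/docker when docker info has not yet reported a root directory:

# Sketch of a pre-flight check: bail out instead of silently assuming /var/lib/docker.
docker_root="$(docker info 2>/dev/null | sed -n 's/^ *Docker Root Dir: *//p')"
if [ -z "$docker_root" ]; then
  echo "could not determine the docker root from 'docker info'; refusing to start the kubelet" >&2
  exit 1
fi
echo "using docker root: $docker_root"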
Details:
I’m having intermittent issues with fluentd pushing logs to elasticsearch. It seems that fluentd is unable to see the log files, even though it is able to list the containers directory. This is happening on an AWS cluster.
From inside the fluentd pod I see the following:
root@fluentd-elasticsearch-ip-172-20-0-194:/# cat /varlog/containers/vapor-web-v1-x894w_vapor-alpha_nginx-521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491.log
cat: /varlog/containers/vapor-web-v1-x894w_vapor-alpha_nginx-521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491.log: No such file or directory
On AWS the log file symlinks look like this:
root@fluentd-elasticsearch-ip-172-20-0-194:/# ls -al /varlog/containers/vapor-web-v1-x894w_*
lrwxrwxrwx 1 root root 171 Aug 27 18:03 /varlog/containers/vapor-web-v1-x894w_vapor-alpha_POD-99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0.log -> /mnt/ephemeral/docker/containers/99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0/99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0-json.log
lrwxrwxrwx 1 root root 171 Aug 27 18:03 /varlog/containers/vapor-web-v1-x894w_vapor-alpha_nginx-521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491.log -> /mnt/ephemeral/docker/containers/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491-json.log
lrwxrwxrwx 1 root root 171 Aug 27 18:03 /varlog/containers/vapor-web-v1-x894w_vapor-alpha_vapor-unicorn-1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a.log -> /mnt/ephemeral/docker/containers/1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a/1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a-json.log
Fluentd seems to expect the symlinks to point to /var/lib/docker/containers, but on AWS we use /mnt/ephemeral/docker/containers.
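One way to make the mismatch visible is a minimal check from inside the fluentd pod: the symlink itself is readable via the /varlog mount, but its absolute target must also be mounted at the same path inside the container for reads to succeed. (This is only an illustrative check, not part of the fluentd config.)

# Resolve one of the symlinks above and test whether its target exists inside the pod.
link=/varlog/containers/vapor-web-v1-x894w_vapor-alpha_nginx-521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491.log
target="$(readlink "$link")"   # points at /mnt/ephemeral/docker/containers/<id>/<id>-json.log
if [ -e "$target" ]; then
  echo "target resolves: $target"
else
  echo "dangling symlink inside the pod: $target is not mounted at that path" >&2
fi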
From the node I see:
root@ip-172-20-0-194:/home/ubuntu# ls -al /var/log/containers/vapor-web-v1-x894w_vapor-alpha_*
lrwxrwxrwx 1 root root 171 Aug 27 18:03 /var/log/containers/vapor-web-v1-x894w_vapor-alpha_nginx-521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491.log -> /mnt/ephemeral/docker/containers/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491-json.log
lrwxrwxrwx 1 root root 171 Aug 27 18:03 /var/log/containers/vapor-web-v1-x894w_vapor-alpha_POD-99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0.log -> /mnt/ephemeral/docker/containers/99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0/99cd13b12f2b45275f5a3bcb7537d08ce722c48d66e3e21b57ad0563d5f2d8c0-json.log
lrwxrwxrwx 1 root root 171 Aug 27 18:03 /var/log/containers/vapor-web-v1-x894w_vapor-alpha_vapor-unicorn-1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a.log -> /mnt/ephemeral/docker/containers/1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a/1e5233b3d6d2e8c92c8534dd61d46b4aadc7766d54423eb17425682d678bad9a-json.log
root@ip-172-20-0-194:/home/ubuntu# cat /mnt/ephemeral/docker/containers/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491/521e74369e30312a32806b018e62ef81d6a606dd87c0772aaacc2461ae84e491-json.log | tail -n 5
{"log":"172.20.0.70 - - [28/Aug/2015:17:29:26 +0000] \"GET / HTTP/1.1\" 302 116 \"-\" \"Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)\"\n","stream":"stdout","time":"2015-08-28T17:29:26.242591558Z"}
{"log":"172.20.0.70 - - [28/Aug/2015:17:29:26 +0000] \"GET /users/sign_in HTTP/1.1\" 200 1954 \"-\" \"Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)\"\n","stream":"stdout","time":"2015-08-28T17:29:26.265429648Z"}
{"log":"2015/08/28 17:29:26 [info] 5#5: *58423 client 172.20.0.70 closed keepalive connection\n","stream":"stdout","time":"2015-08-28T17:29:26.271201715Z"}
{"log":"2015/08/28 17:29:38 [info] 5#5: *58426 client closed connection while waiting for request, client: 172.20.0.171, server: 0.0.0.0:80\n","stream":"stdout","time":"2015-08-28T17:29:38.549273837Z"}
{"log":"2015/08/28 17:29:46 [info] 5#5: *58427 client closed connection while waiting for request, client: 172.20.0.33, server: 0.0.0.0:80\n","stream":"stdout","time":"2015-08-28T17:29:46.549263618Z"}
@so0k
Thanks for the info. I did not find the yamls at /etc/kubernetes/manifests/.... In the end we just created a new cluster with KUBE_ENABLE_NODE_LOGGING=false and created our own fluentd yamls as a DaemonSet, and added a volume mount for /mnt/ephemeral/docker ourselves. This issue is really nasty; I hope a fix will arrive soon.
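For reference, a rough way to confirm that kind of workaround took effect is to check from inside one of the new fluentd pods that the symlink targets now resolve (the pod name below is only illustrative, reused from earlier in the thread, and assumes kubectl access to the cluster):

# With /mnt/ephemeral/docker mounted into the fluentd pod at the same host path,
# the log symlinks should resolve again from inside the pod.
kubectl exec fluentd-elasticsearch-ip-172-20-0-194 -- \
  sh -c 'readlink -e /varlog/containers/*.log | head -n 3'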
The inconsistency seems to be tied to nodes that defaulted to /var/lib/docker as their docker root.
Interestingly, on that node docker info still reports the docker root as /mnt/ephemeral/docker.
Update:
Restarting the kubelet on the above box caused it to pick up the correct path for dockerRoot. So it seems there is a possible race between docker and the kubelet?
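For anyone hitting the same thing, the check and restart on an affected node looked roughly like this (the service name is an assumption for a typical Ubuntu node; adjust for your init system):

# Confirm what docker itself reports as the root dir on the node.
docker info | grep -i 'root dir'
# Restart the kubelet so it re-reads the docker root instead of keeping the
# /var/lib/docker default it picked up at startup.
sudo service kubelet restart   # or: sudo systemctl restart kubelet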