kops: Kubelet 'failed to get cgroup stats for "/system.slice/kubelet.service"' error messages
- What `kops` version are you running? The command `kops version` will display this information.
  Version 1.8.0 (git-5099bc5)
- What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running or provide the Kubernetes version specified as a `kops` flag.
```
Client Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.9-dirty", GitCommit:"7f63532e4ff4fbc7cacd96f6a95b50a49a2dc41b", GitTreeState:"dirty", BuildDate:"2017-10-26T22:33:15Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
```
- What cloud provider are you using? AWS
- What commands did you run? What is the simplest way to reproduce this issue?
  Provisioned a 1.8.4 cluster by (sketched below):
  - Creating a basic cluster with `kops`
  - Exporting that cluster to a manifest file
  - Adding my changes to the manifest
  - Generating Terraform configs
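A hedged sketch of those steps with the kops CLI; the cluster name, state store, and zone are hypothetical placeholders:

```sh
# Hypothetical reproduction of the steps above (all names are placeholders).
export KOPS_STATE_STORE=s3://example-kops-state

# Create a basic cluster
kops create cluster --zones=us-east-1a example.k8s.local

# Export the cluster to a manifest file
kops get cluster example.k8s.local -o yaml > cluster.yaml

# Edit cluster.yaml by hand, then apply the changes
kops replace -f cluster.yaml

# Generate Terraform configs
kops update cluster example.k8s.local --target=terraform --out=.
```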
- What happened after the commands executed? I get the following error message in the daemon logs unless I add the following to the manifest:
```yaml
kubelet:
  kubeletCgroups: "/systemd/system.slice"
  runtimeCgroups: "/systemd/system.slice"
masterKubelet:
  kubeletCgroups: "/systemd/system.slice"
  runtimeCgroups: "/systemd/system.slice"
```
```
Dec 12 22:12:14 ip-172-20-64-61 kubelet[30742]: E1212 22:12:14.322073 30742 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
```
- What did you expect to happen? No error messages
About this issue
- State: closed
- Created 7 years ago
- Reactions: 6
- Comments: 32 (13 by maintainers)
Links to this issue
Commits related to this issue
- Adding additional fix for https://github.com/kubernetes/kops/issues/4049 — committed to mmerrill3/kops by deleted user 6 years ago
- Revert "Adding additional fix for https://github.com/kubernetes/kops/issues/4049" This reverts commit 877a90f9306e5671ba7da85270a77220ca6b216b. — committed to mmerrill3/kops by deleted user 6 years ago
See https://github.com/kontena/pharos-cluster/issues/440#issuecomment-399014418 for why the `--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice` workaround is a bad idea on CentOS: the extra `/systemd` prefix causes the kubelet and dockerd processes to escape from their correct systemd cgroups into a new `/systemd/system.slice` cgroup created next to the real `/system.slice` cgroup.

I'm not entirely sure how much the systemd cgroup names differ across OSes, but I assume the workaround was actually meant to be `--runtime-cgroups=/system.slice --kubelet-cgroups=/system.slice`. That's only slightly better (https://github.com/kontena/pharos-cluster/issues/440#issuecomment-399017863): the processes still escape from the systemd `kubelet.service`/`docker.service` cgroups, and the kubelet `/stats/summary` API still reports the wrong numbers.

The correct fix is to enable systemd `CPUAccounting` and `MemoryAccounting` for the `kubelet.service`. This causes systemd to create the missing `/system.slice/*.service` cgroups for all services, and matches what happens by default on e.g. Ubuntu Xenial. It allows the kubelet `/stats/summary` API to report the correct `systemContainer` metrics for the `runtime` and `kubelet`: https://github.com/kontena/pharos-cluster/issues/440#issuecomment-399022473

I think these systemd settings should be shipped as part of the upstream `kubelet` package's `kubelet.service`, if the kubelet assumes that systemd creates those cgroups: `/etc/systemd/system/kubelet.service.d/11-cgroups.conf`
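The thread names only the drop-in path, not its contents; a minimal sketch under that assumption:

```sh
# Assumed contents of the drop-in (the comment above names only the path):
# enable per-service accounting so systemd creates the
# /system.slice/kubelet.service cgroup that the kubelet's stats code expects.
sudo tee /etc/systemd/system/kubelet.service.d/11-cgroups.conf <<'EOF' >/dev/null
[Service]
CPUAccounting=true
MemoryAccounting=true
EOF
sudo systemctl daemon-reload     # pick up the new drop-in
sudo systemctl restart kubelet   # recreate the kubelet cgroup with accounting on
```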
@mercantiandrea @oliverseal Seems you're doing manually what's done automatically for you, i.e. https://github.com/kubernetes/kops/issues/4049#issuecomment-352152838. You only need to add these when you edit your cluster:
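The snippet that followed isn't preserved in this extract; judging from the manifest fragment quoted at the top of the issue, it was presumably the cluster-spec settings below (the `spec:` nesting is assumed, as used by `kops edit cluster`):

```yaml
# Presumed cluster-spec addition, mirroring the fragment quoted above.
spec:
  kubelet:
    kubeletCgroups: "/systemd/system.slice"
    runtimeCgroups: "/systemd/system.slice"
  masterKubelet:
    kubeletCgroups: "/systemd/system.slice"
    runtimeCgroups: "/systemd/system.slice"
```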
This workaround also works on the default AWS image for kops (k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02, ami-bd229ec4):
- `sudo vim /etc/sysconfig/kubelet` and add at the end of the `DAEMON_ARGS` string: `--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice`
- Finally: `sudo systemctl restart kubelet`

But I think this problem should be fixed in the kops release itself.
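A non-interactive sketch of the same edit; it assumes GNU sed and the `DAEMON_ARGS="..."` quoting used on that image, and keep in mind the earlier comment warning that the `/systemd` prefix makes the processes escape their proper cgroups:

```sh
# Append the workaround flags inside the DAEMON_ARGS="..." string,
# then restart the kubelet (assumes the stock kops Jessie image layout).
sudo sed -i 's|^DAEMON_ARGS="\(.*\)"$|DAEMON_ARGS="\1 --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice"|' /etc/sysconfig/kubelet
sudo systemctl restart kubelet
```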
Hi all, not fixed in 1.11 😦. I just set up release 1.11 and still have the bug. The config file has changed; it's now a YAML file. Where can we add the workaround? Thanks.
We should fix this in 1.11
I think his solution of adding that as a default seems sane: this slice does exist and may help provide some metrics (I'm not sure where they're exposed, or what we're missing without them yet).
@itskingori the point is that this should be the default setting, instead of us going into the config and making changes when we know the default values are wrong. One by one, things like this add up, and you end up with a long list of custom settings you need to apply every time you create a cluster.
We can close this, but I wouldn't call it a duplicate of #3762 unless you change the title. That issue is a recommendation to follow best practices; this is an error message. I'm trying to prepare new 1.8 clusters for critical production workloads. Can someone clarify whether `kubeletCgroups` and `runtimeCgroups` should be used until #3762 is addressed?

@oliverseal You can modify the user-data bash script in the Terraform template. It is encoded in base64, but you can easily decode it, modify it, and finally encode it again.
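A hedged sketch of that round trip; the file names are hypothetical, and `base64 -w0` (no line wrapping) is the GNU coreutils form:

```sh
# Decode the user-data from the Terraform template, edit it, re-encode it.
base64 -d userdata.b64 > userdata.sh    # decode
"${EDITOR:-vi}" userdata.sh             # add the kubelet cgroup flags
base64 -w0 userdata.sh > userdata.b64   # encode again as a single line
```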
Added this to the userdata after "download-release", and it seems to be working fine.