origin: inode usage metrics seem incorrect
cadvisor exposes metrics for inode usage in containers (cc @stevekuznetsov), but on inspection some of these metrics seem to be incorrect for some containers.
For example, the query container_fs_inodes_free == 0 returns containers that actually have very low inode usage, e.g.:
container_fs_inodes_free{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="n1-standard-2",beta_kubernetes_io_os="linux",container_name="POD",device="/dev/sda1",failure_domain_beta_kubernetes_io_region="us-central1",failure_domain_beta_kubernetes_io_zone="us-central1-a",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod9abca0ce_c014_11e7_86d0_42010a800002.slice/docker-de9b645a537f1702cb32222dd1428a4af3c23a03f197032be2494cdd41b3b912.scope",image="openshift/origin-pod:v3.7.0-rc.0",instance="origin-ci-ig-m-11v4",job="kubernetes-cadvisor",kubernetes_io_hostname="origin-ci-ig-m-11v4",name="k8s_POD_registry-console-1-w927b_default_9abca0ce-c014-11e7-86d0-42010a800002_1",namespace="default",pod_name="registry-console-1-w927b",role="infra",subrole="master"}
$ oc exec -it registry-console-1-w927b -n default -- df -i
Filesystem       Inodes  IUsed    IFree IUse% Mounted on
overlay        78641672 173610 78468062    1% /
tmpfs            936810     18   936792    1% /dev
tmpfs            936810     16   936794    1% /sys/fs/cgroup
/dev/sda1      78641672 173610 78468062    1% /etc/hosts
shm              936810      1   936809    1% /dev/shm
tmpfs            936810     11   936799    1% /run/secrets/kubernetes.io/serviceaccount
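The mismatch can also be checked mechanically. The following is a minimal sketch, not part of cadvisor or the kubelet: it parses `df -i` output like the above and lists any mount whose IFree column is 0. The function name `parse_df_inodes` and the dictionary keys are assumptions made for illustration.

```python
# Hypothetical helper: parse `df -i` output and look for exhausted filesystems.
SAMPLE = """\
Filesystem Inodes IUsed IFree IUse% Mounted on
overlay 78641672 173610 78468062 1% /
tmpfs 936810 18 936792 1% /dev
tmpfs 936810 16 936794 1% /sys/fs/cgroup
/dev/sda1 78641672 173610 78468062 1% /etc/hosts
shm 936810 1 936809 1% /dev/shm
tmpfs 936810 11 936799 1% /run/secrets/kubernetes.io/serviceaccount
"""


def parse_df_inodes(text):
    """Return one dict per filesystem row of `df -i` output."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Expect: filesystem, inodes, iused, ifree, iuse%, mount point
        if len(fields) < 6 or not all(f.isdigit() for f in fields[1:4]):
            continue
        rows.append({
            "filesystem": fields[0],
            "inodes": int(fields[1]),
            "iused": int(fields[2]),
            "ifree": int(fields[3]),
            "mount": " ".join(fields[5:]),  # mount points may contain spaces
        })
    return rows


rows = parse_df_inodes(SAMPLE)
# Mounts that df itself reports as having no free inodes.
exhausted = [r["mount"] for r in rows if r["ifree"] == 0]
print(exhausted)
```

For the output above, no mount has IFree equal to 0, which is what contradicts the container_fs_inodes_free == 0 result for this pod.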
This means we cannot reliably build alerts on top of these metrics. For example, we just realized that one of our Jenkins masters has been out of inodes for some days now.
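Until the exported metric can be trusted, inode usage can be read directly from the filesystem on the affected host. This is a rough sketch under assumptions, not what cadvisor does internally: it uses Python's `os.statvfs`, and the function name `inode_usage` and the choice of `/` as the path are illustrative only.

```python
import os


def inode_usage(path):
    """Return (total, free, used_fraction) for the filesystem holding `path`.

    used_fraction is None when the filesystem reports no inode total.
    """
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes
    free = st.f_ffree    # number of free inodes
    if total == 0:
        return total, free, None
    return total, free, (total - free) / total


total, free, fraction = inode_usage("/")
print(total, free, fraction)
```

A check like this could be run on a node, or inside a container via `oc exec`, to compare against the value reported for the same device in container_fs_inodes_free.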
About this issue
- State: closed
- Created 7 years ago
- Comments: 39 (36 by maintainers)
Have a reproducible case that I can perturb with kubelet changes.
Ask in forum-testplatform on Slack; this may not be an issue anymore.
On Wed, Nov 28, 2018, 19:01 Robert Krawitz <notifications@github.com> wrote: