origin: inode usage metrics seem incorrect

cadvisor exposes metrics related to inode usage in containers (cc @stevekuznetsov), but at first glance some of these metrics appear to be incorrect for some containers.

For example, the query container_fs_inodes_free == 0 returns containers that actually have very low inode usage, e.g.:

container_fs_inodes_free{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="n1-standard-2",beta_kubernetes_io_os="linux",container_name="POD",device="/dev/sda1",failure_domain_beta_kubernetes_io_region="us-central1",failure_domain_beta_kubernetes_io_zone="us-central1-a",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod9abca0ce_c014_11e7_86d0_42010a800002.slice/docker-de9b645a537f1702cb32222dd1428a4af3c23a03f197032be2494cdd41b3b912.scope",image="openshift/origin-pod:v3.7.0-rc.0",instance="origin-ci-ig-m-11v4",job="kubernetes-cadvisor",kubernetes_io_hostname="origin-ci-ig-m-11v4",name="k8s_POD_registry-console-1-w927b_default_9abca0ce-c014-11e7-86d0-42010a800002_1",namespace="default",pod_name="registry-console-1-w927b",role="infra",subrole="master"}

$ oc exec -it registry-console-1-w927b -n default -- df -i
Filesystem       Inodes  IUsed    IFree IUse% Mounted on
overlay        78641672 173610 78468062    1% /
tmpfs            936810     18   936792    1% /dev
tmpfs            936810     16   936794    1% /sys/fs/cgroup
/dev/sda1      78641672 173610 78468062    1% /etc/hosts
shm              936810      1   936809    1% /dev/shm
tmpfs            936810     11   936799    1% /run/secrets/kubernetes.io/serviceaccount
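The mismatch can be checked mechanically. What follows is a minimal sketch, not part of cadvisor or the kubelet: it parses `df -i` output like the listing above and flags a mount whose reported free-inode metric is 0 while `df` still shows free inodes. The helper names `parse_df_inodes` and `metric_disagrees` are hypothetical, and the metric value of 0 is assumed from the `container_fs_inodes_free == 0` query result.

```python
# Sample copied from the `df -i` output in this report (trimmed to three rows).
DF_OUTPUT = """\
Filesystem       Inodes  IUsed    IFree IUse% Mounted on
overlay        78641672 173610 78468062    1% /
tmpfs            936810     18   936792    1% /dev
/dev/sda1      78641672 173610 78468062    1% /etc/hosts
"""


def parse_df_inodes(text):
    """Return {mount_point: (total, used, free)} parsed from `df -i` output."""
    result = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        _fs, total, used, free, _pct, mount = fields
        # Some filesystems report "-" instead of numbers; skip those rows.
        if not (total.isdigit() and used.isdigit() and free.isdigit()):
            continue
        result[mount] = (int(total), int(used), int(free))
    return result


def metric_disagrees(metric_free, df_free):
    """True when the metric claims zero free inodes but df reports some."""
    return metric_free == 0 and df_free > 0


inodes = parse_df_inodes(DF_OUTPUT)
root_total, root_used, root_free = inodes["/"]
# Hypothetical metric sample: 0, as implied by the query result above.
print("/", root_free, metric_disagrees(0, root_free))
```

For the root overlay mount this reports a disagreement, since `df` shows 78468062 free inodes while the metric matched the `== 0` query.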

This means we cannot reliably build alerts on top of these metrics. For example, we just realized that one of our Jenkins masters has been out of inodes for some days now.

@openshift/sig-pod

About this issue

State: closed
Created 7 years ago
Comments: 39 (36 by maintainers)

Most upvoted comments

Have a reproducible case that I can perturb with kubelet changes.

RobertKrawitz on Dec 7, 2018

Ask at forum-testplatform in Slack; this may not be an issue anymore.

On Wed, Nov 28, 2018, 19:01 Robert Krawitz <notifications@github.com> wrote:

@kargakis https://github.com/kargakis is there a reproducible case (or a running container I can look at)?

0xmichalis on Nov 28, 2018