kubernetes: [BUG] Windows disk usage getting reported larger than disk size and may prevent GC from running

What happened?

The value reported by crictl.exe imagefsinfo is roughly 530 GiB:

{
  "status": {
    "timestamp": "1677171468699023800",
    "fsId": {
      "mountpoint": "C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows"
    },
    "usedBytes": {
      "value": "570024692671"
    },
    "inodesUsed": {
      "value": "0"
    }
  }
}

The volume, however, is only 511 GB, with 91 GB free:

Get-Volume -DriveLetter C

DriveLetter FriendlyName FileSystemType DriveType HealthStatus OperationalStatus SizeRemaining      Size
----------- ------------ -------------- --------- ------------ ----------------- -------------      ----
C           Windows      NTFS           Fixed     Healthy      OK                     90.75 GB 511.51 GB

When checking the snapshotter directory via PowerShell, I get ~509 GiB, which also seems off:

"{0} Bytes" -f ((Get-ChildItem C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots -Recurse | Measure-Object -Property Length -Sum -ErrorAction Stop).Sum)
546676366719 Bytes

ImageFsStats may calculate the disk usage incorrectly on Windows (maybe because of symlinks?).
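
As a quick sanity check against the Get-Volume output above (a sketch, reusing only the numbers already shown in this report), the snapshotter alone is reported as using more bytes than are actually in use on the whole C: volume:

$vol = Get-Volume -DriveLetter C
$usedOnVolume = $vol.Size - $vol.SizeRemaining   # roughly 452,000,000,000 bytes with the values above
$reportedBySnapshotter = 570024692671            # usedBytes from imagefsinfo above
"{0:N0} bytes used on C:; {1:N0} bytes reported for the snapshotter" -f $usedOnVolume, $reportedBySnapshotter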

It looks like the wrong disk usage calculation prevents the GC from running and deleting unused images. Two of my Windows nodes failed with full disks even though DiskPressure was reported as false, and the kubelet logs didn't show any GC activity.

What did you expect to happen?

The disk usage of Windows nodes should be calculated correctly. DiskPressure=true should be reported when the disk is actually under pressure. The GC should start deleting unused images when imageGCHighThresholdPercent is reached.

How can we reproduce it (as minimally and precisely as possible)?

  • Set up a cluster with a Windows node
  • Start many pods on the Windows node with different images to fill up the disk
  • Run crictl.exe imagefsinfo and compare it to Get-Volume -DriveLetter C

Anything else we need to know?

I set the GC settings in my kubelet config:

"imageMinimumGCAge": "2m0s",
"imageGCHighThresholdPercent": 85,
"imageGCLowThresholdPercent": 80, 

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:58:16Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"windows/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"b969368e201e1f09440892d03007c62e791091f8", GitTreeState:"clean", BuildDate:"2022-12-16T19:44:08Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Azure AKS

OS version

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
Microsoft Windows Server 2022 Datacenter, 10.0.20348, 20348, 64-bit

Install tools

Azure AKS Windows Node image version: AKSWindows-2022-containerd-20348.1487.230111

Container runtime (CRI) and version (if applicable)

RuntimeName: containerd RuntimeVersion: v1.6.14+azure RuntimeApiVersion: v1alpha2

Related plugins (CNI, CSI, …) and versions (if applicable)

-

/sig windows

About this issue

  • State: closed
  • Created a year ago
  • Comments: 26 (7 by maintainers)

Most upvoted comments

Disclaimer: Sorry about the wall of text 😄

An update on this. I've been sidetracked a bit, but will investigate more in depth as time permits. In the meantime, here are some more details on why I think there is a difference between what Get-Volume shows and what PowerShell and other tools show in terms of usage/free space.

The short version is: It has to do with the way windows-layer and WCIFS (Windows Container Isolation File System) are implemented.

The longer version:

The context

A windows-layer can have 3 possible states:

  • Base layers. These layers have no parents and include a number of metadata files and, of course, the OS libraries and binaries required for containers to work.
  • Read-only non-base layers. These are intermediary layers that hold diffs from the base layer all the way up to the scratch layer.
  • Scratch layers. These are writable layers that usually get attached to containers. These layers hold only a sandbox.vhdx, and any changes made by the container are stored in this file.

The base layer

This layer holds the initial OS files needed by a container and optionally a UtilityVM folder which is used when creating a Hyper-V isolated container. Without going into too much detail, Hyper-V isolated containers work similarly to Kata containers. A VM is started and the actual container is then started inside that VM.

This layer is never mounted directly. Before a container can make use of it, a new scratch layer is created with the base layer as a parent. The scratch layer is essentially a writable overlay which stores its changes inside that sandbox.vhdx.

Read-only non base layers

These layers are derived from a base layer and are imported whenever we do a ctr i pull of an image. Imported layers don't hold a sandbox.vhdx and cannot be exported again. Also, these layers cannot be used directly. A new scratch layer needs to be created from a read-only layer (and its parents).

To generate a new read-only layer, we use the ctr snapshot commit command on an active scratch layer. This will export the scratch layer and import it back on top of itself (at least in containerd). To do this, we leverage the export-layer functionality in HCS, which requires that we give it the entire layer chain up to the scratch layer, in descending order, as well as the scratch layer itself.

The two step process goes something like this:

  1. Export will create a tar.gz containing any newly added files, changed files, and metadata about files and folders we delete (tombstones.txt).
  2. Import (and here is where the magic happens). The import is detailed below.

Importing a windows-layer

When importing a windows-layer, we need to provide HCS with the entire parent layer chain that was used when we exported it, plus an empty folder that will become the newly imported layer. HCS will then proceed to take each layer in that chain, one after another, and create reparse points that point to files in the parent "read-only" layers. Note that the base layer files are skipped.

Of course, if the next layer removes a file, the corresponding reparse point is also removed. This process continues until we reach the last layer in the chain. Once all parent layers are processed and the reparse points are created, the contents of the tar.gz are applied over the new layer. This means files/links get added, the metadata is read, and any files that were deleted are also deleted from the layer.

We now have a new imported layer that holds (potentially) many reparse points to files in other layers, and maybe a few new files unique to our layer. The reparse points that belong to other layers will appear as Offline if we inspect their attributes. This is because they don't really exist on the host. More on that below.
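
As a quick illustration, those hydrate-later reparse points can be spotted by their Offline attribute (a sketch; the snapshot path is a placeholder for one of your imported layer folders):

# Hypothetical path to one of the imported (read-only) layer folders
$importedLayerPath = 'C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\<id>'

# Files that only exist as reparse points into parent layers show up as Offline
Get-ChildItem -Recurse -File -Force $importedLayerPath |
    Where-Object { $_.Attributes -band [IO.FileAttributes]::Offline } |
    Select-Object -First 10 FullName, Length, Attributes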

WCIFS and putting it all together

The Windows Container Isolation File System acts as an overlay. It takes all the layers in the chain and combines them into one \\?\Volume{}. The reparse points that pointed nowhere on the host are magically "hydrated" and can now be accessed. That's because the references stored in the reparse points now exist within the context of that virtual volume. We can view them if we mount them on the host via:

wclayer.exe mount -l parent1 -l parent2 scratch_layer C:\test

The C:\test folder will hold the files in question.

The mystery of the extra disk space

When we use either Windows Explorer or PowerShell to get the total disk usage of a folder, the Length attribute of each file, reparse points included, is inspected and added to the total. The reparse points in our case don't actually take up any disk space, but they do have a Length attribute which mirrors the original file.

So the du.exe command above might actually give a better estimate of actual usage, as it seems to silently ignore not being able to os.Stat() those files to get their attributes.

The more “read-only” layers you have, the more pronounced the discrepancy between what Get-Volume shows and what

(Get-ChildItem -recurse . | Measure-Object -Property Length -sum).Sum

returns.
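
To see how much of that PowerShell figure comes from reparse points alone, the sum can be split into files with and without the ReparsePoint attribute (a rough sketch, not an exact accounting of on-disk usage):

$snapshots = 'C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots'
$files = Get-ChildItem -Path $snapshots -Recurse -File -Force -ErrorAction SilentlyContinue

# Everything, reparse points included (what Measure-Object reports above)
($files | Measure-Object -Property Length -Sum).Sum

# "Real" files only, excluding reparse points that merely mirror files in parent layers
($files | Where-Object { -not ($_.Attributes -band [IO.FileAttributes]::ReparsePoint) } |
    Measure-Object -Property Length -Sum).Sum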

How to reproduce

Get a list of layers:

ctr snapshot list

Pick any committed snapshot and create a scratch layer:

ctr snapshot prepare new_layer_key parent_layer_id_shown_above

Mount the new layer:

wclayer.exe mount -l parent1 -l parent2 new_layer_path C:\test

Change something small, like add a new file, then unmount:

wclayer unmount new_layer_path C:\test

and commit:

ctr snapshot commit new_layer_key

Now, if you go to the actual layer on the host and run:

(Get-ChildItem -recurse . | Measure-Object -Property Length -sum).Sum

You will notice that even though you may have only added a 1 KB file, a lot more disk space seems to be used. This is because every file added in previous layers has a reparse point present in this new one. But the actual usage of this layer is just the "1 KB" file, plus the layered hive files you see in the Hives folder. These should be small, though.
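
To confirm that the inflated number comes from reparse points rather than real data, the largest entries in the freshly committed layer can be listed (a sketch, run from inside the layer folder):

# Largest reparse points in the committed layer; their Length mirrors files stored in parent layers
Get-ChildItem -Recurse -File -Force . |
    Where-Object { $_.Attributes -band [IO.FileAttributes]::ReparsePoint } |
    Sort-Object Length -Descending |
    Select-Object -First 20 FullName, Length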

Final thoughts

I still need to investigate more, but it seems that crictl.exe imagefsinfo adds the size of the reparse points to the total (not sure, though). If it does, we’ll probably need to change that to get an accurate metric of used space.

Will ping back when I find out more.

The docs also mention Set-MpPreference -EnableControlledFolderAccess AuditMode, which might prevent the problem, or at least narrow it down to the event-watching locks being leaked.

I also managed to reproduce this by starting procmon, setting up a filter for the parent layer, and simply watching for events. Defender was disabled. So I am not sure what the actual mechanism is here, but more than one thing seems to trigger this for this particular workload.

I am almost convinced that a process container that runs exclusively userspace services will not exhibit this.

Another update.

This does not seem to happen when Hyper-V isolation is used, which leads me to think that Defender may not be the root cause of the issue, but merely a trigger.

The following is just a hypothesis, as I am not really familiar with Windows internals at that level:

The workload running inside the container we’re using to reproduce this is a web app running inside IIS. From what I remember (it’s been a while since I’ve looked at it), IIS is an in-kernel web server implementation.

The files we see open from that parent layer that seems to get locked are held open by the SYSTEM process (pid 4). As the container starts and the app loads its files, we see more and more files from a parent layer being opened, all of them by SYSTEM. If it were WCIFS or HCS doing this, then in all likelihood it would have happened when the container booted, not as the app loaded its libraries.
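
One way to see which process holds those layer files open (assuming the Sysinternals handle.exe tool is available on the node; the layer id in the path is a placeholder):

# Search open handles for anything under a given parent layer directory
.\handle.exe -accepteula "C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots\<parent_layer_id>"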

Windows Server containers share the same kernel as the host. It is possible that, as the app loads its files, some interaction with Defender opens a handle or accesses the file in some way as it's loaded by IIS/svchost/pid 4.

When the container is killed, the kernel (obviously) remains, because it's shared with the host. But I think it still holds an open handle to those files for some reason. Sadly, I am not that familiar with how IIS works or with Windows internals at this level, but given that this doesn't repro with Hyper-V isolation enabled, it stands to reason that this may be the root cause.

When using Hyper-V isolation, a new partition is created, a VM is spun up with its own kernel, and a container is started inside of it. We still see the files being open on the host, but this time they are not opened by SYSTEM (pid 4); they are opened by a vmmem process with its own PID.

When we kill the container, that process exits, the kernel running inside the VM exits and the files are released. On the host, the parent layers are released every time.

It is possible that this behavior would not manifest if the workloads inside the container ran exclusively in userspace. Using an HTTP server implementation (or any other kind of workload) that does not run in-kernel will probably not trigger this, even without Hyper-V isolation.

To test this, I updated the containerd.exe binary to version 1.7.0 and the containerd-shim-runhcs-v1.exe (hcsshim) binary to the latest rc tag.

I updated C:\Program Files\containerd\config.toml to:

version = 2
[metrics]
  address = "0.0.0.0:10257"
  grpc_histogram = false

[plugins]
  [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://registry-1.docker.io"]

  [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "c:/k/azurecni/bin"
      conf_dir = "c:/k/azurecni/netconf"
      conf_template = ""
      ip_pref = ""
      max_conf_num = 1
      setup_serially = false

  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "mcr.microsoft.com/oss/kubernetes/pause:3.9-windows-ltsc2022-amd64"

  [plugins."io.containerd.grpc.v1.cri".containerd]
    default_runtime_name = "runhcs-wcow-process"
    disable_snapshot_annotations = false
    discard_unpacked_layers = true
    ignore_blockio_not_enabled_errors = false
    ignore_rdt_not_enabled_errors = false
    no_pivot = false
    snapshotter = "windows"

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor]
        runtime_type = "io.containerd.runhcs.v1"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor.options]
        Debug = true
        DebugType = 2
        SandboxPlatform = "windows/amd64"
        SandboxIsolation = 1
        VmProcessorCount = 4
        VmMemorySizeInMb = 4096
        ScaleCpuLimitsToSandbox = true
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-process]
        runtime_type = "io.containerd.runhcs.v1"
        pod_annotations = ["microsoft.com/*", "io.microsoft.*" ]
        container_annotations = ["microsoft.com/*", "io.microsoft.*","io.microsoft.container.*" ]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-process.options]
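
After editing the config, a quick way to confirm containerd picked up the new runtime handler (a sketch; the Windows service name is assumed to be containerd):

Restart-Service containerd
# runhcs-wcow-hypervisor should now appear in the runtimes section of the CRI config dump
crictl.exe --runtime-endpoint=npipe://./pipe/containerd-containerd info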

I created a new runtime class in k8s:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: runhcs-wcow-hypervisor
handler: runhcs-wcow-hypervisor

And used:

spec:
  template:
    spec:
      runtimeClassName: "runhcs-wcow-hypervisor"

In the workload yaml file.

Will keep investigating and add more details as they become available.

I narrowed it down to just:

Set-MpPreference -EnableControlledFolderAccess Disabled

Will update this bug as more info becomes available.

@gabriel-samfira thanks a lot for the write-up, very interesting!

More important to me than the size differences between the various tools is that the disk really does get full. kubelet correctly identifies that, but fails to garbage collect images (see https://github.com/kubernetes/kubernetes/issues/116020#issuecomment-1443244191).

It seems like containerd isn't fully deleting snapshots, or is keeping wrong records about doing so, like I described in https://github.com/kubernetes/kubernetes/issues/116020#issuecomment-1443575342.

Do you have an idea about that?

Hi @lippertmarkus

Apologies for the delay. I am still investigating, but a few observations so far:

  • du.exe is not the best tool to compute disk usage on Windows. It may show less disk space being used than what is actually being used.
  • Using du.exe:
Administrator@WIN-I9LH13G2BA5 MINGW64 /c/ProgramData/containerd/root/io.containerd.snapshotter.v1.windows/snapshots
$ du -s
374011  .
  • Using powershell:
PS C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots> (Get-ChildItem -recurse . | Measure-Object -Property Length -sum).Sum
566812050
  • The output from imagefsinfo
PS C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots> crictl.exe --runtime-endpoint=npipe://./pipe/containerd-containerd imagefsinfo
{
  "status": {
    "timestamp": "1680088399627656400",
    "fsId": {
      "mountpoint": "C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows"
    },
    "usedBytes": {
      "value": "568556946"
    },
    "inodesUsed": {
      "value": "0"
    }
  }
}
  • It seems that containerd keeps separate records of images/snapshots when we call it through the CRI API versus when we call it through the native containerd API used by ctr.exe.
    • Doing a ctr.exe i pull mcr.microsoft.com/windows/nanoserver:ltsc2022 will produce a new snapshot, and the image will show up in ctr.exe i ls. All layers of that image will show up in ctr.exe snapshot ls.
    • Doing a crictl.exe pull mcr.microsoft.com/windows/nanoserver:ltsc2022 will also create a new snapshot.
  • The imagefsinfo command will only show us the size of the images downloaded via the containerd CRI API (through crictl.exe). This value is most likely cached in metadata.db, not fetched directly from the disk (see the namespace note below).
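
For reference, and stated as an assumption about this setup rather than a confirmed cause: crictl and the kubelet talk to containerd in the k8s.io namespace, while ctr.exe uses the default namespace unless one is passed explicitly (or set via the CONTAINERD_NAMESPACE environment variable), so the two tools can list different images and snapshots:

ctr.exe -n k8s.io images ls
ctr.exe -n k8s.io snapshot ls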

I am spinning up a k8s cluster and reproducing this. Sorry for the delay.

Dug a bit deeper by increasing the disk usage and looking at the logs. GC did indeed garbage collect:

GC logs:

I0224 06:35:44.167360    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=85 highThreshold=85 amountToFree=22240534528 lowThreshold=80
I0224 06:35:44.170894    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c733789646e295f3f35d7bde9657b09d92069fc4791f6886b81ce6034cdda4f3" size=131814973
I0224 06:35:44.806093    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:4d45d35f934a62503a1df703d4f7862d61854cc42d7e1b0dba9249f009cdaf01" size=305776038
I0224 06:35:45.612127    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:255ec55f9730903a13be47d51cd83a8c36549d7e853ad92298269814bd516241" size=125554391
I0224 06:35:46.316468    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:76139fdf1eb94661570fb051f9af5da56d94c04a5dc93b61ecdb69a790bbefab" size=2304257979
I0224 06:35:46.459250    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:17d5a7e95bb6150ece279ddfd1f484a4a8bf54a1eb7f201fe3c49236da84fda9" size=124471446
I0224 06:35:47.369494    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:e424d3c0688cc28f85b5719146c458ba1ea24fa503ec7e0c2e33f265977da619" size=143165548
I0224 06:35:47.542260    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:4fec32b83947e57fe1de16d7349d808c689074efac0b78f133f913bea09aeca5" size=145143068
I0224 06:35:47.713870    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:b1a822dabd8ec1ff252d23143dad5a4010f8c61466f9e582d01b38d2c6e63416" size=2304049573
I0224 06:36:05.968580    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:2ed810d5eaeae447afcb5a44eb14730301ae2a1b8fdd36553c81a8981cdfdc4c" size=3313455437
I0224 06:37:31.106760    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:6688afdc4a65f3464539257c037d598275a1eda6452c9d788b7780bdc4815d58" size=144019860
I0224 06:37:31.555290    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:b8c32952da321842aeb551b514f46a3876780f37a25f58b65bca92c56ef69b85" size=132078452
I0224 06:37:32.361156    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c868a912caad9a64e6601bafa5188aa9d5d981b0c63f58988d90c88160c19688" size=307861950
I0224 06:37:32.409214    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:20912e85e6b23bce85e855c29b9c5fa5325cb1418e969e5a26d61c000f2446ca" size=129524857
I0224 06:37:32.608067    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:8d9453088438abfd7f1b8b82f1124612054d99d545bff726bd0637da051a7c36" size=130148808
I0224 06:37:33.358821    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:81bef0c14fa8b438155b49ab3b2a2e64afcd808e51dca58ce2b77868dfa45de1" size=131397094
I0224 06:37:33.471945    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:99878e6b578e6d3a8da19a1a76b9ec03ca8b77d6806a4ef2692d0707d4ad3121" size=6151711469
I0224 06:37:33.526163    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:1304d2e712a2dbf014781d0d4fd165b5ea32ec4d5cc01853a61ba949fc79a50d" size=6841294551


I0224 07:07:33.602762    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=88 highThreshold=85 amountToFree=40323702784 lowThreshold=80
I0224 07:07:33.606399    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:e9ea6f9b45a7095fcd516950ab9c6f0c3fd87efb0bf4d78176121488cb4b7fbb" size=7202173663
I0224 07:07:49.872039    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c868a912caad9a64e6601bafa5188aa9d5d981b0c63f58988d90c88160c19688" size=307861950
I0224 07:07:51.170721    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:db6d8bc5fdd373eab943fd94793a8ddb4d349b3ef46cf980935c2b1f9ef0d3bf" size=7250867456
I0224 07:07:58.980988    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:380fe901429ff35d5aa7f3116d12203a911b2829e830a7f33a846f3d208868e5" size=8871439082
I0224 07:07:59.071515    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:b544a8555f72799e51810e190d1528278b6f1d50cdc92bb378dcde1060f50c48" size=8738840368
I0224 07:07:59.135503    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:4125948559aa736ce01c2def970196ab3072719ff2c7d9c0c1db1e0bb31ad017" size=8406200538


I0224 07:22:59.217995    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=86 highThreshold=85 amountToFree=31321923584 lowThreshold=80
I0224 07:22:59.220999    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c868a912caad9a64e6601bafa5188aa9d5d981b0c63f58988d90c88160c19688" size=307861950
I0224 07:22:59.221590    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:6c6fd10061e429fa0cea7ed19524aba4e215410dbd9e0954fa3418625ec4820b" size=8702966529
I0224 07:23:00.471140    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:24ddf0a8f21849c4860922273d1f5417a94aaa4c83e37655d454360251e13ca5" size=7439816576
I0224 07:23:00.499352    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:71171ab5148294ac26691bf4e03e92f0a769c4fda8bab6dffcf9d9057e1e2e37" size=5746633975
I0224 07:23:17.130188    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:b415ba1007c388632a1a67f71798045904463602ab1fbd69c717c857ffca5390" size=7464054534
I0224 07:23:17.769043    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:372f0cae7bf8e88c7f593cc5f73837f3333991a7fb7ab0b58416dff515c12106" size=7014711109


I0224 07:28:17.863082    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=88 highThreshold=85 amountToFree=38620483584 lowThreshold=80
I0224 07:28:17.875584    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:cdfba36b1f78762722254dfaf240c53354ec811059069560917e2af1ff0e1a1d" size=7518385339
I0224 07:28:18.223947    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c4cd71ba2465a323e79ea1869604bf9e4e20089fe92472bba37da3ddeb6d3241" size=7189084549
I0224 07:28:18.285726    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:3b624688dca5431ed24613a085a14d80348b9bbddedda0f2de3597dd7cd8b8ce" size=6761401498
I0224 07:28:18.378493    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:3f2a14c8b8895f3a7cdd02b0c7c323215b159c366a99e41a7544bb164f1c4bac" size=7003736326
I0224 07:28:18.424651    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:ba4252cb8231cafb91e782599bd8a5d9e48af77da99824be82228f860143a61f" size=6144143054
I0224 07:28:18.481864    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:b9732e2dd8c4564135dba3d22657be6707abdb13388a6f39eed07d8772182d0d" size=7214352222


I0224 07:33:18.531774    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=50942779392 lowThreshold=80
I0224 07:33:18.537016    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c868a912caad9a64e6601bafa5188aa9d5d981b0c63f58988d90c88160c19688" size=307861950
I0224 07:33:18.537559    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:ee03d096f476a8e8a8915ab00046bb1f99731d688700d65d9c376f077c86950b" size=7146822022
I0224 07:33:18.567015    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:0d9287d379fb76857515a2b517594a39dc8f9116f6d6cfc4984c1833b6dd0f95" size=7034544052
I0224 07:33:18.601595    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:83d4de6fdeb73ceda3a0dd4b809629de4e3d1053b9a3c5a099faf6980e8aa15d" size=6842299067
I0224 07:33:18.628278    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:88a2774936c98483e74df3207609718cf1a099f34ffca4cc7d3454919776d79e" size=132304241
I0224 07:33:18.654802    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:d720e99d9a76e33af24c7d75c59b0b11795415fef0edf4c88b6ca55dc506417e" size=7216929192
I0224 07:33:18.684380    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:97fb9816090587d5a2d62436c99ac89a8daa83f94b307c17f4cc6560c73a8012" size=7196437748
I0224 07:33:18.713804    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:738072b1026ad67d216849ab4bb3dd9ec97530c10f18fffcd240f7a05ef34226" size=6082033429
I0224 07:33:18.740786    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:18d0079fddf5b4589aa940ca56f7b6f564c5276605a2ceb88dbaa5f16b57c9fc" size=7319951596
I0224 07:33:18.769647    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:4c9a647618fc6542dafa0740bac74398a79be37ba2a2fba04d6eb4fda85c6431" size=7277244176


I0224 07:38:18.805913    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=50950483968 lowThreshold=80
I0224 07:38:18.808480    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:63533019a89a5ea3ec57b566118be96d05d63ee457bcfc8f9b091bb24331c34d" size=7169240581
I0224 07:38:18.836363    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:9195ec53469c25b3fc600da532ce097f6d80e9851f2f4f45b5319df22e8182d3" size=8734135456
I0224 07:38:18.855395    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:19148c3833ef4d7ee93926fb3640d6e5696acb37a2c541e1ad5e52b32c089cd2" size=7040980951
I0224 07:38:18.882261    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:acc13e0dbad561304cfd5038b8e3c05b19d7c478983ea07899ec96ed41617987" size=5992519426
I0224 07:38:18.908210    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:193cced89f15fb10a37621f6ec59770fd70af1fc2905e308678bf7cd808c2214" size=9171653968
I0224 07:38:18.932864    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:75babacea201c2d3579eeea066bd5713ba78ae486bf8e533fce66c1fb8d8d9a4" size=7735550532
I0224 07:38:18.962440    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:e1240c2ea24f71fbe7281ed21669ed288c472cdb30f10fc1666b3a7787d52af8" size=7010213492


I0224 07:43:18.991398    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=50962112512 lowThreshold=80
I0224 07:43:18.994116    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c868a912caad9a64e6601bafa5188aa9d5d981b0c63f58988d90c88160c19688" size=307861950
I0224 07:43:18.994289    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c072462b4cb5ae3ff9867e6d8474d985645301888604293b0f696a4cf04f829f" size=7045738141
I0224 07:43:19.021398    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:8369c87bf8eee92e02f7d12bbf78328da91687f1f6f111523bf0e2deee468a5b" size=6223720689
I0224 07:43:19.046361    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:b1079af32540448f9016e7aa8cd5355add91638b11b67e24fdba0622048dea44" size=7363460412
I0224 07:43:19.069606    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:d2070cd010d87196f85a6153eccc9ef452c24b744fd462479410f19a7ffb3e61" size=7739381206
I0224 07:43:19.097767    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:90200afc9b622599928b7b3c77e87f60014fae8ab176b79a48c75a7a264fc5e1" size=7778223167
I0224 07:43:19.119736    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:ebbd09a888d0e9061ee1a07cf11cc09b5b4ecd45a77d3d751ed1da671f3c7773" size=7829883764
I0224 07:43:19.144123    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:064d3afbfbd62952ce51745aa5ae0596e196022f21242c6b5e7072fefc821ec7" size=7665129961

I0224 07:48:19.170434    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=50968887296 lowThreshold=80
I0224 07:48:19.173651    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:4b0e9b02dfe51d4989979c8d4ddea923d61afe5f434c7247e4c3794f497ef00d" size=6380408118
I0224 07:48:19.203660    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:b860ef7c531d70f29ea1aaa4a787124606e12b7de4cc9a3e7730cd8ed5388978" size=6833220135
I0224 07:48:19.227677    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:2a8068712bb0d9628ddffb46bcbcc7de4b47fbe73c655e925ccf9383769423e5" size=5981136111
I0224 07:48:19.251071    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:a3ab1a01e99e016f5faa1cc546fd13c15896eaadcb1dabf34b72ab5a6b3c05cd" size=7478611045
I0224 07:48:19.275740    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:d89ba8197bbcb8205a29474e4814fcfdc95b752a901a62e540da7246bf2e132f" size=7699797392
I0224 07:48:19.299213    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:4d83c48bac77caa9ad50ce5d6aaa4a8a4ea7f1ccc7a378948fcb22ede08a952f" size=5493853245
I0224 07:48:19.324334    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:8675019203f2a76aff1ee027fa1f4edd133881c6ca33a55cfaa903c10e371a23" size=7250408886
I0224 07:48:19.348466    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:e8fb06c0ea951531353ce6a9e5b741ec38cee2c0d785837a71eeb252a1186bf3" size=7665260217

I0224 07:53:19.380203    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=50980552704 lowThreshold=80
I0224 07:53:19.384050    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c868a912caad9a64e6601bafa5188aa9d5d981b0c63f58988d90c88160c19688" size=307861950
I0224 07:53:19.386001    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c9e4e87bf782d4bbf720a8b35e5a4ec4a260afba02ae94c1e2edc48ea4dab9e8" size=7769425065
I0224 07:53:19.414804    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:1e5f38c0906a005b3b8a6507489de19577d01d7480f81b03166cf4d097a7e505" size=6371265205
I0224 07:53:19.438711    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:ba7ab6513577bec191cec38f0d601830a23c1eb9c40dad5cc5ae574b7eefa3b5" size=7343975244
I0224 07:53:19.463874    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c42d18a4baccb5da1f60872e097d934cc25510be080622648d8eec83e7dcbe11" size=5558179045
I0224 07:53:19.487280    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:6a756f3c548cd7a04b0b8144ac16152e225a1643689e7add62fed34a5ecc3f48" size=7784561764
I0224 07:53:19.509423    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:327967d91baa6e47f3ddf763f38d36f235b9f0432d74ed235a510f666696f7dd" size=7990804521
I0224 07:53:19.539068    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:3fe6342a3da9076bbbbf9e163c1e03ee8833f443de45107a7714bb36b5709731" size=6546276232
I0224 07:53:19.561524    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:da9d80c760fdc5ebe653f82d214877c813444e7c25894a66db39e8d4ae09c8b6" size=7823935685


I0224 07:58:19.592718    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=50991325184 lowThreshold=80
I0224 07:58:19.595383    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:f112941c5021601f101acf3cfa9ce562c92c57a97c616dfbf3cde6de929687dd" size=7801344488
E0224 07:58:19.620203    5896 kubelet.go:1317] "Image garbage collection failed once. Stats initialization may not have completed yet" err="failed to garbage collect required amount of images. Wanted to free 50991325184 bytes, but freed 7801344488 bytes"
I0224 08:03:19.632431    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=50999279616 lowThreshold=80
I0224 08:03:19.635213    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c868a912caad9a64e6601bafa5188aa9d5d981b0c63f58988d90c88160c19688" size=307861950
E0224 08:03:19.635213    5896 kubelet.go:1313] "Image garbage collection failed multiple times in a row" err="failed to garbage collect required amount of images. Wanted to free 50999279616 bytes, but freed 307861950 bytes"
I0224 08:08:19.636601    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=51010617344 lowThreshold=80
E0224 08:08:19.639345    5896 kubelet.go:1313] "Image garbage collection failed multiple times in a row" err="failed to garbage collect required amount of images. Wanted to free 51010617344 bytes, but freed 0 bytes"
I0224 08:13:19.645076    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=51017768960 lowThreshold=80
I0224 08:13:19.647541    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c868a912caad9a64e6601bafa5188aa9d5d981b0c63f58988d90c88160c19688" size=307861950
E0224 08:13:19.647578    5896 kubelet.go:1313] "Image garbage collection failed multiple times in a row" err="failed to garbage collect required amount of images. Wanted to free 51017768960 bytes, but freed 307861950 bytes"
I0224 08:18:19.650475    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=51029954560 lowThreshold=80
E0224 08:18:19.653901    5896 kubelet.go:1313] "Image garbage collection failed multiple times in a row" err="failed to garbage collect required amount of images. Wanted to free 51029954560 bytes, but freed 0 bytes"
I0224 08:23:19.658074    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=51037683712 lowThreshold=80
I0224 08:23:19.659811    5896 image_gc_manager.go:387] "Removing image to free bytes" imageID="sha256:c868a912caad9a64e6601bafa5188aa9d5d981b0c63f58988d90c88160c19688" size=307861950
E0224 08:23:19.660674    5896 kubelet.go:1313] "Image garbage collection failed multiple times in a row" err="failed to garbage collect required amount of images. Wanted to free 51037683712 bytes, but freed 307861950 bytes"
I0224 08:28:19.671957    5896 image_gc_manager.go:310] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=90 highThreshold=85 amountToFree=51048828928 lowThreshold=80
E0224 08:28:19.674136    5896 kubelet.go:1313] "Image garbage collection failed multiple times in a row" err="failed to garbage collect required amount of images. Wanted to free 51048828928 bytes, but freed 0 bytes"

The usage and amountToFree values in the logs are correct. The logs also show that images were removed; the disk usage, however, didn't go down, or sometimes went down only a little, but never by the amount the log claims.
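
For context, the usage and amountToFree figures follow the threshold logic in the kubelet's image_gc_manager.go; a rough sketch of that calculation with illustrative numbers (not the measured capacity of this node):

# Sketch of the kubelet image GC trigger (illustrative values only)
$capacity  = 549GB    # assumed size of the image filesystem
$available = 55GB     # assumed free bytes on it
$highThresholdPercent = 85
$lowThresholdPercent  = 80

$usagePercent = 100 - [math]::Floor($available * 100 / $capacity)
if ($usagePercent -ge $highThresholdPercent) {
    # free enough to get back down to the low threshold
    $amountToFree = [math]::Floor($capacity * (100 - $lowThresholdPercent) / 100) - $available
    "usage=$usagePercent highThreshold=$highThresholdPercent amountToFree=$amountToFree lowThreshold=$lowThresholdPercent"
}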

Consider the following graph of the disk usage percentage, with the red vertical lines marking GC runs according to the logs: [graph]

Let’s look at three examples:

  1. GC at 6:35: the logs above say ~21 GiB were freed; in reality (see graph) only ~1 GiB was freed
  2. GC at 7:48: the logs above say ~51 GiB were freed; in reality (see graph) nothing was freed
  3. Starting at 8:08, GC fails completely (because all images left are in use) with "Image garbage collection failed multiple times in a row" err="failed to garbage collect required amount of images. Wanted to free 51010617344 bytes, but freed 0 bytes"

Looking at ctr i ls, the sum of the image sizes is ~160 GiB (the real value is likely lower because of shared image layers). The images the GC said it removed no longer appear here:

ctr i ls

REF                                                                                                                                    TYPE                                                      DIGEST                                                                  SIZE      PLATFORMS                                          LABELS                      

cosmoppi.azurecr.io/bcartifacts/cosmo-bc:onprem-19.1.31886.32186-at                                                                    application/vnd.docker.distribution.manifest.v2+json      sha256:88f8829c79bf99cab47b8d6c045e4088b529b0530060a74a5aa977a8d69bdd08 4.9 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc:onprem-19.16.52583.0-w1                                                                       application/vnd.docker.distribution.manifest.v2+json      sha256:7f2c567005290a8bdec11cd3390fdfbb4678e4b1702a37834443ead38aa51c90 5.0 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc:onprem-21.4.52563.52785-de                                                                    application/vnd.docker.distribution.manifest.v2+json      sha256:ae180cc2de87bf8597b545980bcd1c08746bf2e3d3336905b6801e996fbb9c33 5.2 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc:sandbox-20.5.45456.53141-de                                                                   application/vnd.docker.distribution.manifest.v2+json      sha256:62a73f18ac0d032a29a0c631cb84a7170f7a45ec5d4536fca8d48ce0a74fe5df 6.6 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc:sandbox-20.5.45456.53198-de                                                                   application/vnd.docker.distribution.manifest.v2+json      sha256:2f1cb496944926cae35e4ed724d4e6f01ffe3a223fd30d2ac23882ca3a732819 6.9 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc:sandbox-20.5.45456.53410-de                                                                   application/vnd.docker.distribution.manifest.v2+json      sha256:08b1b4d24c9d060bd550e1d4d22ec2e60d8f2e7585865e060948ca0f6b9c2e58 6.9 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc:sandbox-21.4.52563.53271-de                                                                   application/vnd.docker.distribution.manifest.v2+json      sha256:efc0b996825ee7d441e0b3698aa73b84530a83628a99edca60396a34ec7e50ff 7.2 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc:sandbox-21.4.52563.53501-fr                                                                   application/vnd.docker.distribution.manifest.v2+json      sha256:a0e465d27c6f8a447489f3e95cd73ec13808f792e8db5262d903f90c718270e8 7.2 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc@sha256:08b1b4d24c9d060bd550e1d4d22ec2e60d8f2e7585865e060948ca0f6b9c2e58                       application/vnd.docker.distribution.manifest.v2+json      sha256:08b1b4d24c9d060bd550e1d4d22ec2e60d8f2e7585865e060948ca0f6b9c2e58 6.9 GiB   windows/amd64                                      io.cri-containerd.image=managed 
cosmoppi.azurecr.io/bcartifacts/cosmo-bc@sha256:2f1cb496944926cae35e4ed724d4e6f01ffe3a223fd30d2ac23882ca3a732819                       application/vnd.docker.distribution.manifest.v2+json      sha256:2f1cb496944926cae35e4ed724d4e6f01ffe3a223fd30d2ac23882ca3a732819 6.9 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc@sha256:62a73f18ac0d032a29a0c631cb84a7170f7a45ec5d4536fca8d48ce0a74fe5df                       application/vnd.docker.distribution.manifest.v2+json      sha256:62a73f18ac0d032a29a0c631cb84a7170f7a45ec5d4536fca8d48ce0a74fe5df 6.6 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc@sha256:7f2c567005290a8bdec11cd3390fdfbb4678e4b1702a37834443ead38aa51c90                       application/vnd.docker.distribution.manifest.v2+json      sha256:7f2c567005290a8bdec11cd3390fdfbb4678e4b1702a37834443ead38aa51c90 5.0 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc@sha256:88f8829c79bf99cab47b8d6c045e4088b529b0530060a74a5aa977a8d69bdd08                       application/vnd.docker.distribution.manifest.v2+json      sha256:88f8829c79bf99cab47b8d6c045e4088b529b0530060a74a5aa977a8d69bdd08 4.9 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc@sha256:a0e465d27c6f8a447489f3e95cd73ec13808f792e8db5262d903f90c718270e8                       application/vnd.docker.distribution.manifest.v2+json      sha256:a0e465d27c6f8a447489f3e95cd73ec13808f792e8db5262d903f90c718270e8 7.2 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc@sha256:ae180cc2de87bf8597b545980bcd1c08746bf2e3d3336905b6801e996fbb9c33                       application/vnd.docker.distribution.manifest.v2+json      sha256:ae180cc2de87bf8597b545980bcd1c08746bf2e3d3336905b6801e996fbb9c33 5.2 GiB   windows/amd64                                      io.cri-containerd.image=managed
cosmoppi.azurecr.io/bcartifacts/cosmo-bc@sha256:efc0b996825ee7d441e0b3698aa73b84530a83628a99edca60396a34ec7e50ff                       application/vnd.docker.distribution.manifest.v2+json      sha256:efc0b996825ee7d441e0b3698aa73b84530a83628a99edca60396a34ec7e50ff 7.2 GiB   windows/amd64                                      io.cri-containerd.image=managed
docker.io/cosmoconsult/azdevops-build-agent:latest-bcagent-k8s                                                                         application/vnd.docker.distribution.manifest.list.v2+json sha256:b4749cbcc5ce6ae95952e61802a3279db4db5d817dd77e0419ae14d992ec7774 1.7 GiB   windows/amd64                                      io.cri-containerd.image=managed
docker.io/cosmoconsult/azdevops-build-agent@sha256:b4749cbcc5ce6ae95952e61802a3279db4db5d817dd77e0419ae14d992ec7774                    application/vnd.docker.distribution.manifest.list.v2+json sha256:b4749cbcc5ce6ae95952e61802a3279db4db5d817dd77e0419ae14d992ec7774 1.7 GiB   windows/amd64                                      io.cri-containerd.image=managed
ghcr.io/prometheus-community/windows-exporter:latest                                                                                   application/vnd.docker.distribution.manifest.list.v2+json sha256:bcfd6ecd44ec004e534e419c8643fec68cc39f623b90fcb2915e7359fd67503d 122.7 MiB windows/amd64                                      io.cri-containerd.image=managed
ghcr.io/prometheus-community/windows-exporter@sha256:bcfd6ecd44ec004e534e419c8643fec68cc39f623b90fcb2915e7359fd67503d                  application/vnd.docker.distribution.manifest.list.v2+json sha256:bcfd6ecd44ec004e534e419c8643fec68cc39f623b90fcb2915e7359fd67503d 122.7 MiB windows/amd64                                      io.cri-containerd.image=managed
kubletwin/pause                                                                                                                        application/vnd.docker.distribution.manifest.list.v2+json sha256:a67d781a5a51290a56f6fb603b8ac9509abce8948d5a52ff3e02e8669a83180d 293.6 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
mcr.microsoft.com/oss/kubernetes-csi/azuredisk-csi:v1.24.0                                                                             application/vnd.docker.distribution.manifest.list.v2+json sha256:468f64b34e9598c3c37a44ba111ee30db61936e9a17dd73bda17b38ad85fb874 136.8 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed 
mcr.microsoft.com/oss/kubernetes-csi/azuredisk-csi@sha256:468f64b34e9598c3c37a44ba111ee30db61936e9a17dd73bda17b38ad85fb874             application/vnd.docker.distribution.manifest.list.v2+json sha256:468f64b34e9598c3c37a44ba111ee30db61936e9a17dd73bda17b38ad85fb874 136.8 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
mcr.microsoft.com/oss/kubernetes-csi/azurefile-csi:v1.24.0                                                                             application/vnd.docker.distribution.manifest.list.v2+json sha256:a2ebb454e148776776e389cd92258bc50c7ad42a9506b29834fd1eb3cfa89cfb 125.4 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
mcr.microsoft.com/oss/kubernetes-csi/azurefile-csi@sha256:a2ebb454e148776776e389cd92258bc50c7ad42a9506b29834fd1eb3cfa89cfb             application/vnd.docker.distribution.manifest.list.v2+json sha256:a2ebb454e148776776e389cd92258bc50c7ad42a9506b29834fd1eb3cfa89cfb 125.4 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
mcr.microsoft.com/oss/kubernetes-csi/csi-node-driver-registrar:v2.5.0                                                                  application/vnd.docker.distribution.manifest.list.v2+json sha256:348b2d4eebc8da38687755a69b6c21035be232325a6bcde54e5ec4e04689fd93 120.0 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
mcr.microsoft.com/oss/kubernetes-csi/csi-node-driver-registrar@sha256:348b2d4eebc8da38687755a69b6c21035be232325a6bcde54e5ec4e04689fd93 application/vnd.docker.distribution.manifest.list.v2+json sha256:348b2d4eebc8da38687755a69b6c21035be232325a6bcde54e5ec4e04689fd93 120.0 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
mcr.microsoft.com/oss/kubernetes-csi/livenessprobe:v2.6.0                                                                              application/vnd.docker.distribution.manifest.list.v2+json sha256:e01f5dae19d7e1be536606fe5deb893417429486b628b816d80ffa0e441eeae8 119.1 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
mcr.microsoft.com/oss/kubernetes-csi/livenessprobe@sha256:e01f5dae19d7e1be536606fe5deb893417429486b628b816d80ffa0e441eeae8             application/vnd.docker.distribution.manifest.list.v2+json sha256:e01f5dae19d7e1be536606fe5deb893417429486b628b816d80ffa0e441eeae8 119.1 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
mcr.microsoft.com/oss/kubernetes/azure-cloud-node-manager:v1.25.5                                                                      application/vnd.docker.distribution.manifest.list.v2+json sha256:3dc9066f6d86fd70a801d68be9231118a27a0f19385107f8d73e0be65665f0fe 126.5 MiB linux/amd64,linux/arm/v7,linux/arm64,windows/amd64 io.cri-containerd.image=managed
mcr.microsoft.com/oss/kubernetes/azure-cloud-node-manager@sha256:3dc9066f6d86fd70a801d68be9231118a27a0f19385107f8d73e0be65665f0fe      application/vnd.docker.distribution.manifest.list.v2+json sha256:3dc9066f6d86fd70a801d68be9231118a27a0f19385107f8d73e0be65665f0fe 126.5 MiB linux/amd64,linux/arm/v7,linux/arm64,windows/amd64 io.cri-containerd.image=managed
mcr.microsoft.com/windows/nanoserver:ltsc2022                                                                                          application/vnd.docker.distribution.manifest.list.v2+json sha256:b4248836dea118c48481b821d9fbb40daf7e207b48f5e6edc9da6837050b0f65 112.8 MiB windows/amd64                                      io.cri-containerd.image=managed
mcr.microsoft.com/windows/nanoserver@sha256:b4248836dea118c48481b821d9fbb40daf7e207b48f5e6edc9da6837050b0f65                           application/vnd.docker.distribution.manifest.list.v2+json sha256:b4248836dea118c48481b821d9fbb40daf7e207b48f5e6edc9da6837050b0f65 112.8 MiB windows/amd64                                      io.cri-containerd.image=managed
mcr.microsoft.com/windows/servercore:ltsc2022                                                                                          application/vnd.docker.distribution.manifest.list.v2+json sha256:dfd3ce22e4b6e987ff2bfb3efe5e4912512fce35660be2ae5faa91e6f4da9748 1.3 GiB   windows/amd64                                      io.cri-containerd.image=managed
mcr.microsoft.com/windows/servercore@sha256:dfd3ce22e4b6e987ff2bfb3efe5e4912512fce35660be2ae5faa91e6f4da9748                           application/vnd.docker.distribution.manifest.list.v2+json sha256:dfd3ce22e4b6e987ff2bfb3efe5e4912512fce35660be2ae5faa91e6f4da9748 1.3 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:0eefee3e1c2a2f8ecfffb8afd0f195c46ce32f529413ba7dcdd77d952485a8e3                                                                application/vnd.docker.distribution.manifest.list.v2+json sha256:bcfd6ecd44ec004e534e419c8643fec68cc39f623b90fcb2915e7359fd67503d 122.7 MiB windows/amd64                                      io.cri-containerd.image=managed
sha256:17272de21ef1671302ef6ea29b778ad93c937cbef3af341a4e67fc267e85f618                                                                application/vnd.docker.distribution.manifest.v2+json      sha256:7f2c567005290a8bdec11cd3390fdfbb4678e4b1702a37834443ead38aa51c90 5.0 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:1fb25eb608ab558cf66b7546fd598b2dc81b6678a380fcdb188da780e107f4ab                                                                application/vnd.docker.distribution.manifest.list.v2+json sha256:dfd3ce22e4b6e987ff2bfb3efe5e4912512fce35660be2ae5faa91e6f4da9748 1.3 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:2ad9c4acb5e22ee4d821154e9084462a04a1692a547bcdfe90776fb29521e8fa                                                                application/vnd.docker.distribution.manifest.v2+json      sha256:62a73f18ac0d032a29a0c631cb84a7170f7a45ec5d4536fca8d48ce0a74fe5df 6.6 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:2b7a693b9ca972a81d2d11d1808a36768f028b32e22fbab0abe38ff3ba2e817a                                                                application/vnd.docker.distribution.manifest.list.v2+json sha256:b4749cbcc5ce6ae95952e61802a3279db4db5d817dd77e0419ae14d992ec7774 1.7 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:3df60b12bc9ab912d0000ce0f69195a4365946c307977210b0c580cc8a64f74f                                                                application/vnd.docker.distribution.manifest.list.v2+json sha256:a2ebb454e148776776e389cd92258bc50c7ad42a9506b29834fd1eb3cfa89cfb 125.4 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
sha256:410586eee635abf4ef049987d521b795ee300fcbf7e18b2e406f0c7b843d5bbc                                                                application/vnd.docker.distribution.manifest.v2+json      sha256:a0e465d27c6f8a447489f3e95cd73ec13808f792e8db5262d903f90c718270e8 7.2 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:57d1cd7f5e0796e35af60688b0cae5aed8c4b587f2136ac194d54a23c88a073d                                                                application/vnd.docker.distribution.manifest.list.v2+json sha256:e01f5dae19d7e1be536606fe5deb893417429486b628b816d80ffa0e441eeae8 119.1 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
sha256:8083b23524e91128270f7656d4234a4cfd724f897b4dbc8ab628c5909e20abf0                                                                application/vnd.docker.distribution.manifest.list.v2+json sha256:3dc9066f6d86fd70a801d68be9231118a27a0f19385107f8d73e0be65665f0fe 126.5 MiB linux/amd64,linux/arm/v7,linux/arm64,windows/amd64 io.cri-containerd.image=managed
sha256:8b04da406952a938276bdf442e7b8d747ff03e76d2238fa755751cff7cc6f18c                                                                application/vnd.docker.distribution.manifest.v2+json      sha256:ae180cc2de87bf8597b545980bcd1c08746bf2e3d3336905b6801e996fbb9c33 5.2 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:a5e598835d990da5241025a14761eee98f7ae2869e32c353a8fc81ec94952731                                                                application/vnd.docker.distribution.manifest.v2+json      sha256:efc0b996825ee7d441e0b3698aa73b84530a83628a99edca60396a34ec7e50ff 7.2 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:a9c78fdbfe584abdeb563ad0bf14a72c3124f0415bf1ece403cbb06944c746c8                                                                application/vnd.docker.distribution.manifest.v2+json      sha256:2f1cb496944926cae35e4ed724d4e6f01ffe3a223fd30d2ac23882ca3a732819 6.9 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:ab6b2955d60afd3c9e79a9787a0ce6e69d1ff3a3f32f704ded3c307c4b29c746                                                                application/vnd.docker.distribution.manifest.list.v2+json sha256:b4248836dea118c48481b821d9fbb40daf7e207b48f5e6edc9da6837050b0f65 112.8 MiB windows/amd64                                      io.cri-containerd.image=managed
sha256:d19dc166e4915769fb45eb7f6e5e78e88f6d324d385dd13e75a50b060e6aab86                                                                application/vnd.docker.distribution.manifest.list.v2+json sha256:468f64b34e9598c3c37a44ba111ee30db61936e9a17dd73bda17b38ad85fb874 136.8 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed
sha256:d3cd939792b08f4e324b5964ff987f6efd7ae0c177d5dc6e02be49be0b0cb106                                                                application/vnd.docker.distribution.manifest.v2+json      sha256:88f8829c79bf99cab47b8d6c045e4088b529b0530060a74a5aa977a8d69bdd08 4.9 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:f708db846088aa2b1c1c9d85546dd1fdc382a8063b096249b02d8e75fa91c463                                                                application/vnd.docker.distribution.manifest.v2+json      sha256:08b1b4d24c9d060bd550e1d4d22ec2e60d8f2e7585865e060948ca0f6b9c2e58 6.9 GiB   windows/amd64                                      io.cri-containerd.image=managed
sha256:f7312d5d67fe06fcadda4280531f5c253cf563895bc381e18520bfa345fd3ae9                                                                application/vnd.docker.distribution.manifest.list.v2+json sha256:348b2d4eebc8da38687755a69b6c21035be232325a6bcde54e5ec4e04689fd93 120.0 MiB linux/amd64,linux/arm64,windows/amd64              io.cri-containerd.image=managed

crictl imagefsinfo also reports that images are using only ~92 GiB:

crictl imagefsinfo
{
  "status": {
    "timestamp": "1677229218700304300",
    "fsId": {
      "mountpoint": "C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows"
    },
    "usedBytes": {
      "value": "99545865247"
    },
    "inodesUsed": {
      "value": "0"
    }
  }
}

But du shows ~406 GiB actually in use under the snapshotter directory:

.\du.exe -accepteula -l 1 C:\ProgramData\containerd\root\

DU v1.62 - Directory disk usage reporter
Copyright (C) 2005-2018 Mark Russinovich
Sysinternals - www.sysinternals.com

  20,427,741  c:\programdata\containerd\root\io.containerd.content.v1.content
           3  c:\programdata\containerd\root\io.containerd.grpc.v1.cri
           0  c:\programdata\containerd\root\io.containerd.grpc.v1.introspection
       8,192  c:\programdata\containerd\root\io.containerd.metadata.v1.bolt
           0  c:\programdata\containerd\root\io.containerd.runtime.v2.task
 426,194,415  c:\programdata\containerd\root\io.containerd.snapshotter.v1.windows
           0  c:\programdata\containerd\root\io.containerd.snapshotter.v1.windows-lcow
           0  c:\programdata\containerd\root\opt
Files:        734107
Directories:  163406
Size:         457,349,480,536 bytes
Size on disk: 461,251,911,680 bytes

It seems like the references to the images are deleted but not the actual files. Is there any way to get the SHAs of the snapshots so I can compare them to the SHAs in the GC log that were said to be deleted?

kubectl describe node akswin2000003 shows GC failures:

Events:
  Type     Reason               Age                 From     Message
  ----     ------               ----                ----     -------
  Warning  FreeDiskSpaceFailed  75m                 kubelet  failed to garbage collect required amount of images. Wanted to free 51017768960 bytes, but freed 307861950 bytes
  Warning  FreeDiskSpaceFailed  70m                 kubelet  failed to garbage collect required amount of images. Wanted to free 51029954560 bytes, but freed 0 bytes
  Warning  ImageGCFailed        70m                 kubelet  failed to garbage collect required amount of images. Wanted to free 51029954560 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed  65m                 kubelet  failed to garbage collect required amount of images. Wanted to free 51037683712 bytes, but freed 307861950 bytes
  Warning  FreeDiskSpaceFailed  60m                 kubelet  failed to garbage collect required amount of images. Wanted to free 51048828928 bytes, but freed 0 bytes
  Warning  ImageGCFailed        60m                 kubelet  failed to garbage collect required amount of images. Wanted to free 51048828928 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed  55m                 kubelet  failed to garbage collect required amount of images. Wanted to free 51056906240 bytes, but freed 307861950 bytes
  Warning  ImageGCFailed        55m                 kubelet  failed to garbage collect required amount of images. Wanted to free 51056906240 bytes, but freed 307861950 bytes
  Warning  FreeDiskSpaceFailed  50m                 kubelet  failed to garbage collect required amount of images. Wanted to free 51068497920 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed  48s (x10 over 45m)  kubelet  (combined from similar events): failed to garbage collect required amount of images. Wanted to free 51157430272 bytes, but freed 0 bytes
  Warning  ImageGCFailed        48s (x9 over 40m)   kubelet  (combined from similar events): failed to garbage collect required amount of images. Wanted to free 51157430272 bytes, but freed 0 bytes

But it doesn’t show disk pressure:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 24 Feb 2023 10:25:23 +0100   Tue, 07 Feb 2023 20:20:07 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 24 Feb 2023 10:25:23 +0100   Tue, 07 Feb 2023 20:20:07 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 24 Feb 2023 10:25:23 +0100   Tue, 07 Feb 2023 20:20:07 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 24 Feb 2023 10:25:23 +0100   Tue, 07 Feb 2023 20:20:17 +0100   KubeletReady                 kubelet is posting ready status
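
A side note on DiskPressure staying False: that condition is driven by the kubelet eviction thresholds (nodefs.available<10% by default), not by the image GC thresholds, so it only flips once free space on the node filesystem actually drops below the eviction threshold. Assuming default eviction settings, a rough check against the volume numbers from above:

$vol = Get-Volume -DriveLetter C
$freePercent = [math]::Round($vol.SizeRemaining / $vol.Size * 100, 1)
"nodefs free: $freePercent% (with default eviction thresholds, DiskPressure is only expected below ~10%)"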

Similar issue (without solution): https://github.com/rancher/rke2/issues/3090#issue-1278133402