origin: Warning FailedScheduling 0/1 nodes are available: 1 NodeUnderDiskPressure.
I don’t know whether this is a legitimate warning or whether something is broken, but we’re seeing a lot of tests fail because pods don’t get scheduled due to this error.
So either we need more disk space on our instances, or we need to see why the nodes are confused about their capacity.
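For reference, a quick way to confirm what the node itself is reporting (the node name is the one from the describe output below; the df mount points are only guesses for an AWS origin node):

    # Check the DiskPressure condition the scheduler is reacting to
    kubectl describe node ip-172-18-5-103.ec2.internal | grep DiskPressure
    kubectl get node ip-172-18-5-103.ec2.internal \
      -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'

    # And compare with actual usage on the instance (mount points are guesses)
    df -h /var/lib/docker /var/lib/origin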
Name: test-pod-fbb1307f-ca8c-11e7-9fec-0ee2507ca112
Namespace: extended-test-s2i-usage-hmpwm-sdggx
Node: ip-172-18-5-103.ec2.internal/172.18.5.103
Start Time: Thu, 16 Nov 2017 05:19:21 +0000
Labels: name=test-pod-fbb1307f-ca8c-11e7-9fec-0ee2507ca112
Annotations: openshift.io/scc=restricted
Status: Pending
IP:
Containers:
test:
Container ID:
Image: openshift/perl-516-centos7
Image ID:
Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-sjl4r (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-sjl4r:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-sjl4r
Optional: false
QoS Class: BestEffort
Node-Selectors: region=infra
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
15m 10m 21 default-scheduler Warning FailedScheduling 0/1 nodes are available: 1 NodeUnderDiskPressure.
9m 9m 1 default-scheduler Normal Scheduled Successfully assigned test-pod-fbb1307f-ca8c-11e7-9fec-0ee2507ca112 to ip-172-18-5-103.ec2.internal
9m 9m 1 kubelet, ip-172-18-5-103.ec2.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-sjl4r"
9m 4m 25 kubelet, ip-172-18-5-103.ec2.internal Warning FailedCreatePodSandBox Failed create pod sandbox.
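The “Failed create pod sandbox.” event doesn’t include the underlying error, but the kubelet journal on the node should. Something along these lines, where the systemd unit name is a guess and may be origin-node, atomic-openshift-node, or kubelet depending on the install:

    # Look for the sandbox failure plus any eviction / image GC activity around it
    journalctl -u origin-node --since "1 hour ago" | grep -iE 'sandbox|evict|disk|image'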
About this issue
- State: closed
- Created 7 years ago
- Reactions: 3
- Comments: 15 (15 by maintainers)
There are a few flags in the kubelet that handle this:
https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource
Basically, if the hard threshold is exceeded OR the soft threshold is exceeded for more than the grace-period duration, eviction occurs and node conditions are set.
- eviction-minimum-reclaim controls the “chunk” that will be attempted to be reclaimed once a threshold is exceeded.
- minimum-image-ttl-duration might be excluding all images from GC consideration if they are being pulled within a short period of time.
- There should be messages in the node log if the kubelet is unable to free resources in response to reaching a threshold, something like “attempted to free X, but could only free Y”.
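For illustration, the knobs involved look roughly like this on the kubelet command line (the threshold values are made up for the example, not what these nodes actually run):

    # Illustrative eviction settings only -- not the values from this cluster
    --eviction-hard=nodefs.available<10%,imagefs.available<10%
    --eviction-soft=nodefs.available<15%
    --eviction-soft-grace-period=nodefs.available=1m
    --eviction-minimum-reclaim=nodefs.available=500Mi   # size of the "chunk" reclaimed once a threshold trips
    --minimum-image-ttl-duration=2m                      # images newer than this are never considered for GC

If reclaim keeps falling short, the node stays under DiskPressure and keeps rejecting pods, which would match the events above.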
@derekwaynecarr can you check my understanding here?