origin: Warning FailedScheduling 0/1 nodes are available: 1 NodeUnderDiskPressure.

I don’t know whether this is a legitimate warning or something broken, but we’re seeing a lot of tests fail because pods don’t get scheduled due to this error.

So either we need more disk space on our instances, or we need to see why the nodes are confused about their capacity.
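To figure out which of those it is, one quick check (a sketch only; the df paths are an assumption about where this node keeps container storage) is to compare what the kubelet reports against actual disk usage:

# what the kubelet thinks: look for the DiskPressure condition and allocatable resources
oc describe node ip-172-18-5-103.ec2.internal

# what the disk actually looks like on the node (paths are illustrative)
df -h / /var/lib/docker

Here is one of the stuck pods: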

Name:		test-pod-fbb1307f-ca8c-11e7-9fec-0ee2507ca112
Namespace:	extended-test-s2i-usage-hmpwm-sdggx
Node:		ip-172-18-5-103.ec2.internal/172.18.5.103
Start Time:	Thu, 16 Nov 2017 05:19:21 +0000
Labels:		name=test-pod-fbb1307f-ca8c-11e7-9fec-0ee2507ca112
Annotations:	openshift.io/scc=restricted
Status:		Pending
IP:		
Containers:
  test:
    Container ID:	
    Image:		openshift/perl-516-centos7
    Image ID:		
    Port:		<none>
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Environment:	<none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sjl4r (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  default-token-sjl4r:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-sjl4r
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	region=infra
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----					-------------	--------	------			-------
  15m		10m		21	default-scheduler					Warning		FailedScheduling	0/1 nodes are available: 1 NodeUnderDiskPressure.
  9m		9m		1	default-scheduler					Normal		Scheduled		Successfully assigned test-pod-fbb1307f-ca8c-11e7-9fec-0ee2507ca112 to ip-172-18-5-103.ec2.internal
  9m		9m		1	kubelet, ip-172-18-5-103.ec2.internal			Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-sjl4r" 
  9m		4m		25	kubelet, ip-172-18-5-103.ec2.internal			Warning		FailedCreatePodSandBox	Failed create pod sandbox.

@stevekuznetsov @smarterclayton @sjenning

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 3
  • Comments: 15 (15 by maintainers)

Most upvoted comments

There are a few flags in the kubelet that handle this

https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource

--eviction-soft
--eviction-soft-grace-period
--eviction-hard
--eviction-minimum-reclaim

Basically, if the hard threshold is exceeded, OR the soft threshold is exceeded for more than the grace-period duration, eviction occurs and node conditions are set.

eviction-minimum-reclaim controls the “chunk” that the kubelet will attempt to reclaim once a threshold is exceeded.
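For illustration only (the thresholds below are made-up values, not what our nodes actually run), the flags fit together like this:

# evict as soon as free space on the node filesystem drops below 10%
--eviction-hard=nodefs.available<10%,imagefs.available<15%

# evict only if free space stays below 15% for longer than the grace period
--eviction-soft=nodefs.available<15%
--eviction-soft-grace-period=nodefs.available=1m30s

# once evicting, keep reclaiming until at least 500Mi above the threshold is free
--eviction-minimum-reclaim=nodefs.available=500Mi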

minimum-image-ttl-duration might be excluding all images from GC consideration if they were all pulled within a short period of time.
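The image GC side is driven by its own kubelet flags; again, the values here are only examples, not our current settings:

# images younger than this TTL are never considered for GC
--minimum-image-ttl-duration=2m

# start image GC when disk usage passes 85%, delete until usage is back under 80%
--image-gc-high-threshold=85
--image-gc-low-threshold=80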

There should be messages in the node log if it is unable to free resources in response to reaching a threshold, something like “attempted to free X, but could only free Y”.
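A quick way to look for those (the systemd unit name is an assumption; on an Origin node it may be origin-node or atomic-openshift-node rather than kubelet):

journalctl -u origin-node --since "1 hour ago" | grep -i -e evict -e "disk pressure" -e "garbage collect"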

@derekwaynecarr can you check my understanding here?