client: Make error-window when waiting for a service to be ready configurable

The use case described in this issue would be supported by a new option, as discussed below.


/kind question

We are running with cluster-autoscaler and follow https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-configure-overprovisioning-with-cluster-autoscaler to place some low-priority pause pods in the cluster. When the worker nodes' CPU/memory is nearly 99% allocated (30% of it occupied by the pause pods), we create a Knative service with kn and get this error:

kn service create hello --image xxx --wait-timeout 300 --env TARGET=revision1 
Creating service 'hello' in namespace:
  0.380s The Route is still working to reflect the latest desired specification.
  1.253s Configuration "hello" is waiting for a Revision to become ready.
  4.249s Revision "hello-xxx" failed with message: 0/15 nodes are available: 1 Insufficient memory, 14 Insufficient cpu..
  5.077s Configuration "hello" does not have any ready Revision.
Error: RevisionFailed: Revision "hello-xxx" failed with message: 0/15 nodes are available: 1 Insufficient memory, 14 Insufficient cpu..

I checked with the k8s scheduler team: pod scheduling happens in two stages. The first placement attempt fails and the scheduler preempts the low-priority pods; the second placement attempt then succeeds. As a result, the Knative service reconcile eventually succeeds.

But with the kn client, the end user gets this scary failure message … An end user who doesn't know enough about k8s reconciliation will be frightened.

Another case comes from a race condition in Knative itself; see https://github.com/knative/serving/issues/8675. When the kn client throws the error, the ksvc has existed for only 4 seconds. Later, after more reconciles, the ksvc finally becomes ready.

So I am wondering whether there is a better way for the watch to reduce these intermittent errors, since reconciliation is designed behaviour in k8s. Maybe just add a short wait time to see whether the condition changes on the next reconcile?
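
To make the idea concrete, here is a minimal Go sketch of such a wait loop. It is not kn's actual implementation: `Condition`, `waitForReady`, and `ErrTimeout` are hypothetical names, and the condition updates are assumed to arrive on a channel fed by a watch. A `False` Ready condition only becomes an error if no newer update replaces it within the error window.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Condition is a simplified, hypothetical stand-in for the Ready condition
// of a Knative service as delivered by a watch. Status is "True", "False",
// or "Unknown".
type Condition struct {
	Status  string
	Message string
}

// ErrTimeout is returned when the overall wait timeout elapses.
var ErrTimeout = errors.New("timed out waiting for service to become ready")

// waitForReady consumes condition updates until the service is ready.
// A "False" update does not fail the wait immediately: it is reported as an
// error only if it is still the latest update after errorWindow has passed.
func waitForReady(updates <-chan Condition, errorWindow, timeout time.Duration) error {
	deadline := time.After(timeout)
	// A nil channel never fires, so the window case stays inactive until a
	// failure is observed.
	var windowExpired <-chan time.Time
	var lastFailure string

	for {
		select {
		case c, ok := <-updates:
			if !ok {
				return errors.New("watch closed before the service became ready")
			}
			switch c.Status {
			case "True":
				return nil
			case "False":
				// Possibly transient: (re)start the error window.
				lastFailure = c.Message
				windowExpired = time.After(errorWindow)
			default:
				// Reconciliation is in progress again: forget the failure.
				windowExpired = nil
			}
		case <-windowExpired:
			return fmt.Errorf("service not ready: %s", lastFailure)
		case <-deadline:
			return ErrTimeout
		}
	}
}

func main() {
	updates := make(chan Condition)
	go func() {
		// Simulate a first scheduling attempt that fails, followed shortly
		// by a successful second attempt after preemption.
		updates <- Condition{Status: "False", Message: "0/15 nodes are available"}
		time.Sleep(100 * time.Millisecond)
		updates <- Condition{Status: "True"}
	}()

	if err := waitForReady(updates, 2*time.Second, 5*time.Second); err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("service is ready")
}
```

With an error window longer than the gap between the two scheduling attempts, the transient failure is never shown to the user; with a window of zero, the behaviour is close to failing on the first `False` condition.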

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

The concept is hard to explain in one word anyway. The option should start with --wait so that it aligns with the other wait option (--wait-timeout), so I would be fine with --wait-window and explaining it in the help message. It is also a balancing act between being precise and being too verbose (which leads to more typing and is harder to memorize). Also, we already have the concept of a “window” in --autoscale-window (which actually should be named --scale-window like the other autoscale parameters), so I would be fine with --wait-window.