serving: ResourceQuota error isn't reflected in kservice status

In what area(s)?

/area API

What version of Knative?

HEAD

Expected Behavior

The default namespace contains a LimitRange that sets the default CPU request (defaultRequest) to 100m. I created a ResourceQuota in the same namespace with the CPU quota set to 50m, then tried to serve requests to an app deployed in that namespace. I expected kubectl get kservice or kubectl get pods to report an error saying that pod creation failed because the ResourceQuota was exceeded.

Actual Behavior

The service cannot be reached (requests hang). kubectl get kservice shows the app as Ready, with no mention of the quota error or of the pod creation failure in the status. Only by digging further down and looking at the YAML of the underlying deployment does the error show up.

Status of kservice:

status:
  address:
    hostname: testapp.default.svc.cluster.local
    url: http://testapp.default.svc.cluster.local
  conditions:
  - lastTransitionTime: 2019-05-09T23:14:18Z
    status: "True"
    type: ConfigurationsReady
  - lastTransitionTime: 2019-06-17T17:12:57Z
    message: build cannot be migrated forward.
    reason: build
    severity: Warning
    status: "False"
    type: Convertible
  - lastTransitionTime: 2019-05-09T23:14:19Z
    status: "True"
    type: Ready
  - lastTransitionTime: 2019-05-09T23:14:19Z
    status: "True"
    type: RoutesReady
  domain: testapp.default.example.com
  domainInternal: testapp.default.svc.cluster.local
  latestCreatedRevisionName: testapp-ncngm
  latestReadyRevisionName: testapp-ncngm
  observedGeneration: 1
  traffic:
  - latestRevision: true
    percent: 100
    revisionName: testapp-ncngm
  url: http://testapp.default.example.com

Status of deployment:

status:
  conditions:
  - lastTransitionTime: 2019-05-09T23:14:08Z
    lastUpdateTime: 2019-06-17T17:13:02Z
    message: ReplicaSet "testapp-ncngm-deployment-6cbf59d7b9" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: 2019-06-18T22:06:36Z
    lastUpdateTime: 2019-06-18T22:06:36Z
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: 2019-06-18T22:06:36Z
    lastUpdateTime: 2019-06-18T22:06:36Z
    message: 'pods "testapp-ncngm-deployment-6cbf59d7b9-cjrjl" is forbidden: exceeded
      quota: new-cpu-quota, requested: cpu=225m, used: cpu=200m, limited: cpu=50m'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure
  observedGeneration: 6
  unavailableReplicas: 1

Steps to Reproduce the Problem

  1. Create a LimitRange in a namespace, setting the default CPU request (or any resource) to some value.
  2. Create a ResourceQuota in the same namespace, setting the quota for that resource to a smaller value than the default (see the example manifests below).
  3. Try to serve requests from an app deployed in the same namespace.
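
For reference, manifests along these lines reproduce the setup described above (the quota name new-cpu-quota and the default namespace come from this report; the LimitRange name is an assumption):

# Illustrative LimitRange: default CPU request of 100m per container.
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit-range   # name is an assumption
  namespace: default
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
---
# ResourceQuota smaller than the default request, so pod creation is forbidden.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: new-cpu-quota
  namespace: default
spec:
  hard:
    cpu: 50m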

I believe the issue might be related to how the deployment is being reconciled. It looks like an “Error getting pods” message is logged, but the status of the revision/kservice does not get updated. Also, the logic checks that deployment.Status.AvailableReplicas == 0, which might not cover all cases where pod creation has failed (for example, if 2 replicas have already been created and the 3rd replica exceeds the ResourceQuota limit). Would it be possible to use the UnavailableReplicas value in the deployment instead?

Code for reference: https://github.com/knative/serving/blob/master//pkg/reconciler/revision/reconcile_resources.go#L36:22
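
For illustration, a minimal sketch (not the actual Knative reconciler code) of how a reconciler could pick up the deployment's ReplicaFailure condition rather than relying only on AvailableReplicas == 0; the helper name and how it would be wired into the revision reconciler are assumptions, only the appsv1/corev1 types and fields are the real Kubernetes API:

package sketch

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// replicaFailureMessage is a hypothetical helper: it returns the message of a
// ReplicaFailure condition (e.g. a ResourceQuota violation) when pod creation
// is failing, regardless of how many replicas are already available.
func replicaFailureMessage(d *appsv1.Deployment) (string, bool) {
	for _, cond := range d.Status.Conditions {
		if cond.Type == appsv1.DeploymentReplicaFailure && cond.Status == corev1.ConditionTrue {
			return fmt.Sprintf("%s: %s", cond.Reason, cond.Message), true
		}
	}
	// Fallback: UnavailableReplicas > 0 also covers partial failures,
	// unlike a check for AvailableReplicas == 0.
	if d.Status.UnavailableReplicas > 0 {
		return fmt.Sprintf("%d replica(s) unavailable", d.Status.UnavailableReplicas), true
	}
	return "", false
}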

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 3
  • Comments: 31 (26 by maintainers)

Most upvoted comments

Checking on what the exact fix was… it is not showing in v1.5, so it was something in the last release.

I started looking at this (and other related items) yesterday.

Thanks for the bug report. This looks like something that we should bubble up into the Service status.

Added API label and moved into Serving 0.8.