kyma: Knative Services exposed via Istio return 503 after being scaled down to 0

Description

Sometimes a Knative service exposed via the Istio Envoy ingress gateway returns 503 upstream connect error or disconnect/reset before headers. reset reason: connection failure. This happens occasionally even when the function pod is up and running.
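For reference, the intermittent nature can be observed with a simple loop while the backing pod reports Running. A minimal sketch, assuming the function lives in the serverless-system namespace and its revision pod carries the usual serving.knative.dev/service label:

# Confirm the Knative revision pod backing the function is Running
kubectl get pods -n serverless-system -l serving.knative.dev/service=sample

# Call the exposed endpoint repeatedly; most calls return 200, some come back as 503
for i in $(seq 1 20); do
  curl -sk -o /dev/null -w "%{http_code}\n" https://sample.serverless-system.34.76.171.216.xip.io/
done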

Expected result

The function should respond successfully with its return value.

Actual result

* Rebuilt URL to: http://sample.serverless-system.34.76.171.216.xip.io/
*   Trying 34.76.171.216...
* TCP_NODELAY set
* Connected to sample.serverless-system.34.76.171.216.xip.io (34.76.171.216) port 80 (#0)
> GET / HTTP/1.1
> Host: sample.serverless-system.34.76.171.216.xip.io
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< location: https://sample.serverless-system.34.76.171.216.xip.io/
< date: Mon, 19 Aug 2019 13:13:07 GMT
< server: istio-envoy
< content-length: 0
<
* Connection #0 to host sample.serverless-system.34.76.171.216.xip.io left intact
* Issue another request to this URL: 'https://sample.serverless-system.34.76.171.216.xip.io/'
*   Trying 34.76.171.216...
* TCP_NODELAY set
* Connected to sample.serverless-system.34.76.171.216.xip.io (34.76.171.216) port 443 (#1)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=34.76.171.216.xip.io
*  start date: Aug  6 13:12:30 2019 GMT
*  expire date: Sep  5 13:12:30 2019 GMT
*  issuer: CN=34.76.171.216.xip.io
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fe05a00d600)
> GET / HTTP/2
> Host: sample.serverless-system.34.76.171.216.xip.io
> User-Agent: curl/7.54.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 503
< content-length: 91
< content-type: text/plain
< date: Mon, 19 Aug 2019 13:13:07 GMT
< server: istio-envoy
<
* Connection #1 to host sample.serverless-system.34.76.171.216.xip.io left intact
upstream connect error or disconnect/reset before headers. reset reason: connection failure%

Workaround

Restart the ingress-gateway pod

kubectl delete pod -l app=istio-ingressgateway -n istio-system
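
To confirm the gateway has come back cleanly before retrying, something along these lines can be used; a minimal sketch, assuming the default istio-ingressgateway deployment name in the istio-system namespace:

# Wait for the restarted ingress-gateway pods to become ready again
kubectl rollout status deployment/istio-ingressgateway -n istio-system
kubectl get pods -n istio-system -l app=istio-ingressgateway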

Steps to reproduce

Deploy the function-controller in the serverless-controller namespace, then apply the following Function resource:

kubectl apply -f - <<EOF
apiVersion: serverless.kyma-project.io/v1alpha1
kind: Function
metadata:
  name: sample
  labels:
    foo: bar
spec:
  function: |
    module.exports = {
        main: function(event, context) {
          return 'Hello World'
        }
      }
  functionContentType: "plaintext"
  size: "L"
  runtime: "nodejs8"
EOF
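
One way to trigger the failure after applying the Function is to wait for the revision to scale down to zero and then call the exposed endpoint again. A rough sketch, assuming the Function ends up in the serverless-system namespace (matching the host in the curl output above) and default scale-to-zero settings:

# Watch the revision pods disappear once the scale-to-zero grace period elapses
kubectl get pods -n serverless-system -l serving.knative.dev/service=sample -w

# Call the service again once no pods are left; intermittently this returns the 503 shown above
curl -v http://sample.serverless-system.34.76.171.216.xip.io/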

Most upvoted comments

We are also seeing the 503s, with the activator complaining about "error roundtripping http://x.x.x.x:80/healthz: context deadline exceeded". At this point we see no errors in the istio-ingressgateway, and no other errors apart from a single leaderelection.go:360] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "autoscaler-bucket-00-of-01": the object has been modified; please apply your changes to the latest version and try again in the autoscaler. Any leads on how we can dig further?
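In case it helps others hitting the same symptom, the relevant logs can be pulled like this; a minimal sketch, assuming a standard knative-serving namespace and using the messages quoted above as grep patterns:

# Activator: look for the failing health-check round trips
kubectl logs -n knative-serving deploy/activator | grep "error roundtripping"

# Autoscaler: look for the lease update conflict
kubectl logs -n knative-serving deploy/autoscaler | grep "Failed to update lock"

# Ingress gateway: confirm there is nothing suspicious around the same time
kubectl logs -n istio-system -l app=istio-ingressgateway --tail=200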

These appear to be known issues in Knative Serving; the 0.8.x releases fix a number of 503-related connection issues:

https://github.com/knative/serving/issues/4752

https://github.com/knative/serving/issues/4281
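
For anyone checking whether their cluster already runs a fixed release, the installed Knative Serving version can usually be read from the controller image tag or the namespace labels; a minimal sketch, assuming the standard knative-serving namespace (label names vary between releases):

# The image tag of the serving controller usually encodes the release
kubectl get deployment controller -n knative-serving -o jsonpath='{.spec.template.spec.containers[0].image}'

# Release label on the namespace, where present
kubectl get namespace knative-serving --show-labels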