ingress-nginx: HTTP 504 response from AWS ELB when using ingress-nginx-controller

NGINX Ingress controller version

nginx-ingress-controller:0.33.0

Environment

  • AWS EKS v1.14
  • EC2 instances running the nginx-ingress-controller (3 EC2 instances, 2 controller replicas)
  • Classic AWS load balancer (ELB) in front of the nginx-ingress-controllers

What happened

The client fails to send the whole HTTP request within the given time (the AWS ELB idle timeout), nginx-ingress-controller returns 400, and as a result the ELB returns 504. The issue happens rarely; ~99% of the requests are successful. With the request chain "ELB -> nginx-ingress-controller -> application", in the failed cases the request never reaches the application (the chain stops at nginx-ingress-controller). The application showed no signs of load or garbage-collector activity at that time; according to all of our graphs it was completely healthy. There are no restarts of the nginx-ingress-controller or the application when the issue happens.

Example of a failed request

Nginx-ingress-controller response

{
  "body_bytes_sent": 0,
  "bytes_sent": 0,
  "connection": 562223,
  "connection_requests": 28,
  "duration": 54.5,
  "http": {
    "method": "POST",
    "status_code": 400,
    "url": "/locations/4a56c371-e819-4a85-aec5-c0076474a6fd/positions",
    "url_details": {
      "path": "/locations/4a56c371-e819-4a85-aec5-c0076474a6fd/positions"
    },
    "version": 1.1
  },
  "http_referrer": "",
  "http_user_agent": "RestSharp/106.11.4.0",
  "level": "info",
  "origin": "",
  "remote_addr": "1.125.0.0",
  "remote_user": "",
  "request": "POST /locations/4a56c371-e819-4a85-aec5-c0076474a6fd/positions HTTP/1.1",
  "request_length": 1420,
  "request_time": 54.500,
  "server_name": "api.opensc.org",
  "status": 400,
  "time_local": "16/Sep/2020:09:45:35 +0000",
  "upstream_connect_time": "",
  "upstream_header_time": "",
  "upstream_response_time": ""
}

ELB response

{
  "timestamp": "2020-09-16T09:44:40.849658Z",
  "elb_name": "adaf41a991cee11eabeb7062e7d1b817",
  "request_ip": "1.125.106.64",
  "request_port": "44182",
  "backend_ip": "",
  "backend_port": "",
  "request_processing_time": "-1.0",
  "backend_processing_time": "-1.0",
  "client_response_time": "-1.0",
  "elb_response_code": "504",
  "backend_response_code": "0",
  "received_bytes": "0",
  "sent_bytes": "0",
  "request_verb": "POST",
  "url": "https://api.opensc.org:443/locations/4a56c371-e819-4a85-aec5-c0076474a6fd/positions",
  "protocol": "HTTP/1.1",
  "user_agent": "RestSharp/106.11.4.0",
  "ssl_cipher": "ECDHE-RSA-AES128-GCM-SHA256",
  "ssl_protocol": "TLSv1.2"
}

Nginx-ingress-controller reported time: 16/Sep/2020:09:45:35 +0000. AWS ELB reported time: 2020-09-16T09:44:40.849658Z. The time difference is 55s, which is equal to the AWS ELB idle timeout.

Why we think it is a client issue

According to the AWS docs (https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/ts-elb-error-message.html#ts-elb-errorcodes-http504, "Solution 2"), the ELB idle timeout should be less than the keep-alive timeout on the server (the nginx-ingress-controller in our case). As a test, we set:

  • ELB idle timeout to 65s
  • client_header_timeout 60s;
  • client_body_timeout 60s;

With these settings, nginx-ingress-controller started to return 408 REQUEST TIMEOUT after 60s, but the ELB still returned 504 for the same requests. Also, for all successful requests of this kind the client request is ~1602 bytes, while for the failed requests it is smaller (1420 bytes in the log example above). When nginx hits client_body_timeout it logs 408 but in fact simply closes the connection (see https://trac.nginx.org/nginx/ticket/1005), so from the ELB's perspective the node just closed the connection, hence the 504. This confirmed that the client did not send the whole HTTP request within the given time. A sketch of how these test timeouts map to the ingress-nginx ConfigMap follows below.
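
For reference, a minimal sketch of how those test timeouts could be expressed through the ingress-nginx ConfigMap (the client-header-timeout and client-body-timeout keys take seconds; the ConfigMap name and namespace below assume a default install):

# Sketch only: ingress-nginx ConfigMap carrying the 60s test values.
# These keys render into the nginx client_header_timeout / client_body_timeout directives.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumed default name/namespace
  namespace: ingress-nginx
data:
  client-header-timeout: "60"
  client-body-timeout: "60"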

Anything else we need to know

Our current settings are:

  • AWS ELB idle timeout is 55s
  • Nginx-ingress-controller timeouts:
      – keepalive_timeout is 75s
      – client_header_timeout is 75s
      – client_body_timeout is 75s
  • We enabled nginx-ingress-controller debug logs, but they were too cryptic for us
  • According to AWS (https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/ts-elb-error-message.html#ts-elb-errorcodes-http504), there are two reasons for an AWS ELB 504:
      – "The application takes longer to respond than the configured idle timeout." This does not seem to be our case, as we are waiting for the client to finish transmitting the request.
      – "Registered instances closing the connection to Elastic Load Balancing." This seems to be our case.

We are not sure how to proceed or what timeout configs we need to tune to eliminate the issue.
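
One way to frame the tuning question (a sketch under assumptions, not a confirmed fix): the AWS guidance boils down to keeping the ELB idle timeout strictly below the nginx keep-alive and client timeouts. With ingress-nginx exposed through a Service of type LoadBalancer, the ELB side is usually set with the in-tree idle-timeout annotation and the nginx side with the keep-alive ConfigMap key. The 55s/75s values below simply mirror our current settings, and the Service name, selector, and ports are illustrative placeholders:

# ELB side: Classic ELB idle timeout set on the Service fronting the controller.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller   # assumed default name/namespace
  namespace: ingress-nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "55"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx   # placeholder selector
  ports:
    - name: http
      port: 80
      targetPort: http
---
# nginx side: keep-alive (keepalive_timeout) kept above the ELB idle timeout.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  keep-alive: "75"

Note that our current setup already satisfies this relation (55s idle timeout < 75s keep-alive), so this alone evidently does not eliminate the slow-client 504s; it only rules out the "idle timeout larger than keep-alive" failure mode from the AWS docs.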

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 29 (12 by maintainers)

Most upvoted comments

I tested on a 2-node m5.large EKS 1.22 cluster with an ALB, a Classic ELB, and an NLB on ingress-nginx v1.2.1 with one controller.

I did not see any 5XXs in my testing. If you have any other input parameters for the testing, please let me know. Tracking down 0.01% or 0.1% issues can be difficult.

  • What is the size of data being uploaded to the servers?
  • Any other configuration options to test?

Test image: kennethreitz/httpbin (http://httpbin.org/)

AWS install per the docs: https://kubernetes.github.io/ingress-nginx/deploy/#aws

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.1/deploy/static/provider/aws/deploy.yaml

Configuration

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aws-ingress
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Exact
            backend:
              service:
                name: aws-test
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: aws-test
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  selector:
    app: app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: aws-nlb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: app-nlb
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: aws-alb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-type: alb
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=55
spec:
  type: LoadBalancer
  selector:
    app: app-alb
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: http
          image: kennethreitz/httpbin
          ports:
            - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-nlb
  labels:
    app: app-nlb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-nlb
  template:
    metadata:
      labels:
        app: app-nlb
    spec:
      containers:
        - name: http
          image: kennethreitz/httpbin
          ports:
            - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-alb
  labels:
    app: app-alb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-alb
  template:
    metadata:
      labels:
        app: app-alb
    spec:
      containers:
        - name: http
          image: kennethreitz/httpbin
          ports:
            - containerPort: 80

ingress-nginx configmap

 k describe cm ingress-nginx-controller -n ingress-nginx
Name:         ingress-nginx-controller
Namespace:    ingress-nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.2.1
Annotations:  <none>

Data
====
client-body-timeout:
----
190
client-header-timeout:
----
190
allow-snippet-annotations:
----
true

BinaryData
====

Events:
  Type    Reason  Age   From                      Message
  ----    ------  ----  ----                      -------
  Normal  CREATE  57m   nginx-ingress-controller  ConfigMap ingress-nginx/ingress-nginx-controller
  Normal  UPDATE  54m   nginx-ingress-controller  ConfigMap ingress-nginx/ingress-nginx-controller
  Normal  CREATE  52m   nginx-ingress-controller  ConfigMap ingress-nginx/ingress-nginx-controller

k6 test script example

import http from "k6/http";

export let options = {
    vus: 5,
    stages: [
        { duration: "3m", target: 100 },
        { duration: "5m", target: 200 },
        { duration: "10m", target: 1000 },
    ]
};

export default function() {
    let response = http.get("http://a450396b1f32147e083f68e4f248ab07-1813932356.us-west-2.elb.amazonaws.com");
};

AWS Classic HTTP GET /

k6 run aws-test.js 

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: aws-test.js
     output: -

  scenarios: (100.00%) 1 scenario, 1000 max VUs, 18m30s max duration (incl. graceful stop):
           * default: Up to 1000 looping VUs for 18m0s over 3 stages (gracefulRampDown: 30s, gracefulStop: 30s)


running (18m02.8s), 0000/1000 VUs, 520091 complete and 0 interrupted iterations
default ✓ [======================================] 0000/1000 VUs  18m0s

     data_received..................: 5.1 GB 4.7 MB/s
     data_sent......................: 71 MB  66 kB/s
     http_req_blocked...............: avg=140.96µs min=0s      med=1µs      max=117.46ms p(90)=3µs   p(95)=3µs  
     http_req_connecting............: avg=137.75µs min=0s      med=0s       max=97.39ms  p(90)=0s    p(95)=0s   
     http_req_duration..............: avg=797.91ms min=66.81ms med=301.17ms max=5.86s    p(90)=2.21s p(95)=2.69s
       { expected_response:true }...: avg=797.91ms min=66.81ms med=301.17ms max=5.86s    p(90)=2.21s p(95)=2.69s
     http_req_failed................: 0.00%  ✓ 0          ✗ 520091
     http_req_receiving.............: avg=331.72µs min=11µs    med=259µs    max=73.65ms  p(90)=484µs p(95)=649µs
     http_req_sending...............: avg=55.36µs  min=1µs     med=4µs      max=4.07s    p(90)=14µs  p(95)=18µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s    p(95)=0s   
     http_req_waiting...............: avg=797.53ms min=66.59ms med=300.83ms max=5.85s    p(90)=2.21s p(95)=2.69s
     http_reqs......................: 520091 480.336335/s
     iteration_duration.............: avg=798.1ms  min=66.87ms med=301.2ms  max=5.93s    p(90)=2.21s p(95)=2.69s
     iterations.....................: 520091 480.336335/s
     vus............................: 225    min=5        max=999 
     vus_max........................: 1000   min=1000     max=1000

AWS ALB HTTP GET /

  k6 run aws-test-alb.js 

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: aws-test-alb.js
     output: -

  scenarios: (100.00%) 1 scenario, 1000 max VUs, 18m30s max duration (incl. graceful stop):
           * default: Up to 1000 looping VUs for 18m0s over 3 stages (gracefulRampDown: 30s, gracefulStop: 30s)

WARN[0000] Error from API server                         error="listen tcp 127.0.0.1:6565: bind: address already in use"

running (18m01.8s), 0000/1000 VUs, 439908 complete and 0 interrupted iterations
default ✓ [======================================] 0000/1000 VUs  18m0s

     data_received..................: 4.3 GB 4.0 MB/s
     data_sent......................: 60 MB  56 kB/s
     http_req_blocked...............: avg=165.08µs min=0s      med=1µs   max=110.43ms p(90)=3µs   p(95)=3µs  
     http_req_connecting............: avg=162.44µs min=0s      med=0s    max=89.29ms  p(90)=0s    p(95)=0s   
     http_req_duration..............: avg=942.59ms min=68.58ms med=1s    max=4.2s     p(90)=1.68s p(95)=1.77s
       { expected_response:true }...: avg=942.59ms min=68.58ms med=1s    max=4.2s     p(90)=1.68s p(95)=1.77s
     http_req_failed................: 0.00%  ✓ 0          ✗ 439908
     http_req_receiving.............: avg=373.25µs min=13µs    med=256µs max=77.91ms  p(90)=557µs p(95)=847µs
     http_req_sending...............: avg=9.76µs   min=1µs     med=5µs   max=20.72ms  p(90)=14µs  p(95)=18µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s    max=0s       p(90)=0s    p(95)=0s   
     http_req_waiting...............: avg=942.21ms min=68.27ms med=1s    max=4.2s     p(90)=1.68s p(95)=1.77s
     http_reqs......................: 439908 406.656911/s
     iteration_duration.............: avg=942.9ms  min=68.59ms med=1s    max=4.2s     p(90)=1.68s p(95)=1.77s
     iterations.....................: 439908 406.656911/s
     vus............................: 471    min=5        max=999 
     vus_max........................: 1000   min=1000     max=1000

AWS NLB HTTP GET /

  k6 run aws-test-nlb.js 
          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io
  execution: local
     script: aws-test-nlb.js
     output: -
  scenarios: (100.00%) 1 scenario, 1000 max VUs, 18m30s max duration (incl. graceful stop):

running (18m01.1s), 0000/1000 VUs, 445337 complete and 0 interrupted iterations
default ✓ [======================================] 0000/1000 VUs  18m0s

     data_received..................: 4.4 GB 4.1 MB/s
     data_sent......................: 64 MB  59 kB/s
     http_req_blocked...............: avg=165.56µs min=0s      med=1µs   max=114ms    p(90)=3µs   p(95)=3µs  
     http_req_connecting............: avg=163.11µs min=0s      med=0s    max=113.96ms p(90)=0s    p(95)=0s   
     http_req_duration..............: avg=930.41ms min=67.71ms med=1.01s max=4.15s    p(90)=1.66s p(95)=1.77s
       { expected_response:true }...: avg=930.41ms min=67.71ms med=1.01s max=4.15s    p(90)=1.66s p(95)=1.77s
     http_req_failed................: 0.00%  ✓ 0          ✗ 445337
     http_req_receiving.............: avg=275.06µs min=13µs    med=206µs max=32.51ms  p(90)=399µs p(95)=547µs
     http_req_sending...............: avg=9.66µs   min=1µs     med=5µs   max=24.12ms  p(90)=14µs  p(95)=18µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s    max=0s       p(90)=0s    p(95)=0s   
     http_req_waiting...............: avg=930.13ms min=67.38ms med=1.01s max=4.15s    p(90)=1.66s p(95)=1.76s
     http_reqs......................: 445337 411.920986/s
     iteration_duration.............: avg=930.72ms min=67.74ms med=1.02s max=4.15s    p(90)=1.66s p(95)=1.77s
     iterations.....................: 445337 411.920986/s
     vus............................: 188    min=5        max=999 
     vus_max........................: 1000   min=1000     max=1000

AWS Classic HTTP GET /bytes/2048

 k6 run aws-test.js 

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: aws-test.js
     output: -

  scenarios: (100.00%) 1 scenario, 1000 max VUs, 18m30s max duration (incl. graceful stop):
           * default: Up to 1000 looping VUs for 18m0s over 3 stages (gracefulRampDown: 30s, gracefulStop: 30s)


running (18m06.6s), 0000/1000 VUs, 142305 complete and 0 interrupted iterations
default ✓ [======================================] 0000/1000 VUs  18m0s

     data_received..................: 326 MB 300 kB/s
     data_sent......................: 21 MB  19 kB/s
     http_req_blocked...............: avg=504.58µs min=0s      med=2µs   max=94.57ms p(90)=3µs   p(95)=4µs  
     http_req_connecting............: avg=500.57µs min=0s      med=0s    max=93.91ms p(90)=0s    p(95)=0s   
     http_req_duration..............: avg=2.93s    min=69.15ms med=2.05s max=10.25s  p(90)=6.55s p(95)=7.48s
       { expected_response:true }...: avg=2.93s    min=69.15ms med=2.05s max=10.25s  p(90)=6.55s p(95)=7.48s
     http_req_failed................: 0.00%  ✓ 0        ✗ 142305
     http_req_receiving.............: avg=112.18µs min=8µs     med=49µs  max=21.72ms p(90)=186µs p(95)=229µs
     http_req_sending...............: avg=11.89µs  min=2µs     med=9µs   max=5.5ms   p(90)=18µs  p(95)=23µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s    max=0s      p(90)=0s    p(95)=0s   
     http_req_waiting...............: avg=2.93s    min=69.12ms med=2.05s max=10.25s  p(90)=6.55s p(95)=7.48s
     http_reqs......................: 142305 130.9586/s
     iteration_duration.............: avg=2.93s    min=69.22ms med=2.05s max=10.33s  p(90)=6.55s p(95)=7.48s
     iterations.....................: 142305 130.9586/s
     vus............................: 138    min=5      max=999 
     vus_max........................: 1000   min=1000   max=1000


AWS ALB HTTP GET /bytes/2048

k6 run aws-test-alb.js 

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: aws-test-alb.js
     output: -

  scenarios: (100.00%) 1 scenario, 1000 max VUs, 18m30s max duration (incl. graceful stop):
           * default: Up to 1000 looping VUs for 18m0s over 3 stages (gracefulRampDown: 30s, gracefulStop: 30s)

WARN[0000] Error from API server                         error="listen tcp 127.0.0.1:6565: bind: address already in use"

running (18m06.9s), 0000/1000 VUs, 143918 complete and 0 interrupted iterations
default ✓ [======================================] 0000/1000 VUs  18m0s

     data_received..................: 329 MB 303 kB/s
     data_sent......................: 21 MB  20 kB/s
     http_req_blocked...............: avg=501.82µs min=0s      med=2µs   max=181.96ms p(90)=3µs   p(95)=4µs  
     http_req_connecting............: avg=498.4µs  min=0s      med=0s    max=181.91ms p(90)=0s    p(95)=0s   
     http_req_duration..............: avg=2.9s     min=68.11ms med=1.86s max=14.7s    p(90)=6.84s p(95)=7.75s
       { expected_response:true }...: avg=2.9s     min=68.11ms med=1.86s max=14.7s    p(90)=6.84s p(95)=7.75s
     http_req_failed................: 0.00%  ✓ 0          ✗ 143918
     http_req_receiving.............: avg=83.33µs  min=9µs     med=32µs  max=21.33ms  p(90)=109µs p(95)=177µs
     http_req_sending...............: avg=12.16µs  min=2µs     med=9µs   max=15.18ms  p(90)=18µs  p(95)=23µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s    max=0s       p(90)=0s    p(95)=0s   
     http_req_waiting...............: avg=2.9s     min=68.05ms med=1.86s max=14.7s    p(90)=6.84s p(95)=7.75s
     http_reqs......................: 143918 132.411916/s
     iteration_duration.............: avg=2.9s     min=68.16ms med=1.86s max=14.77s   p(90)=6.84s p(95)=7.75s
     iterations.....................: 143918 132.411916/s
     vus............................: 186    min=5        max=999 
     vus_max........................: 1000   min=1000     max=1000


AWS NLB HTTP GET /bytes/2048

 k6 run aws-test-nlb.js 

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: aws-test-nlb.js
     output: -

  scenarios: (100.00%) 1 scenario, 1000 max VUs, 18m30s max duration (incl. graceful stop):
           * default: Up to 1000 looping VUs for 18m0s over 3 stages (gracefulRampDown: 30s, gracefulStop: 30s)

WARN[0000] Error from API server                         error="listen tcp 127.0.0.1:6565: bind: address already in use"

running (18m05.0s), 0000/1000 VUs, 139669 complete and 0 interrupted iterations
default ✓ [======================================] 0000/1000 VUs  18m0s

     data_received..................: 319 MB 294 kB/s
     data_sent......................: 21 MB  20 kB/s
     http_req_blocked...............: avg=522.57µs min=0s      med=2µs   max=125.65ms p(90)=3µs   p(95)=4µs  
     http_req_connecting............: avg=519.3µs  min=0s      med=0s    max=125.6ms  p(90)=0s    p(95)=0s   
     http_req_duration..............: avg=2.98s    min=69.88ms med=2.33s max=14.53s   p(90)=6.66s p(95)=7.53s
       { expected_response:true }...: avg=2.98s    min=69.88ms med=2.33s max=14.53s   p(90)=6.66s p(95)=7.53s
     http_req_failed................: 0.00%  ✓ 0          ✗ 139669
     http_req_receiving.............: avg=128.84µs min=8µs     med=69µs  max=20.25ms  p(90)=204µs p(95)=250µs
     http_req_sending...............: avg=12.54µs  min=2µs     med=9µs   max=21.74ms  p(90)=18µs  p(95)=23µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s    max=0s       p(90)=0s    p(95)=0s   
     http_req_waiting...............: avg=2.98s    min=69.77ms med=2.33s max=14.52s   p(90)=6.66s p(95)=7.53s
     http_reqs......................: 139669 128.726895/s
     iteration_duration.............: avg=2.98s    min=69.93ms med=2.33s max=14.53s   p(90)=6.66s p(95)=7.53s
     iterations.....................: 139669 128.726895/s
     vus............................: 18     min=5        max=999 
     vus_max........................: 1000   min=1000     max=1000


[Result screenshots: Classic (1 failure), ALB, NLB]

AWS Classic HTTP POST 50 MB file

k6 run aws-test.js 

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: aws-test.js
     output: -

  scenarios: (100.00%) 1 scenario, 50 max VUs, 3m30s max duration (incl. graceful stop):
           * default: Up to 50 looping VUs for 3m0s over 1 stages (gracefulRampDown: 30s, gracefulStop: 30s)

WARN[0124] Request Failed                                error="Post \"http://a450396b1f32147e083f68e4f248ab07-1813932356.us-west-2.elb.amazonaws.com/anthing\": write tcp 192.168.1.200:50891->52.12.43.166:80: use of closed network connection"

running (3m01.7s), 00/50 VUs, 14067 complete and 0 interrupted iterations
default ✓ [======================================] 00/50 VUs  3m0s

     data_received..............: 6.5 MB  36 kB/s
     data_sent..................: 2.3 GB  12 MB/s
     http_req_blocked...........: avg=149.94ms min=64.56ms  med=78.08ms  max=1.5s  p(90)=367.55ms p(95)=623.6ms 
     http_req_connecting........: avg=149.55ms min=64.53ms  med=77.98ms  max=1.5s  p(90)=366.7ms  p(95)=621.41ms
     http_req_duration..........: avg=55.87ms  min=0s       med=50.29ms  max=1.14s p(90)=51.53ms  p(95)=55.59ms 
     http_req_failed............: 100.00% ✓ 14067     ✗ 0   
     http_req_receiving.........: avg=55.87ms  min=0s       med=50.29ms  max=1.14s p(90)=51.53ms  p(95)=55.59ms 
     http_req_sending...........: avg=0s       min=0s       med=0s       max=0s    p(90)=0s       p(95)=0s      
     http_req_tls_handshaking...: avg=0s       min=0s       med=0s       max=0s    p(90)=0s       p(95)=0s      
     http_req_waiting...........: avg=0s       min=0s       med=0s       max=0s    p(90)=0s       p(95)=0s      
     http_reqs..................: 14067   77.407045/s
     iteration_duration.........: avg=346.96ms min=184.73ms med=250.45ms max=2.53s p(90)=632.17ms p(95)=901.62ms
     iterations.................: 14067   77.407045/s
     vus........................: 13      min=5       max=49
     vus_max....................: 50      min=50      max=50


AWS ALB HTTP POST 50 MB file

k6 run aws-test-alb.js 

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: aws-test-alb.js
     output: -

  scenarios: (100.00%) 1 scenario, 100 max VUs, 3m30s max duration (incl. graceful stop):
           * default: Up to 100 looping VUs for 3m0s over 1 stages (gracefulRampDown: 30s, gracefulStop: 30s)

WARN[0000] Error from API server                         error="listen tcp 127.0.0.1:6565: bind: address already in use"

running (3m01.0s), 000/100 VUs, 19756 complete and 0 interrupted iterations
default ✓ [======================================] 000/100 VUs  3m0s

     data_received..............: 9.1 MB  51 kB/s
     data_sent..................: 4.3 GB  24 MB/s
     http_req_blocked...........: avg=148.34ms min=64.5ms   med=78.13ms  max=12.19s p(90)=303.75ms p(95)=633.35ms
     http_req_connecting........: avg=147.71ms min=64.45ms  med=78.04ms  max=12.19s p(90)=303.23ms p(95)=629.34ms
     http_req_duration..........: avg=106.72ms min=50.02ms  med=50.26ms  max=4.7s   p(90)=67.46ms  p(95)=302.59ms
     http_req_failed............: 100.00% ✓ 19756      ✗ 0    
     http_req_receiving.........: avg=106.72ms min=50.02ms  med=50.26ms  max=4.7s   p(90)=67.46ms  p(95)=302.59ms
     http_req_sending...........: avg=0s       min=0s       med=0s       max=0s     p(90)=0s       p(95)=0s      
     http_req_tls_handshaking...: avg=0s       min=0s       med=0s       max=0s     p(90)=0s       p(95)=0s      
     http_req_waiting...........: avg=0s       min=0s       med=0s       max=0s     p(90)=0s       p(95)=0s      
     http_reqs..................: 19756   109.169711/s
     iteration_duration.........: avg=475.33ms min=183.49ms med=276.41ms max=12.37s p(90)=997.83ms p(95)=1.5s    
     iterations.................: 19756   109.169711/s
     vus........................: 40      min=5        max=99 
     vus_max....................: 100     min=100      max=100


AWS NLB HTTP POST 50 MB file

 k6 run aws-test-nlb.js 

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: aws-test-nlb.js
     output: -

  scenarios: (100.00%) 1 scenario, 100 max VUs, 3m30s max duration (incl. graceful stop):
           * default: Up to 100 looping VUs for 3m0s over 1 stages (gracefulRampDown: 30s, gracefulStop: 30s)

WARN[0000] Error from API server                         error="listen tcp 127.0.0.1:6565: bind: address already in use"
WARN[0158] Request Failed                                error="Post \"http://a6de82261ec9c43419594071bf44e630-985dadf1a6ea2203.elb.us-west-2.amazonaws.com/anthing\": EOF"
WARN[0158] Request Failed                                error="Post \"http://a6de82261ec9c43419594071bf44e630-985dadf1a6ea2203.elb.us-west-2.amazonaws.com/anthing\": write tcp 192.168.1.200:63734->54.148.23.179:80: use of closed network connection"
WARN[0158] Request Failed                                error="Post \"http://a6de82261ec9c43419594071bf44e630-985dadf1a6ea2203.elb.us-west-2.amazonaws.com/anthing\": EOF"
WARN[0158] Request Failed                                error="Post \"http://a6de82261ec9c43419594071bf44e630-985dadf1a6ea2203.elb.us-west-2.amazonaws.com/anthing\": write tcp 192.168.1.200:63754->18.236.36.131:80: use of closed network connection"
WARN[0158] Request Failed                                error="Post \"http://a6de82261ec9c43419594071bf44e630-985dadf1a6ea2203.elb.us-west-2.amazonaws.com/anthing\": write tcp 192.168.1.200:63736->34.216.125.93:80: use of closed network connection"

running (3m00.2s), 000/100 VUs, 20731 complete and 0 interrupted iterations
default ✓ [======================================] 000/100 VUs  3m0s

     data_received..............: 9.6 MB  53 kB/s
     data_sent..................: 3.9 GB  22 MB/s
     http_req_blocked...........: avg=143.49ms min=65.23ms  med=78.98ms  max=1.93s p(90)=274.08ms p(95)=537.25ms
     http_req_connecting........: avg=142.25ms min=65.19ms  med=78.89ms  max=1.91s p(90)=270.97ms p(95)=531.02ms
     http_req_duration..........: avg=103.94ms min=0s       med=50.28ms  max=3.95s p(90)=69.11ms  p(95)=364.4ms 
     http_req_failed............: 100.00% ✓ 20731      ✗ 0    
     http_req_receiving.........: avg=103.94ms min=0s       med=50.28ms  max=3.95s p(90)=69.11ms  p(95)=364.4ms 
     http_req_sending...........: avg=0s       min=0s       med=0s       max=0s    p(90)=0s       p(95)=0s      
     http_req_tls_handshaking...: avg=0s       min=0s       med=0s       max=0s    p(90)=0s       p(95)=0s      
     http_req_waiting...........: avg=0s       min=0s       med=0s       max=0s    p(90)=0s       p(95)=0s      
     http_reqs..................: 20731   115.026015/s
     iteration_duration.........: avg=451.26ms min=184.51ms med=269.86ms max=18.5s p(90)=876.32ms p(95)=1.45s   
     iterations.................: 20731   115.026015/s
     vus........................: 99      min=5        max=99 
     vus_max....................: 100     min=100      max=100


@iamNoah1 I was able to reproduce the issue with the following setup:

  • AWS EKS version: 1.19
  • Ingress-nginx: k8s.gcr.io/ingress-nginx/controller:v1.0.0
  • aws-load-balancer-connection-idle-timeout: 55sec
  • Nginx timeouts: client-body-timeout = 190, client-header-timeout = 190

The scenario is the same: I simulated a client that slowly sends the request. The ELB response is 504.
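
Condensed, the reproduction settings above could be written as the following sketch (same assumed names, selector, and ports as in the earlier examples):

# Sketch of the reproduction: client timeouts (190s) far above the ELB idle
# timeout (55s), so a slow client trips the ELB timeout before nginx gives up.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  client-body-timeout: "190"
  client-header-timeout: "190"
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "55"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx   # placeholder selector
  ports:
    - name: http
      port: 80
      targetPort: http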

Thanks for the feedback @VladyslavKurmaz. @evredinka, friendly ping. If this is no longer a bug for you, please consider closing this issue.

@iamNoah1 In our case it just confirmed the following bulletproof rule: in 99.9% of cases the issue is on our side. In our AWS k8s cluster we have two types of node pools, and there was a network configuration issue between them. That is why the same configuration (NGINX ingress + our application) worked fine on DigitalOcean but not on AWS. After the hotfix everything works fine.