rancher: k3s with aws network load balancer: aws health check fails due to traefik ingress controller

k3s rancher 2.4.3.

Steps to reproduce (least amount of steps as possible): I followed all the steps in the install instructions to install the k3s version of rancher on 2 x ubuntu 18.04 aws ec2 instances, and followed your instructions to set up an aws network load balancer. Everything installs and appears to be set up fine. I am building this out as a terraform automated deploy and it’s worked out pretty well so far.

The issue I have is that the aws network load balancer targets never pass their health checks, and the reason is that the health check request does not carry the hostname traefik expects in order to route it. Since the targets never become healthy, the load balancer doesn’t route traffic to them!

I can confirm traefik is working fine on one of the ubuntu VMs:

# curl http://127.0.0.1/healthz
404 page not found

# curl -H 'Host: rancher.mydomain.com' http://127.0.0.1
<a href="https://rancher.mydomain.com/">Found</a>.

# curl -H 'Host: rancher.mydomain.com' http://127.0.0.1/healthz
ok

Thus I would say the rancher k3s install looks to be set up correctly.

There does not seem to be any way in the aws target groups to insert a header for health checks (nlb and alb). So it looks like traefik and aws load balancers are unworkable together?

Thus I am wondering if I can dispense with traefik and use something else (nginx?) as an ingress controller? Or maybe I am missing something really obvious to resolve this? Any help would be appreciated.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (2 by maintainers)

Most upvoted comments

I managed to solve my issues. I documented the fix in the repo I have put up on github that deploys k3s as a dual node cluster in aws using terraform (there is also a single node deploy). Link to the issue/fix.

Summary: The rancher instructions for creating the Network Load Balancer are incorrect for k3s:

  • The health check settings are misleading and not required. We are implementing a TCP load balancer (connect to a port) and it’s not concerned with any protocols such as HTTP/HTTPS. Furthermore, because of the traefik ingress controller, this health check causes the targets to be shown as unhealthy, which misleadingly suggests that things are not working correctly.
  • The target group for HTTPS (443) should be directed to port 443 on the ec2 nodes and not port 80.
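
To illustrate the first point: a TCP health check only tests whether the port accepts a connection, so the Host header that traefik routes on never comes into play. Below is a minimal sketch of such a check that can be run on a node. It assumes bash and the coreutils `timeout` command are available; `tcp_check` is a hypothetical helper name, and it only approximates what the load balancer does.

```shell
# Hypothetical helper: succeeds (exit 0) if a TCP connection to host:port can be opened.
# Approximates a connection-level health check; it sends no HTTP request and no Host header.
tcp_check() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device opens a TCP connection; timeout bounds the wait to 3 seconds.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, `tcp_check 127.0.0.1 80 && echo open` would be expected to print `open` on a node where traefik is listening on port 80, even though an HTTP request to /healthz without the expected Host header returns 404.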

Rancher instructions for creating the AWS Network Load Balancer.
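
The two points in the summary could be sketched with the AWS CLI roughly as follows. This is not the Terraform from the repo, only an illustration under assumptions: `vpc-EXAMPLE` and the target group names are placeholders, and `APPLY` is a hypothetical variable used here so the script prints the commands by default instead of running them.

```shell
# Sketch: print (or, with APPLY=1, run) AWS CLI calls that create TCP target groups
# whose health checks are plain TCP, one per node port.
# VPC_ID is a placeholder; set it to the VPC that contains the ec2 nodes.
VPC_ID="${VPC_ID:-vpc-EXAMPLE}"

create_tcp_target_group() {
  tg_name="$1"
  tg_port="$2"
  # Build the command as positional parameters so it can be printed or executed unchanged.
  set -- aws elbv2 create-target-group \
    --name "$tg_name" --protocol TCP --port "$tg_port" \
    --vpc-id "$VPC_ID" --target-type instance \
    --health-check-protocol TCP
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"          # really call the AWS CLI
  else
    echo "$*"     # dry run: show the command only
  fi
}

# Port 80 traffic goes to port 80 on the nodes, and port 443 traffic to port 443.
create_tcp_target_group rancher-tcp-80 80
create_tcp_target_group rancher-tcp-443 443
```

Run as-is, it only prints the two commands, so they can be reviewed before setting `APPLY=1`. The load balancer's listeners would still need to forward to these target groups.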

Feel free to try out my terraform!