kubernetes-ingress-controller: Nil targets from balancer and failed service name resolution in DB-less mode when using a named port
Summary
After upgrading from 1.4.2 to 2.0.2, our service went down after a while because of DNS resolution failures: "resolution failed: dns server error: 3 name error".
When I traced it down, I found several issues related to this problem:
- get_balancer fails in our prod env after the upgrade; Kong falls back to DNS resolution and never recovers.
- resolution fails because Kong uses the wrong service name when the ingress uses a named port. For example, Kong treats the named port as part of the service name when the configuration is synced by the ingress controller, and... oops.
1.4.2 generated the same target service name, but it always seemed able to get a target from the balancer, so the problem was never triggered before.
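The fallback behavior described above can be sketched as follows. This is a minimal model under stated assumptions, not Kong's actual code. A hypothetical `pick_target` first asks a hypothetical balancer registry for the upstream. Only when that lookup returns nothing does it treat the upstream name as a host name and query DNS. The `<service>.<namespace>.<port>.svc` name, the in-memory DNS table, and the IP address are all illustrative assumptions.

```python
from typing import Dict, Optional, Tuple


class DNSNameError(Exception):
    """Stands in for "dns server error: 3 name error" (NXDOMAIN)."""


# Hypothetical in-memory DNS: only the plain Kubernetes service name
# resolves; a name carrying a port label does not.
DNS_RECORDS: Dict[str, str] = {"echo.default.svc": "10.0.0.12"}

# Hypothetical balancer registry keyed by upstream name.
BALANCERS: Dict[str, Tuple[str, int]] = {}


def get_balancer(upstream: str) -> Optional[Tuple[str, int]]:
    """Return a (host, port) target, or None when no balancer is found."""
    return BALANCERS.get(upstream)


def resolve(host: str) -> str:
    """Simulated DNS lookup that fails on unknown names."""
    if host not in DNS_RECORDS:
        raise DNSNameError(f"{host}: dns server error: 3 name error")
    return DNS_RECORDS[host]


def pick_target(upstream: str, port: int) -> Tuple[str, int]:
    """Prefer the balancer; otherwise treat the upstream name as a host name."""
    target = get_balancer(upstream)
    if target is not None:
        return target
    # Fallback path: the port label makes this name unresolvable, so as long
    # as the balancer stays missing, every call fails the same way.
    return resolve(upstream), port


if __name__ == "__main__":
    BALANCERS["echo.default.http.svc"] = ("10.0.0.12", 8080)
    print(pick_target("echo.default.http.svc", 80))  # target from the balancer
    BALANCERS.clear()  # simulate get_balancer returning nil
    try:
        pick_target("echo.default.http.svc", 80)
    except DNSNameError as err:
        print(err)
```

The sketch reflects what the report describes. While the balancer answers, the port label in the upstream name never matters. The DNS fallback is the only path that exposes it.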
Steps To Reproduce
I can't reproduce the get_balancer failure outside our prod env (tried bench tools…), but for the resolution problem:
- create an ingress with a named servicePort (e.g. http instead of 80) and you get the problematic service name.
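This hypothetical helper shows how a named servicePort can end up inside the generated name. It assumes the controller composes the upstream host as `<service>.<namespace>.<servicePort>.svc` and copies the Ingress servicePort value verbatim, whether it is a number or a name. `upstream_host` and `plain_service_host` are illustrative names, not functions from the ingress controller.

```python
from typing import Union


def upstream_host(service: str, namespace: str, service_port: Union[int, str]) -> str:
    """Compose an upstream host name in the assumed <service>.<namespace>.<port>.svc form."""
    # servicePort is copied verbatim, so a named port such as "http"
    # becomes a DNS label just like a numeric port would.
    return f"{service}.{namespace}.{service_port}.svc"


def plain_service_host(service: str, namespace: str) -> str:
    """Kubernetes Service DNS name without any port label."""
    return f"{service}.{namespace}.svc"


print(upstream_host("echo", "default", 80))      # echo.default.80.svc
print(upstream_host("echo", "default", "http"))  # echo.default.http.svc
print(plain_service_host("echo", "default"))     # echo.default.svc
```

In a default Kubernetes DNS setup, only the last form names a Service record. The other two carry a port label, so a plain DNS lookup on them returns a name error.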
Additional Details & Logs
You can check the details of the DNS failure I posted in Kong/kong#5455.
About this issue
- State: closed
- Created 4 years ago
- Comments: 25 (14 by maintainers)
@hbagdi what minimum versions of Kong and kong-ingress-controller fix this problem? We moved to 2.1.4 / 0.9.0 because https://github.com/Kong/kong/pull/5831 talks about fixing this in 2.1.0 and @hishamhm also mentions 2.1.0-alpha1 above, but we still hit this exceptional case. Note: we are running Kong in DB mode.

Separate from all of the related issues explaining why balancer can sometimes be null, I'm confused why the fallback isn't able to resolve to a valid name when Kong is running in k8s. Here's an example of the 3 name error that we get. Two examples that would have worked are below, but the fallback never tries these since the target ends up with the port present. I'm having trouble finding where in the ingress controller that port is added.
@carnei-ro That PR is not in because it caused other errors in the next branch, and the team later agreed it wasn't an ideal solution for master either.

The balancer code in next (and by extension 2.1.0-alpha1) contains many changes. We also pushed fixes to the DB-less configuration loading logic. Together, these may prevent the issue from happening in the 2.1 branch. We haven't had confirmation from any users that the issue persists in 2.1.0-alpha1 (so it may be fixed already!), but we continue to investigate, and if you have any info, please let us know!