spring-cloud-netflix: Zuul leaves some connections in CLOSE_WAIT state for further reuse, but some never get reused and stay in that state forever, blocking further requests

I have a Zuul server that proxies all my requests to routes autodiscovered via Eureka.

This works fine most of the time. However, I have noticed some very odd behaviour that occurs sporadically and can only partially be reproduced.

After making multiple simultaneous requests (for example, loading the swagger-ui.html page for a given API description, which fetches not only the page itself but also numerous webjars and resources), some connections end up in the CLOSE_WAIT state.

tcp6       1      0 host:54470 host:37612 CLOSE_WAIT  user      425593599   4396/java
tcp6       1      0 host:57724 host:37612 CLOSE_WAIT  user      426384390   4396/java
tcp6       1      0 host:59402 host:52887 CLOSE_WAIT  user      425517966   4396/java
tcp6       1      0 host:59403 host:52887 CLOSE_WAIT  user      425489000   4396/java
tcp6       1      0 host:59404 host:52887 CLOSE_WAIT  user      425518687   4396/java
tcp6       1      0 host:59405 host:52887 CLOSE_WAIT  user      425469338   4396/java
tcp6       1      0 host:59406 host:52887 CLOSE_WAIT  user      425518688   4396/java
tcp6       1      0 host:59407 host:52887 CLOSE_WAIT  user      425476214   4396/java
tcp6       1      0 host:60118 host:37612 CLOSE_WAIT  user      426773630   4396/java
tcp6       1      0 host:60154 host:37612 CLOSE_WAIT  user      426810662   4396/java
tcp6       1      0 host:60155 host:37612 CLOSE_WAIT  user      426824573   4396/java
tcp6       1      0 host:60156 host:37612 CLOSE_WAIT  user      426821100   4396/java
tcp6       1      0 host:60157 host:37612 CLOSE_WAIT  user      426825547   4396/java
tcp6       1      0 host:60158 host:37612 CLOSE_WAIT  user      426820353   4396/java
tcp6       1      0 host:60159 host:37612 CLOSE_WAIT  user      426618721   4396/java
tcp6       1      0 host:60160 host:37612 CLOSE_WAIT  user      426802727   4396/java
tcp6       1      0 host:60161 host:37612 CLOSE_WAIT  user      426825548   4396/java
tcp6       1      0 host:60162 host:37612 CLOSE_WAIT  user      426824574   4396/java
tcp6       1      0 host:60163 host:37612 CLOSE_WAIT  user      426618722   4396/java
tcp6       1      0 host:60167 host:37612 CLOSE_WAIT  user      426689993   4396/java
tcp6       1      0 host:60168 host:37612 CLOSE_WAIT  user      426618745   4396/java
tcp6       1      0 host:60169 host:37612 CLOSE_WAIT  user      426796620   4396/java
tcp6       1      0 host:60170 host:37612 CLOSE_WAIT  user      426824617   4396/java
tcp6       1      0 host:60171 host:37612 CLOSE_WAIT  user      426827273   4396/java
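To track how many CLOSE_WAIT sockets pile up per upstream between refreshes, a small helper can tally them from a listing like the one above. This is a minimal sketch, assuming the column layout shown (proto, Recv-Q, Send-Q, local address, foreign address, state, ...); the class name `CloseWaitCounter` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tallies CLOSE_WAIT sockets per remote endpoint from netstat output.
// Assumes whitespace-separated columns: proto, recv-q, send-q, local, foreign, state, ...
final class CloseWaitCounter {

    private CloseWaitCounter() {}

    /** Returns a sorted map of remote endpoint (host:port) -> number of CLOSE_WAIT sockets. */
    static Map<String, Integer> countByRemote(String netstatOutput) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatOutput.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // Skip blank lines, headers and sockets in any state other than CLOSE_WAIT.
            if (cols.length < 6 || !"CLOSE_WAIT".equals(cols[5])) {
                continue;
            }
            counts.merge(cols[4], 1, Integer::sum); // cols[4] is the foreign address
        }
        return counts;
    }
}
```

Applied to the listing above, it reports 18 CLOSE_WAIT sockets towards host:37612 and 6 towards host:52887.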

In this case, process 4396 is my IDE, in which I was debugging the Zuul server. When I refresh the same page in the browser, many of these connections are closed successfully, though new ones appear after a while. The behaviour also occurs, although less frequently, when making numerous cURL requests to any given route.
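A burst similar to the swagger-ui page load or a batch of cURL calls can also be fired from Java. The following is a hypothetical reproducer built on the JDK's own `java.net.http.HttpClient` (Java 11+), not on anything from Zuul; the class name `BurstReproducer`, the request count and the target URI are assumptions. HTTP/1.1 is forced so the client does not attempt an HTTP/2 upgrade and concurrent requests open separate TCP connections.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical reproducer: fires `count` concurrent GETs at one route and collects status codes.
final class BurstReproducer {

    private BurstReproducer() {}

    static List<Integer> fire(URI uri, int count) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // one in-flight request per connection
                .build();
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            // Start every request before waiting on any of them, so they overlap.
            futures.add(client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                    .thenApply(HttpResponse::statusCode));
        }
        List<Integer> statuses = new ArrayList<>();
        for (CompletableFuture<Integer> f : futures) {
            statuses.add(f.join()); // join() rethrows any failure as a CompletionException
        }
        return statuses;
    }
}
```

For example, `BurstReproducer.fire(URI.create("http://localhost:8080/some-route/"), 50)` (host, port and path hypothetical), followed by a netstat check on the Zuul host.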

I dug around in SimpleHostRoutingFilter, which uses a PoolingHttpClientConnectionManager, and noticed something peculiar:

  • The connection TTL in the default configuration is -1, i.e. infinite
  • Connections in CLOSE_WAIT are reused when leasing connections for new requests, in line 318 of PoolingHttpClientConnectionManager (which I find extremely odd, though I am unsure whether this is standard practice in Java)
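To make the second point concrete, here is a toy model of the lease decision. It is a minimal sketch under stated assumptions, not HttpClient's actual code: `ToyPool`, `Conn` and `validateOnLease` are hypothetical names, time is passed in explicitly as milliseconds, and a non-positive TTL such as -1 means a connection never expires. `validateOnLease` loosely stands in for a stale-connection check like the one Apache HttpClient 4.4+ performs when `PoolingHttpClientConnectionManager.setValidateAfterInactivity(int)` is set.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a pool's lease decision (hypothetical; NOT Apache HttpClient's code).
final class ToyPool {

    static final class Conn {
        final long createdAt;
        boolean peerClosed; // peer sent FIN, so our end of the socket sits in CLOSE_WAIT
        boolean closed;

        Conn(long now) { this.createdAt = now; }
    }

    private final long ttlMillis;          // <= 0 (e.g. -1) means connections never expire
    private final boolean validateOnLease; // stand-in for a stale-connection check
    private final Deque<Conn> available = new ArrayDeque<>();

    ToyPool(long ttlMillis, boolean validateOnLease) {
        this.ttlMillis = ttlMillis;
        this.validateOnLease = validateOnLease;
    }

    /** Returns a connection to the pool once a request is done with it. */
    void release(Conn c) {
        available.addFirst(c); // most recently released is handed out first
    }

    /** Hands out a pooled connection if one is still usable, otherwise opens a new one. */
    Conn lease(long now) {
        while (!available.isEmpty()) {
            Conn c = available.pollFirst();
            boolean expired = ttlMillis > 0 && now - c.createdAt >= ttlMillis;
            boolean stale = validateOnLease && c.peerClosed;
            if (expired || stale) {
                c.closed = true; // closing our end is what lets the socket leave CLOSE_WAIT
                continue;
            }
            return c; // with TTL -1 and no validation, a CLOSE_WAIT connection is reused here
        }
        return new Conn(now);
    }
}
```

With TTL -1 and no validation, a connection whose peer has already closed comes straight back out of `lease`, which is consistent with the reuse described above; either a finite TTL or a staleness check makes the toy pool close it instead.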

However, some connections stay in the CLOSE_WAIT state indefinitely and I cannot get rid of them. The other end of the route has no open connections left; it is only Zuul that fails to close its side of the connections in CLOSE_WAIT.

Eventually these connections clog up the pool and I stop getting responses from my services altogether; I have not seen Zuul clean them up even after more than a day.
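A pooled connection that no request leases again is never looked at by the lease path, so a TTL or stale check applied only on lease may never get a chance to close it; a periodic sweep covers that case. Apache HttpClient's connection manager exposes `closeExpiredConnections()` and `closeIdleConnections(long, TimeUnit)` for this purpose, typically called from a background thread. The sketch below models such a sweep with the Java standard library only; `IdleSweep`, `park` and `startEvictor` are hypothetical names rather than HttpClient API, and any interval or idle threshold you pass in is your own assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Toy idle-connection sweep (hypothetical; NOT Apache HttpClient's implementation).
final class IdleSweep {

    static final class Conn {
        final long lastUsedAt; // millis when the connection was returned to the pool
        boolean closed;

        Conn(long lastUsedAt) { this.lastUsedAt = lastUsedAt; }
    }

    private final List<Conn> available = new ArrayList<>();

    /** Returns a connection to the pool, where it waits for reuse. */
    synchronized void park(Conn c) { available.add(c); }

    synchronized int size() { return available.size(); }

    /** Closes and removes every pooled connection idle for at least maxIdleMillis. */
    synchronized int closeIdle(long now, long maxIdleMillis) {
        int closedCount = 0;
        for (Iterator<Conn> it = available.iterator(); it.hasNext(); ) {
            Conn c = it.next();
            if (now - c.lastUsedAt >= maxIdleMillis) {
                c.closed = true; // in a real pool: close the socket so it can leave CLOSE_WAIT
                it.remove();
                closedCount++;
            }
        }
        return closedCount;
    }

    /** Runs closeIdle every periodMillis on a daemon thread; the caller shuts it down. */
    ScheduledExecutorService startEvictor(long periodMillis, long maxIdleMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-evictor");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(
                () -> closeIdle(System.currentTimeMillis(), maxIdleMillis),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

The sweep runs independently of traffic, so a connection parked on a route that never sees another request is still closed once it has been idle long enough.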

Also odd: the cap seems to be 50 connections, although the maxPerRoute parameter is set to 20.

Is this a known issue, and is there a known workaround? I was planning to subclass or replace SimpleHostRoutingFilter with my own and pass it a connection pool manager configured with a finite TTL to see whether that improves things, but I thought I should first ask whether this is a known issue, since the effort required is non-trivial.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 23 (14 by maintainers)

Most upvoted comments

We’ll talk about it tomorrow morning.