caddy: lb_try_interval/lb_try_duration do not pick up new backends on config reload

Given the following Caddyfile:

{
    auto_https off
}
:80 {
    reverse_proxy localhost:8003 {
        lb_try_duration 30s
        lb_try_interval 1s
    }
}

I start Caddy with caddy run and run a separate server on port 8003 (I’m using datasette -p 8003 here), and it proxies correctly. If I shut down my 8003 server and try to hit http://localhost/ I get the desired behaviour: my browser spins for up to 30s, and if I restart my 8003 server during that time the request is proxied through and returned from the backend.

What I’d really like to be able to do though is to start up a new server on another port (actually in production on another IP/port combination) and have traffic resume against the new server.

So I tried editing the Caddyfile to use localhost:8004 instead, started up my backend on port 8004, then used caddy reload to load the new configuration… and my request to port 80 continued to spin. It appears Caddy didn’t notice that there was now a new backend configured for this reverse_proxy.

It would be really cool if that lb_try_interval/lb_try_duration feature could respond to updated configurations and seamlessly forward paused traffic to the new backend.

(This started as a Twitter conversation: https://twitter.com/mholt6/status/1463656086360051714)

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 7
  • Comments: 18 (7 by maintainers)

Most upvoted comments

@simonw does the Dynamic Upstreams feature solve your use case? https://caddyserver.com/docs/caddyfile/directives/reverse_proxy#dynamic-upstreams
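For reference, a minimal sketch of what that could look like for this setup (it needs a Caddy build recent enough to have dynamic upstreams). The hostname backends.example.internal and the refresh interval are illustrative, not anything from this thread; the point is that Caddy resolves the name itself instead of baking a fixed upstream into the config:

{
    auto_https off
}
:80 {
    reverse_proxy {
        lb_try_duration 30s
        lb_try_interval 1s
        # A/AAAA lookups are refreshed, so repointing the DNS record
        # moves traffic without editing the Caddyfile
        dynamic a {
            name backends.example.internal
            port 8003
            refresh 1s
        }
    }
}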

Also worth noting, we just merged https://github.com/caddyserver/caddy/pull/4756 which adds lb_retries, i.e. a number of retries to perform. Probably not directly useful for you here, but I wanted to mention it because this issue is related to retries.
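As a hedged sketch of how that sits next to the existing options (the retry count here is arbitrary), the two can be combined, with retries stopping early once the try duration runs out:

reverse_proxy localhost:8003 {
    # retry selecting an available backend up to 10 times,
    # waiting 1s between attempts, but no longer than 30s total
    lb_retries 10
    lb_try_duration 30s
    lb_try_interval 1s
}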

I think we can probably close this issue now.

I’ve implemented fetching the upstreams “per retry” in #4470. The actual API endpoint to adjust the upstreams specifically will have to come in a future PR.

I wonder if an API endpoint that just adds/removes backends without a config reload could be helpful.
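For context, the closest thing today is patching the upstreams through Caddy’s admin API, which still goes through the normal config-load machinery rather than bypassing it. A rough sketch, assuming the default admin endpoint on localhost:2019; the exact config path depends on how your Caddyfile adapts to JSON, so inspect that first rather than copying the path below verbatim:

# see how the Caddyfile adapted and where the reverse_proxy handler lives
curl localhost:2019/config/

# swap the dial address of the first upstream (path is illustrative only)
curl -X PATCH \
  -H "Content-Type: application/json" \
  -d '"localhost:8004"' \
  localhost:2019/config/apps/http/servers/srv0/routes/0/handle/0/routes/0/handle/0/upstreams/0/dial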

The other piece of this is that we’d have to get the list of upstreams in each iteration of the for loop instead of just once before the loop. Or maybe we’d only have to do that if the list of upstreams changed since that iteration started. Hmmm.

If you feel like watching Matt talk about it for 50 minutes 😅 he streamed the beginning of the work on refactoring the upstreams logic https://youtu.be/hj7yzXb11jU

tl;dw, right now the upstreams are a static list, but the plan is to make it possible to have a dynamic list of upstreams, and the source could be whatever (you could write a custom module to provide the list on the fly, via SRV, or maybe fetch from HTTP and cache it for a few seconds, I dunno, whatever you like).

Point of note @mholt: for this to work, though, it would need to fetch the list of upstreams on every retry iteration and not just once before the loop.

Oh I’ve just had the most horrible idea… could something like this work?

The idea being that I can change the definition of pod1-actual.example.com and refresh the configuration and get the behaviour I’m looking for!
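As a purely hypothetical illustration of that kind of two-layer setup (hostname and ports invented): the front site proxies to a name that a second site block on the same Caddy instance serves, so only the second block needs to change on reload.

:80 {
    reverse_proxy pod1-actual.example.com {
        lb_try_duration 30s
        lb_try_interval 1s
    }
}
http://pod1-actual.example.com {
    # pod1-actual.example.com resolves to this machine (e.g. via /etc/hosts);
    # repoint this and reload whenever the pod moves
    reverse_proxy localhost:8003
}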

Hmm, I don’t think so, because Caddy still needs to load a new config, and the new one won’t take effect while there are still pending requests. Any config change replaces the entire server’s config; it’s not targeted.

The issue is still valid though. You’d want to proxy directly to pods in many scenarios, so you can take advantage of lb_policy and others.

Oh I see what you mean! Yes that’s a fantastic idea, I shall try that.

You can still do the same if you use a Service on top of your pods. The only difference is that K8s would be the one resolving the IP & port for you. Caddy can simply proxy to the service address without knowing which pod it lands on.
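As a hedged sketch, assuming a Kubernetes Service named datasette in the default namespace exposing port 8003 (names invented for illustration):

:80 {
    # the Service's cluster DNS name stays stable while pods come and go
    reverse_proxy datasette.default.svc.cluster.local:8003 {
        lb_try_duration 30s
        lb_try_interval 1s
    }
}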

When you bring down your only pod, the service is effectively down, and Caddy can do its normal retries until a new pod is online, i.e. the service is back up.