caddy: Slow config reload when using shutdown_delay

We’re currently using a shutdown_delay of 15 seconds to warn our hardware load balancer when Caddy is about to go down for drains, maintenance, etc., as per https://caddyserver.com/docs/caddyfile/options#shutdown-delay. However, this also seems to delay config reloads by the same amount. During the shutdown period our load balancer marks the Caddy servers offline (as it should), so we lose ingress to the swarm services until they come back online and pass the health check again.

Should/can a reload ignore this delay?

About this issue

  • State: closed
  • Created a year ago
  • Comments: 16 (11 by maintainers)

Most upvoted comments

Thanks to @IndeedNotJames for finding a way to reproduce it. That spun the wheels in my head, and I realized it probably had to do with the recent listener rewrite, which caused unix systems to stop using the listener pool entirely, so checking the pool is no longer a reliable way to detect a shutdown.

I have a fix in https://github.com/caddyserver/caddy/pull/5405 which just uses a simple atomic ref counter.

Managed to reproduce the issue with vanilla Caddy. It’s specific to unix and /listen_unix.go

Minimal repro Caddyfile:

{
	debug
	shutdown_delay 5s
}

localhost:80 {
	respond 1
}

Either changing the respond string or running caddy reload --force will engage the shutdown_delay, even though it is only a config reload.

As far as I can tell, this has been the case since its initial implementation and release in https://github.com/caddyserver/caddy/releases/tag/v2.6.0

All I essentially did was notice that https://github.com/caddyserver/caddy/blob/f6bab8ba85b231ea0930282e684c0040001059e6/usagepool.go#L199-L210 always jumped straight to return 0, false. I tried to understand why, stumbling in the dark, completely unaware that /listen_unix.go existed 👀
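To illustrate why a lookup like that can always come back as 0, false, here is a rough sketch of a reference-counting pool keyed by listener address. It is a simplified stand-in, not the linked usagepool.go: the `pool` type, its `add` and `references` methods, and the address keys are all hypothetical. The point is only that a code path which never registers its listener in the pool is invisible to any check that asks the pool.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical, simplified reference-counting pool.
type pool struct {
	mu   sync.Mutex
	refs map[string]int
}

// add registers one more user of key.
func (p *pool) add(key string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.refs[key]++
}

// references returns the number of users of key and whether key is
// known to the pool at all.
func (p *pool) references(key string) (int, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	n, ok := p.refs[key]
	return n, ok
}

func main() {
	p := &pool{refs: map[string]int{}}

	// A listener created through the pool is counted.
	p.add("tcp/:8080")
	n, ok := p.references("tcp/:8080")
	fmt.Println(n, ok)

	// A listener created by a code path that bypasses the pool was
	// never added, so the lookup finds nothing.
	n, ok = p.references("tcp/:80")
	fmt.Println(n, ok)
}
```

Under these assumptions, any "is this listener still in use?" decision built on the second lookup would conclude the listener is going away, which matches the symptom of the delay firing on every reload.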

@francislavoie took it from there 😃