caddy: Admin Endpoint constantly crashing

Hoping someone can shed some light on this. We have Caddy deployed in a reverse proxy configuration with the admin endpoint exposed to a localhost. We have a script which hits the admin endpoint every 10 minutes to update a list of permitted IP addresses that it retrieves from an external API.

We are observing that every few hours the admin endpoint in Caddy will stop responding, any attempted to curl localhost:2019 will result in timeouts and log files indicate Caddy is experiencing an ‘io timeout,’ When the issue occurs we observe the following in our log:

Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1752336,"logger":"admin.api","msg":"received request","method":"PATCH","host":"127.0.0.1:2019","uri":"/config/apps/http/servers/srv0/routes/0/handle/0/routes/0/match/0/remote_ip/ranges/","remote_ip":"127.0.0.1","remote_port":"48012","headers":{"Accept":["*/*"],"Connection":["close"],"Content-Length":["9300"],"Content-Type":["application/json"],"User-Agent":["curl/7.68.0"]}}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1816225,"logger":"admin","msg":"admin endpoint started","address":"tcp/127.0.0.1:2019","enforce_origin":false,"origins":["//127.0.0.1:2019","//localhost:2019","//[::1]:2019"]}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1836116,"logger":"http","msg":"skipping automatic certificate management because one or more matching certificates are already loaded","domain":"pki.int.windscribe.com","server_name":"srv0"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.184285,"logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1869717,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc000153ab0"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1875331,"logger":"admin.api","msg":"received request","method":"PATCH","host":"127.0.0.1:2019","uri":"/config/apps/http/servers/srv0/routes/0/handle/0/routes/1/match/0/remote_ip/ranges/","remote_ip":"127.0.0.1","remote_port":"48014","headers":{"Accept":["*/*"],"Connection":["close"],"Content-Length":["180"],"Content-Type":["application/json"],"User-Agent":["curl/7.68.0"]}}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1887658,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 5ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.194842,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 10ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.2056587,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 20ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.2267587,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 40ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.267665,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 80ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.348986,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 160ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.5105457,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 320ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.8328366,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 640ms"}
Aug  8 17:50:02 pki-do caddy[174943]: {"level":"info","ts":1659995402.249503,"logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0xc0000eed20"}
Aug  8 17:50:02 pki-do caddy[174943]: {"level":"info","ts":1659995402.251662,"msg":"config is unchanged"}
Aug  8 17:50:02 pki-do caddy[174943]: {"level":"info","ts":1659995402.27559,"logger":"admin","msg":"stopped previous server","address":"tcp/127.0.0.1:2019"}
Aug  8 17:50:02 pki-do caddy[174943]: {"level":"info","ts":1659995402.4745073,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 1s"}
Aug  8 17:50:03 pki-do caddy[174943]: {"level":"info","ts":1659995403.4750612,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 1s"}
Aug  8 17:50:04 pki-do caddy[174943]: {"level":"info","ts":1659995404.4761431,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 1s"}

At the moment our solution is to simply restart caddy and I’ve implemented a cron job to do this every hour, but it seems like a horribly hacky solution. Anyone experienced this before or have any ideas?

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I checked again and looks like it was a mistake on my alarms. I apologise - it is actually still working!

Thanks for the update! Will advise if we have any issues in the meantime.

@mholt I haven’t seen the issue re-occur in the past week (previously we would be unable to go more then 48 hours without the issue), so this is looking quite promising!

Any idea when the changes will be merged into Master?

Hello,

Thanks for the response, here’s the template filled out to the best of my ability

1. Environment

1a. Operating system and version

Ubuntu 20.04.4 LTS \n \l

1b. Caddy version (run caddy version or paste commit SHA)

v2.5.2 h1:eCJdLyEyAGzuQTa5Mh3gETnYWDClo1LjtQm2q9RNZrs=

1c. Go version (if building Caddy from source; run go version)

N/A Installed from apt

2. Description

2a. What happens (briefly explain what is wrong)

Admin endpoint stops responding after an indeterminate amount of time. Admin endpoint is being hit by a script every 10 minutes which populates an allow-list of IP addresses

2b. Why it’s a bug (if it’s not obvious)

2c. Log output

Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1752336,"logger":"admin.api","msg":"received request","method":"PATCH","host":"127.0.0.1:2019","uri":"/config/apps/http/servers/srv0/routes/0/handle/0/routes/0/match/0/remote_ip/ranges/","remote_ip":"127.0.0.1","remote_port":"48012","headers":{"Accept":["*/*"],"Connection":["close"],"Content-Length":["9300"],"Content-Type":["application/json"],"User-Agent":["curl/7.68.0"]}}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1816225,"logger":"admin","msg":"admin endpoint started","address":"tcp/127.0.0.1:2019","enforce_origin":false,"origins":["//127.0.0.1:2019","//localhost:2019","//[::1]:2019"]}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1836116,"logger":"http","msg":"skipping automatic certificate management because one or more matching certificates are already loaded","domain":"pki.int.windscribe.com","server_name":"srv0"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.184285,"logger":"http","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1869717,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc000153ab0"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1875331,"logger":"admin.api","msg":"received request","method":"PATCH","host":"127.0.0.1:2019","uri":"/config/apps/http/servers/srv0/routes/0/handle/0/routes/1/match/0/remote_ip/ranges/","remote_ip":"127.0.0.1","remote_port":"48014","headers":{"Accept":["*/*"],"Connection":["close"],"Content-Length":["180"],"Content-Type":["application/json"],"User-Agent":["curl/7.68.0"]}}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.1887658,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 5ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.194842,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 10ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.2056587,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 20ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.2267587,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 40ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.267665,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 80ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.348986,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 160ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.5105457,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 320ms"}
Aug  8 17:50:01 pki-do caddy[174943]: {"level":"info","ts":1659995401.8328366,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 640ms"}
Aug  8 17:50:02 pki-do caddy[174943]: {"level":"info","ts":1659995402.249503,"logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0xc0000eed20"}
Aug  8 17:50:02 pki-do caddy[174943]: {"level":"info","ts":1659995402.251662,"msg":"config is unchanged"}
Aug  8 17:50:02 pki-do caddy[174943]: {"level":"info","ts":1659995402.27559,"logger":"admin","msg":"stopped previous server","address":"tcp/127.0.0.1:2019"}
Aug  8 17:50:02 pki-do caddy[174943]: {"level":"info","ts":1659995402.4745073,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 1s"}
Aug  8 17:50:03 pki-do caddy[174943]: {"level":"info","ts":1659995403.4750612,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 1s"}
Aug  8 17:50:04 pki-do caddy[174943]: {"level":"info","ts":1659995404.4761431,"msg":"http: Accept error: accept tcp 127.0.0.1:2019: i/o timeout; retrying in 1s"}

2d. Workaround(s)

Restart Caddy

2e. Relevant links

3. Tutorial (minimal steps to reproduce the bug)

  1. Install Caddy via apt
  2. Use the following Caddyfile (/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile)
{
  order error after reverse_proxy
}

pki.int.windscribe.com:8200 {

  tls /etc/ssl/pki.int.windscribe.com/fullchain.pem /etc/ssl/pki.int.windscribe.com/privkey.pem
  @nodes_prd {
    path "/v1/Windscribe Core CA X2/issue/node_server_selfmanaged_prd"
    remote_ip 149.56.10.85
  }

  @nodes_stg {
    path "/v1/Windscribe Core CA X2/issue/node_server_selfmanaged_stg"
    remote_ip 149.56.10.85
  }

  reverse_proxy @nodes_prd https://pki.int.windscribe.com:8200
  reverse_proxy @nodes_stg https://pki.int.windscribe.com:8200
  reverse_proxy /v1/auth/approle/login https://pki.int.windscribe.com:8200


  error 403 {
    message "Access is Denied"
  }

}
  1. Run the following script every 10 minutes (eventually the admin endpoint will hang)
curl --fail -X PATCH -d ["192.168.0.1","192.168.0.2","192.168.0.3"] -H "Content-Type: application/json" http://127.0.0.1:2019/config/apps/http/servers/srv0/routes/0/handle/0/routes/1/match/0/remote_ip/ranges/

Should be resolved now that #4705 is merged

Whew. Don’t give me a heart attack like that! 😆 Thanks for confirming quickly!

An update - it was working well so far, however, it just crashed a few minutes ago in my case despite having the patch 😦

I compiled from commit 4c282e86dae16359370f410e29dc96dc5aeeb449 + manually merging the branch in, using Go v1.19 on Linux (Fedora 36).

@JeDaYoshi That’s great news, thanks for the information! Will likely merge this very soon.

Can also comment that I’ve noticed the admin endpoint crash issue after 6 hours-2 days of uptime.

However, since applying the reuseport branch’s patches, it has not happened so far.

Sure.

I’ve gone ahead and compiled a binary from that branch and have it running in our environment. Will advise how it looks after a couple of days, should know pretty quickly if there is still an issue.