caddy: Caddy hangs when php-fpm restarts
Affects Caddy v2.4.6 and likely most or all earlier v2 releases.
I’m using php-fpm as the back end for processing web requests. php-fpm may occasionally restart its worker processes. If Caddy is under high load when that happens, it won’t talk to the new worker processes until the load stops.
My Caddyfile is pretty straightforward.
{
    ##debug
    https_port 10443
    admin off
    # TLS options
    # self-signed requires: apt install libnss3-tools
    local_certs
    auto_https disable_redirects
    ocsp_stapling off
    default_sni localhost
    # Available 2021-06-07: https://github.com/caddyserver/caddy/pull/4153
    skip_install_trust
    order cgi last
}
:10443
{
    ## Uncomment for assigned certs
    tls ../users/certs/cert.pem ../users/certs/cert.key
    ## Specify the web directory
    root ../www
    ## Use PHP (must be rw by caddy user)
    php_fastcgi unix//var/spool/ff/php7.4-fpm.sock
    ## Enable security interface
    @sec not path /server/*
    handle @sec {
        rewrite * /log.php
    }
    file_server
}
I run Caddy using: ./caddy run
The stress test (stresser.sh) just spawns 20 GET requests at a time.
#!/bin/bash
count=0
while true ; do
    # Fire one GET request in the background
    curl -k -A 'stresser' \
        'https://localhost:10443/hello.php' > /dev/null 2>&1 &
    count=$((count + 1))
    # Once 20 requests are in flight, wait for all of them to finish
    if [ $count -ge 20 ] ; then
        echo "WAIT! $(date)"
        wait
        count=0
    fi
done
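The loop above calls curl without a timeout, so when Caddy hangs, `wait` blocks and nothing is reported. A minimal sketch of a variant that bounds each request and counts failures per batch might look like the following. The names `probe`, `run_batch`, and `TIMEOUT` are hypothetical and not part of the original stresser.sh.

```shell
#!/bin/bash
# Sketch only: probe, run_batch, and TIMEOUT are hypothetical names,
# not part of the original stresser.sh.
TIMEOUT="${TIMEOUT:-5}"   # seconds before one request counts as failed

# Issue one GET; returns non-zero on timeout, connection error, or HTTP >= 400.
probe() {
    curl -k -s -f -o /dev/null --max-time "$TIMEOUT" -A 'stresser' "$1"
}

# Launch $2 probes against URL $1 in parallel, then print how many failed.
run_batch() {
    local url n failed pids pid i
    url=$1
    n=$2
    failed=0
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        probe "$url" &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Collect each exit status individually so failures can be counted.
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

Something like `while true; do echo "$(date) failed=$(run_batch 'https://localhost:10443/hello.php' 20)"; done` would then print one line per batch, which should make the start and end of a hang visible in the output.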
The hello.php is nothing more than a hello-world reply.
PHP-FPM’s www.conf is configured for 8 static PHP workers. However, a worker is terminated if its request takes too long, for example when it hangs (request_terminate_timeout = 60).
[www]
listen = /var/spool/ff/php7.4-fpm.sock
listen.owner = www-data
listen.group = www-data
pm = static
pm.max_children = 8
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
request_terminate_timeout = 60
catch_workers_output = yes
While stresser.sh is running, simulate a PHP worker failure:
killall php-fpm7.4
rm -f /var/spool/ff/php7.4-fpm.pid
/usr/sbin/php-fpm7.4
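When timing how long Caddy takes to recover, it may help to confirm that php-fpm's socket is back before blaming Caddy for the delay. A small sketch of such a check follows; `wait_for_socket` is a hypothetical helper, not part of the original report. It only tests that the socket file exists, so a stale file left behind by a killed master would also pass.

```shell
#!/bin/bash
# Sketch only: wait_for_socket is a hypothetical helper,
# not part of the original report.

# Poll until path $1 exists as a Unix socket; give up after $2 seconds.
wait_for_socket() {
    local path deadline waited
    path=$1
    deadline=$2
    waited=0
    while [ ! -S "$path" ]; do
        if [ "$waited" -ge "$deadline" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Run right after `/usr/sbin/php-fpm7.4`, a call such as `wait_for_socket /var/spool/ff/php7.4-fpm.sock 10 && echo "socket back at $(date)"` gives a timestamp to compare against the first successful request.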
What happens: Caddy just hangs. It stops calling php.
Stop stresser.sh and wait 2-3 seconds. Then restart stresser.sh. What happens: Caddy works fine.
Caddy appears not to pick up connections to the new PHP workers while it is handling incoming HTTP traffic. If the traffic stops, even for a second or two, everything resets and it works fine.
About this issue
- State: closed
- Created 2 years ago
- Comments: 21 (12 by maintainers)
(Thanks for the thorough debugging/troubleshooting! Very interesting. I have been watching this conversation but have been too busy to reply to it. Just wanted to let you know I’m keeping an eye on it. Carry on. 🙂 )
Retries do work with single upstreams.
I don’t think this was true. I’m pretty sure the dial timeout was 10s, but it wasn’t properly documented as such.
But it’s true that read timeouts are not enabled by default.
I haven’t either. I would hope that php-fpm would close the connection when it resets. But I dunno how it behaves.
Btw, we’re aware our fastcgi code isn’t the best. We’ve had https://github.com/caddyserver/caddy/issues/3803 open for a while, wanting to do a refactor/rewrite. But it’s really tricky to get right, and we’re not experts on this protocol. There are very few actual fastcgi client implementations in Go that we can use as inspiration.