frankenphp: Memory leak
FrankenPHP seems to be leaking memory (at least in my specific configuration). I first noticed it when benchmarking: memory usage would staircase to heaven with each request passing through it, eventually getting the process killed by the OOM killer. Just now I noticed that its memory usage keeps growing while idle as well (well, almost idle; it still handles healthchecks). Below you’ll find a screenshot showing an idle FrankenPHP pod growing in both memory and CPU usage while receiving 1 request every 5 seconds from an ELB. Whatever is leaking is also adding to the CPU workload.
Screenshot: https://imgur.com/a/a6wbI2A
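For watching the same thing from the CLI rather than a dashboard, something like the sketch below works (it assumes metrics-server is installed in the cluster; the pod name is a placeholder):

```sh
# Watch the pod's memory/CPU every 5 seconds (pod name is a placeholder).
watch -n 5 kubectl top pod frankenphp-xxxxxxxxxx-yyyyy --containers
```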
Setup:
- custom docker image built from the official one
- arm64
- worker mode (I think only worker mode exhibits this behavior, but I haven’t really checked)
- FrankenPHP v1.0.0 PHP 8.3.0 Caddy v2.7.5 h1:HoysvZkLcN2xJExEepaFHK92Qgs7xAiCFydN5x5Hs6Q=
- Symfony app (API Platform) w/ frankenphp-runtime (a fork, actually)
[PHP Modules]
amqp
apcu
bcmath
calendar
Core
ctype
curl
date
dom
exif
FFI
fileinfo
filter
ftp
gd
gettext
hash
iconv
imagick
intl
json
ldap
libxml
mbstring
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
pdo_sqlite
Phar
posix
random
readline
redis
Reflection
session
SimpleXML
soap
sockets
sodium
SPL
sqlite3
standard
tokenizer
xml
xmlreader
xmlwriter
xsl
Zend OPcache
zlib
[Zend Modules]
Zend OPcache
Let me know how I can help get to the bottom of this. I’m not really familiar with the development tools involved in golang/caddy/frankenphp, but I’m a quick learner.
About this issue
- Original URL
- State: closed
- Created 7 months ago
- Reactions: 1
- Comments: 22 (5 by maintainers)
I’m happy to see such a lively debate and, before I go any further, I’d like to thank you all for your input and @withinboredom for the nice script. I’ve only read it, didn’t use it yet.
I was going to respond later today, after I got the chance to gather more data, but the off-topic branch of this thread does interest me. I’m going to defend the “a hello world app should be able to run on a 512 MiB rPI” side of the conversation. If I need more than 512 MiB of RAM, it must be because the app is a huge beast and/or gets a lot of traffic, not because the HTTP parser needs 1 GB of RAM to handle some TCP streams.
What I expect from caddy is to consume a small-ish amount of RAM per connection and/or request. We can’t pretend webservers need gigs of RAM to work when a few days ago we were happily running nginx with thousands of rps on an rPI. Maybe caddy has more features. We can explain some more memory usage that way, but we can’t explain gigs worth of RAM… Caddy is written in Go, it’s garbage collected. We all know how GC impacts memory usage, but we also know we can tune that and have it work fairly predictably. GC means more memory usage, but it doesn’t account for that much.
What I expect from the frankenphp side of things is:
The last 2 points, maybe 3, are my concern. If my app is poorly written, it’s going to leak memory in the worker. If the framework I’m using is not optimised for long running processes, it may leak memory over time. If I’m using some poorly written PHP extension, again, it may leak memory. Those things are my responsibility and I can control them. I expect whatever memory leaks happen to be from my codebase, not the harness (FrankenPHP or Caddy).
Assuming every part of this system is leak proof, I should be able to run my new API platform app in a predictable manner with even 512 MiB of RAM, right? It may not be able to process more than 10-20 rps, but it should be able to run without getting killed by the kernel.
I’m running my app in k8s. I’d rather have HPA spawn more pods to handle the demand instead of having a few large pods happily idling with a few gigs of reserved RAM. My staging environment reserves about 2 GB of RAM for mercure+redis+rabbit+mysql. I’m not willing to reserve another 2 GB on just the API platform pod.
A fresh API platform can definitely run on an rPI using nginx and php-fpm. Why should it need more running under caddy?
That being said, let’s figure out where the memory leak is (if any) and let this topic be a starting point for a future “frankenphp production tuning guide” 😃
So, there are indeed some small memory leaks, but I’m still trying to narrow them down (like 100kb-ish per request, using `echo "Hello world"` as a test script and the executor script for worker mode). For some good news: generally, restarting a worker every ~200 or so requests seems to help.

It’s possible to get it to fit into 2Gi, but it is quite a challenge because Go is designed for something other than this use case. Once the available memory is used up, the GC thrashes, dropping max requests to 2% of the maximum.
So, why is nginx “so much more performant” in low-memory situations? It’s likely because Nginx has complete control over memory. It is usually more efficient to reuse memory than to free and re-acquire the same memory repeatedly (which is what Go is doing and makes sense for general-purpose applications). However, memory is pretty cheap (unless you are in the cloud), so we can forego doing anything with it and sacrifice memory for speed.
Here’s a docker command to forcibly fit FrankenPHP into a 2Gi container at an (extreme) expense of performance:
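Roughly, the shape of such a command is sketched below – the image name and values are placeholders, not the exact ones used; the idea is a hard 2Gi cap plus a much more aggressive Go GC via GOMEMLIMIT/GOGC:

```sh
# Sketch only – image name and values are placeholders, not the exact command.
# The idea: cap the container at 2Gi and make the Go GC work much harder so
# the process stays under that cap (at a large throughput cost).
docker run -d \
  --memory=2g \
  -e GOMEMLIMIT=1536MiB \
  -e GOGC=10 \
  -p 80:80 -p 443:443 \
  my-frankenphp-app:latest
```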
Also, the following kernel parameters are set:
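As generic examples of the kind of memory-related knobs involved (not the actual values used here):

```sh
# Hypothetical examples of memory-related sysctls – NOT the original values.
sysctl -w vm.overcommit_memory=1   # always overcommit instead of failing allocations
sysctl -w vm.swappiness=10         # prefer keeping the workload in RAM
```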
Actually. Yes.
For (2), it’s a red herring because I wrote up something that panics if something isn’t freed by the end of the request. It makes the code rather messy because each pointer has to be “registered” with it, but I’d be happy to push it if it might be helpful. But anyway, it was all freed.
I managed to find out where all those gigabytes of memory are; they are in Caddy request contexts. So, I suspect we may be reusing request contexts from other requests, but I’m still chasing that down. I’ll make sure to report back when I find something interesting.
Merry Christmas! I’m back with more news!
During the investigation into the elusive memory leak in FrankenPHP, I discovered two underlying issues (the Create Timer crash in #440 and a fun one in #439), yet the mystery wasn’t entirely solved. I’ve been slamming FrankenPHP with load tests to discover all the different ways it can break, mostly in an attempt to find the conditions under which it becomes the OOM Killer’s best friend.
It appears (for lack of other evidence) that there is potentially a memory leak in Go’s http2 implementation (or Caddy isn’t handling something correctly) that is causing http2 frames to hang around way too long. This heap profile was taken after a few GCs, from a process that had previously handled ~100k requests and was idle at the time.
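For anyone who wants to take a similar snapshot: assuming the Caddy admin API is on its default localhost:2019 and exposes the standard Go pprof routes, something like this does the job (addresses are placeholders):

```sh
# Open an interactive pprof UI (port 8081) on a heap snapshot taken from the
# admin endpoint (assumes the default admin address localhost:2019).
go tool pprof -http=:8081 http://localhost:2019/debug/pprof/heap

# Or just save the raw profile for later comparison:
curl -s -o heap-$(date +%s).pprof http://localhost:2019/debug/pprof/heap
```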
With only 2GB of memory available, using http1 (`SERVER_NAME=":80"`), worker mode, disabling the docker healthchecks, and applying the GC settings above ~can let you handle over half a million requests without an issue (with an elementary PHP file)~ and #442, you can handle nearly infinite requests without an issue.

FrankenPHP and Caddy are using the rest of the memory via request contexts, as each time a request context is modified, an entirely new copy is created.
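To sanity-check that setup, something like this confirms the server really is answering over plain HTTP/1.1 and lets you watch the container’s memory while it runs (host/port and container name are placeholders):

```sh
# Confirm responses come back over plain HTTP/1.1 (host/port are placeholders).
curl -sI http://localhost:80/ | head -n 1

# Watch the container's memory/CPU while a load test runs (name is a placeholder).
docker stats frankenphp
```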
I’m no Go genius, nor am I entirely familiar with how to hunt down memory leaks in Go; what I do know are things I’ve learned over the past week. So, potentially, these graphs just show us something that maybe, someday, hopefully, will be GC’d, yet as far as I can tell never actually is. However, I’m reasonably sure one of two things is going on:
I’m leaning more towards (2), and this is why:
Go does some shenanigans via syscalls. Thus, I greatly suspect that all requests received by the FrankenPHP handler cannot be freed until the worker fully dies. I cannot find any documentation on that though…
While writing this, I had a eureka moment and put together a straightforward fix, because we don’t need Caddy’s context. I’ve opened PR #442 with the changes. It doesn’t remove the memory leak entirely, but it gets rid of a large chunk of it.
heaptrack.frankenphp.24362.gz
You can use heaptrack_gui (available on Linux, probably Mac, too) to view this. It shows where the memory is leaking in cgo/Go (it looks like there may be a minor leak in `worker.go`, since a “dummy request” is created each time a worker restarts).

Gist of leaks detected:
- `frankenphp_update_server_context`: specifically in regards to `ts_resource` (leaks somewhere in `zend_ini_refresh_caches` and `zend_hash_copy` – so it might be a PHP leak).

Unfortunately, it doesn’t appear to capture all the proper threads, so it’s missing request handling. I’ll see if I can work that out if someone doesn’t beat me to it. Hmmm. I managed to get some of the request threads, but they are too big to upload here. I’ll take some smaller ones tomorrow or later this week.
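For reference, this is roughly how such a capture is taken and inspected (the process lookup is illustrative):

```sh
# Attach heaptrack to the running frankenphp process, generate some load,
# then stop it (Ctrl+C) and inspect the resulting capture.
heaptrack -p "$(pgrep -o frankenphp)"

# Inspect the capture (file name as attached above):
heaptrack_gui heaptrack.frankenphp.24362.gz             # GUI
heaptrack_print heaptrack.frankenphp.24362.gz | less    # text summary
```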
I created a tiny load test using K6 to try to reproduce the issue: https://github.com/dunglas/frankenphp/pull/392
So far, with a very minimal Caddyfile (included in the PR), I didn’t manage to reproduce the issue (on Mac). Maybe it will appear by adding extra options like compression.
Would you mind trying this load test on your machine/config to see if you’re able to reproduce the problem?
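Something like this drives the scenario while keeping an eye on RSS (the script file name and the options are placeholders, not the ones from the PR):

```sh
# Run the k6 scenario (placeholder file name and load settings).
k6 run --vus 100 --duration 2m loadtest.js

# In a second terminal, watch the server's resident memory:
watch -n 1 'ps -o pid,user,rss,vsz,comm -C frankenphp'
```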
I had to change the script a bit, but after running it, things look fairly stable.
Running it with 100 requests yields
I had it run 10k loops and then I `watch`ed `ps -o pid,user,rss,vsz,comm`, which showed that the actual memory usage while working is 61 MiB. I then tried to measure the memory usage while sleeping, before doing any work – the baseline memory usage – and found 29m. So that’s PHP’s minimal memory usage; the actual app is thus using 32 MiB.