frankenphp: Memory leak

FrankenPHP seems to be leaking memory (at least in my specific configuration). I first noticed it while benchmarking: memory usage would staircase to heaven with each request passing through it, until the process eventually got killed by the OOM killer. Just now I noticed that its memory usage keeps growing while idle as well (well, almost idle; it still handles healthchecks). Below you’ll find a screenshot showing an idle FrankenPHP pod growing in both memory and CPU usage while receiving 1 request every 5 seconds from an ELB. Whatever is leaking is also adding to the CPU workload.

Screenshot: https://imgur.com/a/a6wbI2A

Setup:

  • custom docker image built from the official one
  • arm64
  • worker mode (I think only worker mode exhibits this behavior, but I haven’t really checked)
  • FrankenPHP v1.0.0, PHP 8.3.0, Caddy v2.7.5 (h1:HoysvZkLcN2xJExEepaFHK92Qgs7xAiCFydN5x5Hs6Q=)
  • Symfony app (api platform) w/ frankenphp-runtime (a fork, actually)
[PHP Modules]
amqp
apcu
bcmath
calendar
Core
ctype
curl
date
dom
exif
FFI
fileinfo
filter
ftp
gd
gettext
hash
iconv
imagick
intl
json
ldap
libxml
mbstring
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
pdo_sqlite
Phar
posix
random
readline
redis
Reflection
session
SimpleXML
soap
sockets
sodium
SPL
sqlite3
standard
tokenizer
xml
xmlreader
xmlwriter
xsl
Zend OPcache
zlib

[Zend Modules]
Zend OPcache

Let me know how I can help get to the bottom of this. I’m not really familiar with the development tools involved in Go/Caddy/FrankenPHP, but I’m a quick learner.

Most upvoted comments

I’m happy to see such a lively debate and, before I go any further, I’d like to thank you all for your input and @withinboredom for the nice script. I’ve only read it, didn’t use it yet.

I was going to respond later today, after I got the chance to gather more data, but the off-topic branch of this thread does interest me. I’m going to defend the “a hello world app should be able to run on a 512 MiB rPi” side of the conversation. If I need more than 512 MiB of RAM, it must be because the app is a huge beast and/or gets a lot of traffic, not because the HTTP parser needs 1 GB of RAM to handle some TCP streams.

What I expect from Caddy is to consume a small-ish amount of RAM per connection and/or request. We can’t pretend webservers need gigs of RAM to work when, a few days ago, we were happily running nginx with thousands of rps on an rPi. Maybe Caddy has more features; that can explain some extra memory usage, but it can’t explain gigs’ worth of RAM… Caddy is written in Go and is garbage collected. We all know how GC impacts memory usage, but we also know we can tune that and have it work fairly predictably. GC means more memory usage, but it doesn’t account for that much.

What I expect from the frankenphp side of things is:

  • some (very little) overhead from its own processing (unless otherwise justified)
  • for each worker, a sizable chunk of reserved RAM (I actually don’t know exactly how much RAM an empty php VM uses; I should look it up)
  • for each worker, a chunk of RAM used by my booted app
  • for each worker, another chunk of RAM used by each request

The last 2 points, maybe 3, are my concern. If my app is poorly written, it’s going to leak memory in the worker. If the framework I’m using is not optimised for long-running processes, it may leak memory over time. If I’m using some poorly written PHP extension, again, it may leak memory. Those things are my responsibility and I can control them. I expect whatever memory leaks happen to come from my codebase, not from the harness (FrankenPHP or Caddy).

Assuming every part of this system is leak proof, I should be able to run my new API platform app in a predictable manner with even 512 MiB of RAM, right? It may not be able to process more than 10-20 rps, but it should be able to run without getting killed by the kernel.

I’m running my app in k8s. I’d rather have HPA spawn more pods to handle the demand instead of having a few large pods happily idling with a few gigs of reserved RAM. My staging environment reserves about 2 GB of RAM for mercure+redis+rabbit+mysql. I’m not willing to reserve another 2 GB on just the API platform pod.

A fresh API platform can definitely run on an rPI using nginx and php-fpm. Why should it need more running under caddy?

That being said, let’s figure out where the memory leak is (if any) and let this topic be a starting point for a future “frankenphp production tuning guide” 😃

So, there are indeed some small memory leaks, but I’m still trying to narrow them down (on the order of 100 KB per request, using echo "Hello world" as the test script and the executor script for worker mode). For some good news: restarting a worker every ~200 or so requests generally seems to help:

// random jitter to prevent all workers from restarting at the same time during load tests
$jitter = random_int(190, 220);
for ($counter = 0; $counter < $jitter; $counter++) {
    frankenphp_handle_request($fn); // $fn is the request handler closure
}
exit(0);

It’s possible to get it to fit into 2Gi, but it is quite a challenge because Go is designed for something other than this use case. Once the available memory is used up, the GC thrashes, dropping throughput to about 2% of the maximum.

So, why is nginx “so much more performant” in low-memory situations? It’s likely because nginx has complete control over memory. It is usually more efficient to reuse memory than to free and re-acquire the same memory repeatedly (which is what Go does, and which makes sense for general-purpose applications). However, memory is pretty cheap (unless you are in the cloud), so we can forgo doing anything clever with it and sacrifice memory for speed.
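
To make the reuse-versus-reallocate point concrete, here’s a tiny, generic Go sketch (nothing FrankenPHP-specific; all names are made up) of the sync.Pool pattern that keeps buffers alive across requests instead of handing them straight to the GC:

package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool keeps buffers around for reuse instead of letting the GC free them
// and re-allocating on the next request – the "nginx-style" approach.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// handle is a stand-in request handler that borrows a buffer from the pool.
func handle(payload []byte) {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()      // keep the underlying array
		bufPool.Put(buf) // hand it to the next request
	}()
	buf.Write(payload)
	// ... do something with buf ...
}

func main() {
	handle([]byte("hello"))
	fmt.Println("done")
}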

Here’s a docker command to forcibly fit FrankenPHP into a 2Gi container, at the (extreme) expense of performance:

# GOMEMLIMIT=1000MiB: set a soft 1 GiB memory limit for the Go runtime
# GOGC=1:             sacrifice all performance for more memory
# MADV_DONTNEED=1:    give any freed memory back to the OS instead of reusing it
# --memory 2G:        memory limit of the container
docker run \
  --rm \
  -e GOMEMLIMIT=1000MiB \
  -e GOGC=1 \
  -e MADV_DONTNEED=1 \
  --memory 2G \
  -p 80:80 \
  -p 443:443 \
  -v $(pwd):/app \
  -v $(pwd)/caddy/frankenphp/Caddyfile:/etc/caddy/Caddyfile \
  --pull always \
  dunglas/frankenphp

Also, the following kernel parameters are set:

# defer THP defragmentation to the background instead of stalling allocations
echo "defer" > /sys/kernel/mm/transparent_hugepage/defrag
# don't let khugepaged collapse ranges containing unmapped pages (avoids huge-page memory bloat)
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none

Actually. Yes.

For (2), it’s a red herring: I wrote up something that panics if anything isn’t freed by the end of the request, and it was all freed. It makes the code rather messy because each pointer has to be “registered” with it, but I’d be happy to push it if it might be helpful.
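
For context, here is a rough, standalone Go sketch of that kind of “registry” (the names and structure are illustrative, not the actual debug code):

package main

import (
	"fmt"
	"sync"
	"unsafe"
)

// pointerRegistry tracks every allocation handed across the C boundary:
// register() on allocation, unregister() on free, and assertEmpty() panics at
// the end of the request if anything is still outstanding.
type pointerRegistry struct {
	mu   sync.Mutex
	live map[unsafe.Pointer]string // pointer -> allocation site
}

func (r *pointerRegistry) register(p unsafe.Pointer, site string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.live[p] = site
}

func (r *pointerRegistry) unregister(p unsafe.Pointer) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.live, p)
}

// assertEmpty panics if any registered pointer was never freed.
func (r *pointerRegistry) assertEmpty() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.live) > 0 {
		panic(fmt.Sprintf("%d pointer(s) not freed at end of request: %v", len(r.live), r.live))
	}
}

func main() {
	reg := &pointerRegistry{live: map[unsafe.Pointer]string{}}
	v := new(int)
	reg.register(unsafe.Pointer(v), "example allocation")
	reg.unregister(unsafe.Pointer(v)) // comment this out to see the end-of-request panic
	reg.assertEmpty()
}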

I managed to find out where all those gigabytes of memory are; they are in Caddy request contexts. So, I suspect we may be reusing request contexts from other requests, but I’m still chasing that down. I’ll make sure to report back when I find something interesting.

Merry Christmas! I’m back with more news!

During the investigation into the elusive memory leak in FrankenPHP, I discovered two underlying issues (the Create Timer crash in #440 and a fun one in #439), yet the mystery wasn’t entirely solved. I’ve been slamming FrankenPHP with load tests to discover all the different ways it can break, mostly in an attempt to find the conditions under which it becomes the OOM Killer’s best friend.

It appears (for lack of other evidence) that there is potentially a memory leak in Go’s http2 implementation (or Caddy isn’t handling something correctly) that causes http2 frames to hang around far too long. This heap profile was taken after a few GC cycles, on an instance that had previously handled ~100k requests and was idle at the time.

With only 2GB of memory available, using http1 (SERVER_NAME=":80"), worker mode, disabling the docker healthchecks, and applying the GC settings above ~you can handle over half a million requests without an issue (with an elementary PHP file)~ and #442, you can handle nearly infinite requests without an issue.

FrankenPHP and Caddy are using the rest of the memory via request contexts, as each time a request context is modified, an entirely new copy is created.
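
For anyone unfamiliar with how Go contexts behave, here is a minimal, standalone illustration (not Caddy’s code) of why “modifying” a request context always allocates a new one:

package main

import (
	"context"
	"fmt"
)

type ctxKey string

func main() {
	// context.WithValue never mutates the parent; it wraps it in a new value.
	// A request whose context is "modified" many times therefore drags along a
	// whole chain of contexts – and everything they reference.
	ctx := context.Background()
	for i := 0; i < 3; i++ {
		ctx = context.WithValue(ctx, ctxKey(fmt.Sprintf("k%d", i)), i)
	}
	fmt.Println(ctx) // prints the chain of wrapped contexts
}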

I’m no Go genius, nor am I entirely familiar with how to hunt down memory leaks in Go; what I do know are things I’ve learned over the past week. So, potentially, these graphs just show us something that maybe, someday, hopefully, will be GC’d but, as far as I can tell, never actually is. However, I’m reasonably sure one of two things is going on:

  1. Some objects hold onto http2 request frames/contexts that are no longer useful. These could be in logging objects, original request variables (I know caddy puts a reference to an original request in the context, which could hold a circular reference, but I only just thought of this and didn’t investigate it), etc.
  2. During CGI-mode requests, far less memory is used, and no memory leak seems apparent (though until #440 is fixed, we can’t run it long enough to tell for sure). Since worker mode processes a request inside a CGO callback, the memory being used (aka the “request object”) may become “tainted,” and Go won’t ever GC it. Once #440 is fixed, it will be easier to tell.

I’m leaning more towards (2), and this is why:

Go does some shenanigans via syscalls. Thus, I greatly suspect that all requests received by the FrankenPHP handler cannot be freed until the worker fully dies. I cannot find any documentation on that though…
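
For reference, the usual way to keep Go memory out of C’s hands is the runtime/cgo handle pattern. Below is a generic, self-contained sketch (I’m not claiming this is how FrankenPHP currently passes requests around); the point is that the handle table is exactly the kind of place where a missing Delete() keeps request memory alive forever:

package main

/*
#include <stdint.h>
// Stand-in for C code that holds on to something coming from Go for a while.
static uintptr_t stored;
static void c_store(uintptr_t h)  { stored = h; }
static uintptr_t c_retrieve(void) { return stored; }
*/
import "C"

import (
	"fmt"
	"runtime/cgo"
)

type request struct{ path string }

func main() {
	// cgo.NewHandle gives C an opaque id instead of a raw Go pointer (keeping a
	// Go pointer on the C side after the call returns violates the cgo
	// pointer-passing rules). The Go side keeps the object reachable in a
	// handle table until Delete() is called.
	h := cgo.NewHandle(&request{path: "/healthz"})
	C.c_store(C.uintptr_t(h))

	got := cgo.Handle(C.c_retrieve()).Value().(*request)
	fmt.Println(got.path)

	h.Delete() // release the reference so the request can be GC'd
}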

While writing this, I had a eureka moment: we don’t need Caddy’s context at all, so I’ve opened PR #442 with a straightforward fix. It doesn’t remove the memory leak entirely, but it gets rid of a large chunk of it.

heaptrack.frankenphp.24362.gz

You can use heaptrack_gui (available on linux, probably Mac, too) to view this. This shows where the memory is leaking in cgo/go (it looks like there may be a minor leak in worker.go since a “dummy request” is created each time a worker restarts).

Gist of leaks detected:

  1. mostly in frankenphp_update_server_context, specifically around ts_resource (leaks somewhere in zend_ini_refresh_caches and zend_hash_copy – so it might be a PHP leak).
  2. if I’m reading it correctly, allocations made in Go for C data structures that are never freed, such as the root directory to chdir to (see the sketch below).
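
To illustrate the second point, here is a minimal cgo sketch (not FrankenPHP’s actual code) of the pattern heaptrack flags – a C string allocated from Go that must be explicitly freed:

package main

/*
#include <stdlib.h>
#include <unistd.h>
*/
import "C"

import "unsafe"

// chdirTo shows the pattern: C.CString allocates on the C heap, which Go's GC
// never scans, so every call leaks unless it is explicitly paired with C.free.
func chdirTo(dir string) {
	cDir := C.CString(dir)             // malloc'd on the C heap
	defer C.free(unsafe.Pointer(cDir)) // forgetting this line is the leak
	C.chdir(cDir)
}

func main() {
	chdirTo("/tmp")
}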

Unfortunately, it doesn’t appear to capture all the proper threads, so it’s missing request handling. I’ll see if I can work that out if someone doesn’t beat me to it. Hmmm. I managed to get some of the request threads, but they are too big to upload here. I’ll take some smaller ones tomorrow or later this week.

I created a tiny load test using K6 to try to reproduce the issue: https://github.com/dunglas/frankenphp/pull/392

So far, with a very minimal Caddyfile (included in the PR), I didn’t manage to reproduce the issue (on Mac). Maybe it will appear by adding extra options like compression.

Would you mind trying this load test on your machine/config to see if you’re able to reproduce the problem?

I had to change the script a bit, but after running it, things look fairly stable.

Running it with 100 requests yields:

Starting with 2.00MB memory
Running request....................................................................................................Done
Stats:
Initial bootstrap: 6.00MB
Average memory usage per request: 184.32KB
Max memory usage per request: 16.00MB
Min memory usage per request: 0.00B

I had it run 10k loops and then watched ps -o pid,user,rss,vsz,comm, yielding:

PID   USER     RSS  VSZ  COMMAND
  183 root      61m 162m php

So, the actual memory usage while working is 61 MiB. I then measured the memory usage while sleeping, before doing any work – the baseline – and found 29 MiB. So that’s PHP’s minimal memory usage; the actual app is thus using about 32 MiB.