puma: phased-restart errors after upgrading from Puma 3.12.6 to 5.0.4
Describe the bug
This seems to happen when doing phased-restarts using versioned deploy folders with symlinks.
We use a folder structure similar to this:
/app/app.d/{version (int)}
/app/app -> /app/app.d/{version} # symlink
/app/prev -> /app/app.d/{version-1} # symlink
When we deploy, we increment the version, run bundle etc, then switch the symlink and do a phased-restart of puma.
All seems to work fine, but after a few deploys, we also start pruning older versions, and then we see this error in the puma.log
/usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:86:in `block in materialize': Could not find json-2.3.1 in any of the sources (Bundler::GemNotFound)
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `map!'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `materialize'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:101:in `block in definition_method'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler.rb:149:in `setup'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `block in <top (required)>'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:136:in `with_level'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:88:in `silence'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
        from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:312:in `rack_builder'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:331:in `load_rackup'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:257:in `app'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:142:in `app'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:146:in `start_server'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster/worker.rb:57:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:195:in `worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `block in spawn_worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `fork'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `spawn_worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:78:in `block in spawn_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `times'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `spawn_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:142:in `check_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:410:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/launcher.rb:171:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/bin/puma-wild:25:in `<main>'
[12769] + Gemfile in context: /var/local/app/app.d/1391/Gemfile
[12769] Early termination of worker
❗️ Note that the app now lives in /app/app.d/1390 but the errors are pointing to puma in /app/app.d/1386 ❗️
Puma config:
#!/usr/bin/env puma
# The directory to operate out of.
#
# The default is the current directory.
#
directory "/var/local/app/app"
# Set the environment in which the rack's app will run. The value must be a string.
#
# The default is "development".
#
environment ENV["RAILS_ENV"] || "development"
# Store the pid of the server in the file at "path".
#
pidfile "/var/run/puma/puma.pid"
# Use "path" as the file to store the server info state. This is
# used by "pumactl" to query and control the server.
#
state_path "/var/run/puma/puma.state"
# Redirect STDOUT and STDERR to files specified. The 3rd parameter
# ("append") specifies whether the output is appended, the default is
# "false".
#
stdout_redirect "/var/log/puma.log", "/var/log/puma.log", true
# Configure "min" to be the minimum number of threads to use to answer
# requests and "max" the maximum.
#
# The default is "0, 16".
#
# threads 0, 16
threads 8, 8
# Bind the server to "url". "tcp://", "unix://" and "ssl://" are the only
# accepted protocols.
#
# The default is "tcp://0.0.0.0:9292".
#
bind 'unix:///var/run/puma/puma.sock'
# === Cluster mode ===
# How many worker processes to run.  Typically this is set to
# to the number of available cores.
#
# The default is "0".
#
workers ENV["PUMA_WORKERS"] || 16
# Allow workers to reload bundler context when master process is issued
# a USR1 signal. This allows proper reloading of gems while the master
# is preserved across a phased-restart. (incompatible with preload_app)
# (off by default)
# see https://github.com/puma/puma/blob/master/docs/deployment.md#restarting
prune_bundler
If I do a full restart, then things are back to normal, until it happens again (we keep 3 deploys back, so after 4 deploys…)
To Reproduce
I was only able to reproduce this on our staging environment so far
Expected behavior phased-restart should reload everything from the new folder… It does seem to point to the right folder, but there’s some mysterious left-overs pointing to something old somewhere??
Server:
- OS: Ubuntu 18.04 with systemd
- Puma Version 5.0.4 (it doesn’t happen with 3.12)
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 28 (27 by maintainers)
Ok so I took a look at all of the places where we explicitly use the
jsongem in puma. There are four places where werequire 'json', all listed neatly in https://github.com/puma/puma/pull/2269. Let’s look at them individually:The worker
stat payloadthreadhttps://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/cluster/worker.rb#L114-L115
This usage of
require 'json'is safe. It necessarily happens inside the worker process. If the user’s application depends onjson, it’ll use that version. Otherwise, it’ll use the version ofjsonbundled with Ruby.WorkerHandle#ping!https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/cluster/worker_handle.rb#L43-L48
This use of
require 'json'is not safe. Assumingprune_bundleris on, the first time this is called will be after the first worker has booted. That worker will check in to the master process and the master process will try to parse the message. Under the normal RubyGemsrequirelogic, if the user’s application defined a dependency on thejsongem, the puma master process will add that version to the$LOAD_PATH, activate thejsongem’s gemspec, andloadthe puma gem into memory. If the user’s application did not define such a dependency, this willloadthe version of thejsongem that’s bundled with Ruby.Subsequent attempts to boot workers will try to use the same version as the version that was first
require’d. As a consequence, so long as the application declares a dependency on thejsongem, removing that first version’s representation on disk will cause workers to fail, even if they’re booting up a version of the application that depends on the same version ofjson(they didn’t upgrade or downgrade). The root cause of this particular problem is rubygems/rubygems#4004. The other problem is that once thejsongem isloaded into the puma master process, it is no longer possible to upgrade or downgrade thejsongem at all in a subsequent phased restart (this is true even if you don’t delete old versions of your application). In short, our best path forward is to remove thejsongem entirely from being loaded into the master process.The use of the JSON gem in this particular method was introduced in #2124 and can be safely reverted. Doing so will probably fix @gingerlime’s problem and allow many other folks to upgrade to a modern version of Puma. The remaining two places where we
require 'json'probably affect fewer people, but the impact is just as severe.Status server requests to
/gc-stats,/stats, and/thread-backtraces.https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/app/status.rb#L25-L27
https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/app/status.rb#L51-L62
This usage of
require 'json'is similarly unsafe. As soon asjsonis required into the puma master process, it’s possible for subsequent worker boot operations to fail (either because the native extensions for an old release were deleted or because the version ofjsonhas changed in the new version of the application).The use of the
jsongem in the status server was introduced in #1801 in order to fix a bug with malformed responses from/gc-statson JRuby. At the time of authorship of #1801,/statsdid not rely on thejsongem and/thread-backtracesdid not yet exist./thread-backtracesis a fairly recent addition from #2054. At the time of authorship, it just depended on thejsongem having already been loaded.It is probably possible to fix the original problem of malformed JSON with
/gc-statswithout loading thejsongem. It would just mean that we’d have to implement and maintain basically a hand-rolled JSON serialization implementation.For
/stats, it’s possible to implement this without thejsongem. In fact, before #2086, it didn’t use thejsongem at all–just hand-rolled JSON serialization.For
/thread-backtraces, this can be implemented with some more hand-rolled JSON serialization, but it never has been in Puma’s history (like I said, this one is fairly recent).Puma.stats
https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma.rb#L27-L30
This use of
require 'json'is similarly unsafe for the same reasons as before.Puma.statsreturns a JSON-encoded string. It is considered public API.#2086 modified this method to just return a Hash instead of a String, but that was later rolled back in #2253 because it was a breaking API change. Interestingly, the revert operation in #2253 effectively deferred JSON serialization until the last moment (instead of
Cluster#statsreturning a JSON-encoded string as it did before #2086, it returns a hash). Conversion to a JSON string happens in the body ofPuma.stats.Similar to the
/statsin the discussion about the status server, there was a time when this method did not depend on thejsongem, and we can roll it back with some more hand-rolled JSON serialization.@cjlarose we’ve been running puma master all day on our staging environment, with a dozen or so deploys all using phased restarts. So far it looks great 👍 🚀
I’m not sure we really tested all scenarios, like Gem upgrades/new installs/removals though, but I’m definitely optimistic about the upgrade to 5.x now.
Thanks so much for all your hard work and patience.
@cjlarose yes, of course. it’s the least I could do 😃 I’ll get it tested on our staging environment so we can go through a number of deploys and hopefully spot any issues. I’ll report back within a day or two at the latest.
I was able to create a minimal repro for @gingerlime’s issue in Docker
https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/could-not-find-json
It demonstrates that if you have a deployment process that deletes gems from old releases and your application has a dependency on the
jsongem (declared in the application Gemfile or otherwise activated by RubyGem’sgemmethod), workers will fail on boot repeatedly withCould not find json-2.3.1 in any of the sources.I’m still investigating the precise mechanism that causes it.
I see. I think you’re on to something with the
add_runtime_dependencycall. I added an integration test to ensure that it is possible to upgrade/downgrade thejsongem in an application as part of a phased restart (this fails). What I found is that without theadd_runtime_dependencyin Puma’s gemspec (basically what happens onmasternow), we get a bunch of constant redefinition warnings when workers boot.I think this indicates that the JSON gem is being double-loaded (it’s loaded once by the puma master process and loaded again by the workers). Usually
requireis supposed to prevent double-loading, so there’s something strange going on. I suspect that when the workersrequire 'json', they’re referencing a different path to load the gem, so Ruby happily loads it up into memory, clobbering theJSONmodule that is already loaded into memory.I’ll spend more time digging into this, but I suspect @MSP-Greg ‘s hunch that we should add the
add_runtime_dependencydeclaration is correct. That’ll force the puma master process to use the version of thejsongem that is declared in the users’ application’s Gemfile.lock, if available, instead of using the one bundled by Ruby. Doing that alone doesn’t fix OP’s original problem, but it’s a good first step.@MSP-Greg I think the
jsongem is bundled with every version of Ruby thatpumasupports, so an explicit runtime dependency declaration for it is not necessary. I’m not 100% sure, but there’s some discussion about it in #1801For reference, here’s that discussion: https://github.com/puma/puma/pull/2427#issuecomment-712851931
From https://github.com/puma/puma/issues/2018#issuecomment-708190925
Is Puma master process still using json because of the following?
https://github.com/puma/puma/blob/f0dffd4d78c19287ff9152a7809d24d7c33f4952/lib/puma/cluster/worker_handle.rb#L43-L47
https://github.com/puma/puma/blob/f0dffd4d78c19287ff9152a7809d24d7c33f4952/lib/puma/cluster.rb#L441-L442