puma: phased-restart errors after upgrading from Puma 3.12.6 to 5.0.4
Describe the bug
This seems to happen when doing phased-restarts using versioned deploy folders with symlinks.
We use a folder structure similar to this:
/app/app.d/{version (int)}
/app/app -> /app/app.d/{version} # symlink
/app/prev -> /app/app.d/{version-1} # symlink
When we deploy, we increment the version, run bundle etc, then switch the symlink and do a phased-restart of puma.
All seems to work fine, but after a few deploys, we also start pruning older versions, and then we see this error in the puma.log
/usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:86:in `block in materialize': Could not find json-2.3.1 in any of the sources (Bundler::GemNotFound)
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `map!'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `materialize'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:170:in `specs'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:237:in `specs_for'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:226:in `requested_specs'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:101:in `block in definition_method'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:20:in `setup'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler.rb:149:in `setup'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `block in <top (required)>'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:136:in `with_level'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:88:in `silence'
from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `<top (required)>'
from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:312:in `rack_builder'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:331:in `load_rackup'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:257:in `app'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:142:in `app'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:146:in `start_server'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster/worker.rb:57:in `run'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:195:in `worker'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `block in spawn_worker'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `fork'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `spawn_worker'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:78:in `block in spawn_workers'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `times'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `spawn_workers'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:142:in `check_workers'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:410:in `run'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/launcher.rb:171:in `run'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cli.rb:80:in `run'
from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/bin/puma-wild:25:in `<main>'
[12769] + Gemfile in context: /var/local/app/app.d/1391/Gemfile
[12769] Early termination of worker
❗️ Note that the app now lives in /app/app.d/1390
but the errors are pointing to puma in /app/app.d/1386
❗️
Puma config:
#!/usr/bin/env puma
# The directory to operate out of.
#
# The default is the current directory.
#
directory "/var/local/app/app"
# Set the environment in which the rack's app will run. The value must be a string.
#
# The default is "development".
#
environment ENV["RAILS_ENV"] || "development"
# Store the pid of the server in the file at "path".
#
pidfile "/var/run/puma/puma.pid"
# Use "path" as the file to store the server info state. This is
# used by "pumactl" to query and control the server.
#
state_path "/var/run/puma/puma.state"
# Redirect STDOUT and STDERR to files specified. The 3rd parameter
# ("append") specifies whether the output is appended, the default is
# "false".
#
stdout_redirect "/var/log/puma.log", "/var/log/puma.log", true
# Configure "min" to be the minimum number of threads to use to answer
# requests and "max" the maximum.
#
# The default is "0, 16".
#
# threads 0, 16
threads 8, 8
# Bind the server to "url". "tcp://", "unix://" and "ssl://" are the only
# accepted protocols.
#
# The default is "tcp://0.0.0.0:9292".
#
bind 'unix:///var/run/puma/puma.sock'
# === Cluster mode ===
# How many worker processes to run. Typically this is set to
# to the number of available cores.
#
# The default is "0".
#
workers ENV["PUMA_WORKERS"] || 16
# Allow workers to reload bundler context when master process is issued
# a USR1 signal. This allows proper reloading of gems while the master
# is preserved across a phased-restart. (incompatible with preload_app)
# (off by default)
# see https://github.com/puma/puma/blob/master/docs/deployment.md#restarting
prune_bundler
If I do a full restart, then things are back to normal, until it happens again (we keep 3 deploys back, so after 4 deploys…)
To Reproduce
I was only able to reproduce this on our staging environment so far
Expected behavior phased-restart should reload everything from the new folder… It does seem to point to the right folder, but there’s some mysterious left-overs pointing to something old somewhere??
Server:
- OS: Ubuntu 18.04 with systemd
- Puma Version 5.0.4 (it doesn’t happen with 3.12)
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 28 (27 by maintainers)
Ok so I took a look at all of the places where we explicitly use the
json
gem in puma. There are four places where werequire 'json'
, all listed neatly in https://github.com/puma/puma/pull/2269. Let’s look at them individually:The worker
stat payload
threadhttps://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/cluster/worker.rb#L114-L115
This usage of
require 'json'
is safe. It necessarily happens inside the worker process. If the user’s application depends onjson
, it’ll use that version. Otherwise, it’ll use the version ofjson
bundled with Ruby.WorkerHandle#ping!
https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/cluster/worker_handle.rb#L43-L48
This use of
require 'json'
is not safe. Assumingprune_bundler
is on, the first time this is called will be after the first worker has booted. That worker will check in to the master process and the master process will try to parse the message. Under the normal RubyGemsrequire
logic, if the user’s application defined a dependency on thejson
gem, the puma master process will add that version to the$LOAD_PATH
, activate thejson
gem’s gemspec, andload
the puma gem into memory. If the user’s application did not define such a dependency, this willload
the version of thejson
gem that’s bundled with Ruby.Subsequent attempts to boot workers will try to use the same version as the version that was first
require
’d. As a consequence, so long as the application declares a dependency on thejson
gem, removing that first version’s representation on disk will cause workers to fail, even if they’re booting up a version of the application that depends on the same version ofjson
(they didn’t upgrade or downgrade). The root cause of this particular problem is rubygems/rubygems#4004. The other problem is that once thejson
gem isload
ed into the puma master process, it is no longer possible to upgrade or downgrade thejson
gem at all in a subsequent phased restart (this is true even if you don’t delete old versions of your application). In short, our best path forward is to remove thejson
gem entirely from being loaded into the master process.The use of the JSON gem in this particular method was introduced in #2124 and can be safely reverted. Doing so will probably fix @gingerlime’s problem and allow many other folks to upgrade to a modern version of Puma. The remaining two places where we
require 'json'
probably affect fewer people, but the impact is just as severe.Status server requests to
/gc-stats
,/stats
, and/thread-backtraces
.https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/app/status.rb#L25-L27
https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/app/status.rb#L51-L62
This usage of
require 'json'
is similarly unsafe. As soon asjson
is required into the puma master process, it’s possible for subsequent worker boot operations to fail (either because the native extensions for an old release were deleted or because the version ofjson
has changed in the new version of the application).The use of the
json
gem in the status server was introduced in #1801 in order to fix a bug with malformed responses from/gc-stats
on JRuby. At the time of authorship of #1801,/stats
did not rely on thejson
gem and/thread-backtraces
did not yet exist./thread-backtraces
is a fairly recent addition from #2054. At the time of authorship, it just depended on thejson
gem having already been loaded.It is probably possible to fix the original problem of malformed JSON with
/gc-stats
without loading thejson
gem. It would just mean that we’d have to implement and maintain basically a hand-rolled JSON serialization implementation.For
/stats
, it’s possible to implement this without thejson
gem. In fact, before #2086, it didn’t use thejson
gem at all–just hand-rolled JSON serialization.For
/thread-backtraces
, this can be implemented with some more hand-rolled JSON serialization, but it never has been in Puma’s history (like I said, this one is fairly recent).Puma.stats
https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma.rb#L27-L30
This use of
require 'json'
is similarly unsafe for the same reasons as before.Puma.stats
returns a JSON-encoded string. It is considered public API.#2086 modified this method to just return a Hash instead of a String, but that was later rolled back in #2253 because it was a breaking API change. Interestingly, the revert operation in #2253 effectively deferred JSON serialization until the last moment (instead of
Cluster#stats
returning a JSON-encoded string as it did before #2086, it returns a hash). Conversion to a JSON string happens in the body ofPuma.stats
.Similar to the
/stats
in the discussion about the status server, there was a time when this method did not depend on thejson
gem, and we can roll it back with some more hand-rolled JSON serialization.@cjlarose we’ve been running puma master all day on our staging environment, with a dozen or so deploys all using phased restarts. So far it looks great 👍 🚀
I’m not sure we really tested all scenarios, like Gem upgrades/new installs/removals though, but I’m definitely optimistic about the upgrade to 5.x now.
Thanks so much for all your hard work and patience.
@cjlarose yes, of course. it’s the least I could do 😃 I’ll get it tested on our staging environment so we can go through a number of deploys and hopefully spot any issues. I’ll report back within a day or two at the latest.
I was able to create a minimal repro for @gingerlime’s issue in Docker
https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/could-not-find-json
It demonstrates that if you have a deployment process that deletes gems from old releases and your application has a dependency on the
json
gem (declared in the application Gemfile or otherwise activated by RubyGem’sgem
method), workers will fail on boot repeatedly withCould not find json-2.3.1 in any of the sources
.I’m still investigating the precise mechanism that causes it.
I see. I think you’re on to something with the
add_runtime_dependency
call. I added an integration test to ensure that it is possible to upgrade/downgrade thejson
gem in an application as part of a phased restart (this fails). What I found is that without theadd_runtime_dependency
in Puma’s gemspec (basically what happens onmaster
now), we get a bunch of constant redefinition warnings when workers boot.I think this indicates that the JSON gem is being double-loaded (it’s loaded once by the puma master process and loaded again by the workers). Usually
require
is supposed to prevent double-loading, so there’s something strange going on. I suspect that when the workersrequire 'json'
, they’re referencing a different path to load the gem, so Ruby happily loads it up into memory, clobbering theJSON
module that is already loaded into memory.I’ll spend more time digging into this, but I suspect @MSP-Greg ‘s hunch that we should add the
add_runtime_dependency
declaration is correct. That’ll force the puma master process to use the version of thejson
gem that is declared in the users’ application’s Gemfile.lock, if available, instead of using the one bundled by Ruby. Doing that alone doesn’t fix OP’s original problem, but it’s a good first step.@MSP-Greg I think the
json
gem is bundled with every version of Ruby thatpuma
supports, so an explicit runtime dependency declaration for it is not necessary. I’m not 100% sure, but there’s some discussion about it in #1801For reference, here’s that discussion: https://github.com/puma/puma/pull/2427#issuecomment-712851931
From https://github.com/puma/puma/issues/2018#issuecomment-708190925
Is Puma master process still using json because of the following?
https://github.com/puma/puma/blob/f0dffd4d78c19287ff9152a7809d24d7c33f4952/lib/puma/cluster/worker_handle.rb#L43-L47
https://github.com/puma/puma/blob/f0dffd4d78c19287ff9152a7809d24d7c33f4952/lib/puma/cluster.rb#L441-L442