puma: phased-restart errors after upgrading from Puma 3.12.6 to 5.0.4

Describe the bug

This seems to happen when doing phased-restarts using versioned deploy folders with symlinks.

We use a folder structure similar to this:

/app/app.d/{version (int)}
/app/app -> /app/app.d/{version} # symlink
/app/prev -> /app/app.d/{version-1} # symlink

When we deploy, we increment the version, run bundle etc, then switch the symlink and do a phased-restart of puma.

All seems to work fine, but after a few deploys, we also start pruning older versions, and then we see this error in the puma.log

/usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:86:in `block in materialize': Could not find json-2.3.1 in any of the sources (Bundler::GemNotFound)
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `map!'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `materialize'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:101:in `block in definition_method'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler.rb:149:in `setup'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `block in <top (required)>'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:136:in `with_level'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:88:in `silence'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
        from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:312:in `rack_builder'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:331:in `load_rackup'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:257:in `app'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:142:in `app'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:146:in `start_server'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster/worker.rb:57:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:195:in `worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `block in spawn_worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `fork'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `spawn_worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:78:in `block in spawn_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `times'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `spawn_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:142:in `check_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:410:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/launcher.rb:171:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/bin/puma-wild:25:in `<main>'
[12769] + Gemfile in context: /var/local/app/app.d/1391/Gemfile
[12769] Early termination of worker

❗️ Note that the app now lives in /app/app.d/1390 but the errors are pointing to puma in /app/app.d/1386 ❗️

Puma config:

#!/usr/bin/env puma

# The directory to operate out of.
#
# The default is the current directory.
#
directory "/var/local/app/app"

# Set the environment in which the rack's app will run. The value must be a string.
#
# The default is "development".
#
environment ENV["RAILS_ENV"] || "development"

# Store the pid of the server in the file at "path".
#
pidfile "/var/run/puma/puma.pid"

# Use "path" as the file to store the server info state. This is
# used by "pumactl" to query and control the server.
#
state_path "/var/run/puma/puma.state"

# Redirect STDOUT and STDERR to files specified. The 3rd parameter
# ("append") specifies whether the output is appended, the default is
# "false".
#
stdout_redirect "/var/log/puma.log", "/var/log/puma.log", true

# Configure "min" to be the minimum number of threads to use to answer
# requests and "max" the maximum.
#
# The default is "0, 16".
#
# threads 0, 16
threads 8, 8


# Bind the server to "url". "tcp://", "unix://" and "ssl://" are the only
# accepted protocols.
#
# The default is "tcp://0.0.0.0:9292".
#
bind 'unix:///var/run/puma/puma.sock'


# === Cluster mode ===

# How many worker processes to run.  Typically this is set to
# to the number of available cores.
#
# The default is "0".
#
workers ENV["PUMA_WORKERS"] || 16

# Allow workers to reload bundler context when master process is issued
# a USR1 signal. This allows proper reloading of gems while the master
# is preserved across a phased-restart. (incompatible with preload_app)
# (off by default)

# see https://github.com/puma/puma/blob/master/docs/deployment.md#restarting
prune_bundler

If I do a full restart, then things are back to normal, until it happens again (we keep 3 deploys back, so after 4 deploys…)

To Reproduce

I was only able to reproduce this on our staging environment so far

Expected behavior phased-restart should reload everything from the new folder… It does seem to point to the right folder, but there’s some mysterious left-overs pointing to something old somewhere??

Server:

  • OS: Ubuntu 18.04 with systemd
  • Puma Version 5.0.4 (it doesn’t happen with 3.12)

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 28 (27 by maintainers)

Most upvoted comments

Ok so I took a look at all of the places where we explicitly use the json gem in puma. There are four places where we require 'json', all listed neatly in https://github.com/puma/puma/pull/2269. Let’s look at them individually:

The worker stat payload thread

https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/cluster/worker.rb#L114-L115

This usage of require 'json' is safe. It necessarily happens inside the worker process. If the user’s application depends on json, it’ll use that version. Otherwise, it’ll use the version of json bundled with Ruby.

WorkerHandle#ping!

https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/cluster/worker_handle.rb#L43-L48

This use of require 'json' is not safe. Assuming prune_bundler is on, the first time this is called will be after the first worker has booted. That worker will check in to the master process and the master process will try to parse the message. Under the normal RubyGems require logic, if the user’s application defined a dependency on the json gem, the puma master process will add that version to the $LOAD_PATH, activate the json gem’s gemspec, and load the puma gem into memory. If the user’s application did not define such a dependency, this will load the version of the json gem that’s bundled with Ruby.

Subsequent attempts to boot workers will try to use the same version as the version that was first require’d. As a consequence, so long as the application declares a dependency on the json gem, removing that first version’s representation on disk will cause workers to fail, even if they’re booting up a version of the application that depends on the same version of json (they didn’t upgrade or downgrade). The root cause of this particular problem is rubygems/rubygems#4004. The other problem is that once the json gem is loaded into the puma master process, it is no longer possible to upgrade or downgrade the json gem at all in a subsequent phased restart (this is true even if you don’t delete old versions of your application). In short, our best path forward is to remove the json gem entirely from being loaded into the master process.

The use of the JSON gem in this particular method was introduced in #2124 and can be safely reverted. Doing so will probably fix @gingerlime’s problem and allow many other folks to upgrade to a modern version of Puma. The remaining two places where we require 'json' probably affect fewer people, but the impact is just as severe.

Status server requests to /gc-stats, /stats, and /thread-backtraces.

https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/app/status.rb#L25-L27

https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma/app/status.rb#L51-L62

This usage of require 'json' is similarly unsafe. As soon as json is required into the puma master process, it’s possible for subsequent worker boot operations to fail (either because the native extensions for an old release were deleted or because the version of json has changed in the new version of the application).

The use of the json gem in the status server was introduced in #1801 in order to fix a bug with malformed responses from /gc-stats on JRuby. At the time of authorship of #1801, /stats did not rely on the json gem and /thread-backtraces did not yet exist.

/thread-backtraces is a fairly recent addition from #2054. At the time of authorship, it just depended on the json gem having already been loaded.

It is probably possible to fix the original problem of malformed JSON with /gc-stats without loading the json gem. It would just mean that we’d have to implement and maintain basically a hand-rolled JSON serialization implementation.

For /stats, it’s possible to implement this without the json gem. In fact, before #2086, it didn’t use the json gem at all–just hand-rolled JSON serialization.

For /thread-backtraces, this can be implemented with some more hand-rolled JSON serialization, but it never has been in Puma’s history (like I said, this one is fairly recent).

Puma.stats

https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/lib/puma.rb#L27-L30

This use of require 'json' is similarly unsafe for the same reasons as before.

Puma.stats returns a JSON-encoded string. It is considered public API.

#2086 modified this method to just return a Hash instead of a String, but that was later rolled back in #2253 because it was a breaking API change. Interestingly, the revert operation in #2253 effectively deferred JSON serialization until the last moment (instead of Cluster#stats returning a JSON-encoded string as it did before #2086, it returns a hash). Conversion to a JSON string happens in the body of Puma.stats.

Similar to the /stats in the discussion about the status server, there was a time when this method did not depend on the json gem, and we can roll it back with some more hand-rolled JSON serialization.

@cjlarose we’ve been running puma master all day on our staging environment, with a dozen or so deploys all using phased restarts. So far it looks great 👍 🚀

I’m not sure we really tested all scenarios, like Gem upgrades/new installs/removals though, but I’m definitely optimistic about the upgrade to 5.x now.

Thanks so much for all your hard work and patience.

@cjlarose yes, of course. it’s the least I could do 😃 I’ll get it tested on our staging environment so we can go through a number of deploys and hopefully spot any issues. I’ll report back within a day or two at the latest.

I was able to create a minimal repro for @gingerlime’s issue in Docker

https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/could-not-find-json

It demonstrates that if you have a deployment process that deletes gems from old releases and your application has a dependency on the json gem (declared in the application Gemfile or otherwise activated by RubyGem’s gem method), workers will fail on boot repeatedly with Could not find json-2.3.1 in any of the sources.

I’m still investigating the precise mechanism that causes it.

@cjlarose

No, I’m thinking about how the last Puma line leading to the error in the above log is (v5.0.4):

https://github.com/puma/puma/blob/1f0c08e3a7de13452e2f05d3160d90e04ab1987b/lib/puma/configuration.rb#L312

It’s not throwing the exception on a require 'json' statement.

I see. I think you’re on to something with the add_runtime_dependency call. I added an integration test to ensure that it is possible to upgrade/downgrade the json gem in an application as part of a phased restart (this fails). What I found is that without the add_runtime_dependency in Puma’s gemspec (basically what happens on master now), we get a bunch of constant redefinition warnings when workers boot.

/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/2.7.0/json/version.rb:4: warning: already initialized constant JSON::VERSION
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/json-2.3.1/lib/json/version.rb:4: warning: previous definition of VERSION was here
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/2.7.0/json/version.rb:5: warning: already initialized constant JSON::VERSION_ARRAY
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/json-2.3.1/lib/json/version.rb:5: warning: previous definition of VERSION_ARRAY was here
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/2.7.0/json/version.rb:6: warning: already initialized constant JSON::VERSION_MAJOR
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/json-2.3.1/lib/json/version.rb:6: warning: previous definition of VERSION_MAJOR was here
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/2.7.0/json/version.rb:7: warning: already initialized constant JSON::VERSION_MINOR
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/json-2.3.1/lib/json/version.rb:7: warning: previous definition of VERSION_MINOR was here
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/2.7.0/json/version.rb:8: warning: already initialized constant JSON::VERSION_BUILD
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/json-2.3.1/lib/json/version.rb:8: warning: previous definition of VERSION_BUILD was here
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/2.7.0/json/common.rb:100: warning: already initialized constant JSON::NaN
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/json-2.3.1/lib/json/common.rb:107: warning: previous definition of NaN was here
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/2.7.0/json/common.rb:102: warning: already initialized constant JSON::Infinity
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/json-2.3.1/lib/json/common.rb:109: warning: previous definition of Infinity was here
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/2.7.0/json/common.rb:104: warning: already initialized constant JSON::MinusInfinity
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/json-2.3.1/lib/json/common.rb:111: warning: previous definition of MinusInfinity was here
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/2.7.0/json/common.rb:129: warning: already initialized constant JSON::UnparserError
/Users/chris.larose/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/json-2.3.1/lib/json/common.rb:136: warning: previous definition of UnparserError was here

I think this indicates that the JSON gem is being double-loaded (it’s loaded once by the puma master process and loaded again by the workers). Usually require is supposed to prevent double-loading, so there’s something strange going on. I suspect that when the workers require 'json', they’re referencing a different path to load the gem, so Ruby happily loads it up into memory, clobbering the JSON module that is already loaded into memory.

I’ll spend more time digging into this, but I suspect @MSP-Greg ‘s hunch that we should add the add_runtime_dependency declaration is correct. That’ll force the puma master process to use the version of the json gem that is declared in the users’ application’s Gemfile.lock, if available, instead of using the one bundled by Ruby. Doing that alone doesn’t fix OP’s original problem, but it’s a good first step.

@MSP-Greg I think the json gem is bundled with every version of Ruby that puma supports, so an explicit runtime dependency declaration for it is not necessary. I’m not 100% sure, but there’s some discussion about it in #1801

/usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:86:in `block in materialize’: Could not find json-2.3.1 in any of the sources (Bundler::GemNotFound)

For some reason I remember bringing this up during the nio4r change but I was told that json wasn’t an issue?

For reference, here’s that discussion: https://github.com/puma/puma/pull/2427#issuecomment-712851931

From https://github.com/puma/puma/issues/2018#issuecomment-708190925

I opened #2427 which fixes the nio4r issue with regard to phased restarts. It does not allow one to remove from disk compiled native extensions for other gems that are loaded by the puma master process (like ffi)

Is Puma master process still using json because of the following?

https://github.com/puma/puma/blob/f0dffd4d78c19287ff9152a7809d24d7c33f4952/lib/puma/cluster/worker_handle.rb#L43-L47

https://github.com/puma/puma/blob/f0dffd4d78c19287ff9152a7809d24d7c33f4952/lib/puma/cluster.rb#L441-L442