puma: phased-restart: Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound)

Describe the bug

This seems to happen when doing phased-restarts using versioned deploy folders with symlinks.

We use a folder structure similar to this:

/app/app.d/{version (int)}
/app/app -> /app/app.d/{version} # symlink
/app/prev -> /app/app.d/{version-1} # symlink

When we deploy, we increment the version, run bundle etc, then switch the symlink and do a phased-restart of puma.

All seems to work fine, but after a few deploys, we also start pruning older versions, and then we see this error in the puma.log

[12357] - Starting phased worker restart, phase: 4
[12357] - Starting phased worker restart, phase: 4
[12357] + Changing to /var/local/app/app
[12357] + Changing to /var/local/app/app
[12357] - Stopping 28107 for phased upgrade...
[12357] - Stopping 28107 for phased upgrade...
[12357] - TERM sent to 28107...
[12357] - TERM sent to 28107...
[7886] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7886] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7886] Early termination of worker
[7886] Early termination of worker
[7944] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7944] + Gemfile in context: /var/local/app/app.d/909/Gemfile
/usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:91:in `block in materialize': Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound/usr/lib/ruby/site_ruby/2.5.0
/bundler/spec_set.rb:91:in `block in materialize': Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound)
)
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `map!'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `materialize'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:108:in `block in definition_method'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `map!'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `materialize'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:108:in `block in definition_method'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler.rb:107:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler.rb:107:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:302:in `rack_builder'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:302:in `rack_builder'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:321:in `load_rackup'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:321:in `load_rackup'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:246:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:246:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:158:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:158:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:165:in `start_server'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:165:in `start_server'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:269:in `worker'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:269:in `worker'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block (2 levels) in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block (2 levels) in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `fork'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `fork'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `times'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `times'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:204:in `check_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:204:in `check_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:487:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:487:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/bin/puma-wild:31:in `<main>'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/bin/puma-wild:31:in `<main>'
[8048] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[8048] + Gemfile in context: /var/local/app/app.d/909/Gemfile

❗️ Note that the app now lives in /app/app.d/909 but the errors are pointing to puma in /app/app.d/906 ❗️

Puma config:

#!/usr/bin/env puma

# The directory to operate out of.
#
# The default is the current directory.
#
directory "/var/local/app/app"

# Set the environment in which the rack's app will run. The value must be a string.
#
# The default is "development".
#
environment ENV["RAILS_ENV"] || "development"

# Store the pid of the server in the file at "path".
#
pidfile "/var/run/puma/puma.pid"

# Use "path" as the file to store the server info state. This is
# used by "pumactl" to query and control the server.
#
state_path "/var/run/puma/puma.state"

# Redirect STDOUT and STDERR to files specified. The 3rd parameter
# ("append") specifies whether the output is appended, the default is
# "false".
#
stdout_redirect "/var/log/puma.log", "/var/log/puma.log", true

# Configure "min" to be the minimum number of threads to use to answer
# requests and "max" the maximum.
#
# The default is "0, 16".
#
# threads 0, 16
threads 8, 8


# Bind the server to "url". "tcp://", "unix://" and "ssl://" are the only
# accepted protocols.
#
# The default is "tcp://0.0.0.0:9292".
#
bind 'unix:///var/run/puma/puma.sock'


# === Cluster mode ===

# How many worker processes to run.  Typically this is set to
# to the number of available cores.
#
# The default is "0".
#
workers ENV["PUMA_WORKERS"] || 16

# Allow workers to reload bundler context when master process is issued
# a USR1 signal. This allows proper reloading of gems while the master
# is preserved across a phased-restart. (incompatible with preload_app)
# (off by default)

# see https://github.com/puma/puma/blob/master/docs/deployment.md#restarting
prune_bundler

If I do a full restart, then things are back to normal, until it happens again (we keep 3 deploys back, so after 4 deploys…)

To Reproduce

I was trying to reproduce this in a clean environment, but I can’t manage to do it so far… 😕

Expected behavior phased-restart should reload everything from the new folder… It does seem to point to the right folder, but there’s some mysterious left-overs pointing to something old somewhere??

Server:

  • OS: Ubuntu 18.04 with systemd
  • Puma Version 4.2.1 (it doesn’t happen with 3.12)

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 49 (42 by maintainers)

Commits related to this issue

Most upvoted comments

I dug into what’s happening on my repro repo. I’ve got some notes to share and some thoughts on how to address the problem.

What causes the Could not find nio4r errors?

During the Bundler.setup process that is kicked off on a puma worker when it boots, Bundler tries to find all of the gems referenced by the Gemfile.lock. It searches for gems using an instance of Bundler::Index that is constructed by Bundler::Source::Rubygems#installed_specs. In Bundler::Source::Rubygems#installed_specs, Bundler iterates over the collection of specs (instances of Gem::Specification) corresponding to installed gems by calling Bundler::RubygemsIntegration#all_specs, which in turn calls Gem::Specification.stubs. Gem::Specification.stubs returns a result that includes the return value from Gem.loaded_specs, which is a collection of instances of Gem::Specification that correspond to already-activated gems (this includes nio4r since that gem was activated by the puma master–by puma-wild directly when you’re using prune_bundler).

When Bundler::Source::Rubygems#installed_specs iterates over the collection of specs, it rejects specs for which native extensions are missing. It determines whether or not native extensions are missing by executing the method #missing_extensions? on the corresponding Gem::Specification. Gem::Specification#missing_extensions relies on the result from Gem::BasicSpecification#extension_dir. When this method is called on the nio4r spec in the worker, it’s called on the same instance that was first created when the puma master activated the gem and returns the extension directory of the gem at the time that the puma master activated it. So, if you remove that directory after the puma master boots up, workers will still try to find files in that directory when they boot. If they’re not there, Bundler::Source::Rubygems#installed_specs excludes the gem from its index and Bundler::SpecSet#materialize is, in turn, unable to find the gem and raises GemNotFound.

How to fix it

We’re fortunate that the puma master doesn’t actually need to have a runtime dependency on nio4r. Basically, we can make it so that puma-wild never activates the nio4r gem in the master process at all. As long as we never require 'nio' in the master process (basically what @MSP-Greg was getting at), we’re good. I’ve tested that change in my repro repo (by patching puma version 4.3.1) and everything works. But if I apply the same patch to puma master (after #1893 was merged), I get a different error that’s preventing me from verifying the fix against master. I think that might be an unrelated regression introduced by #1893

Opening a PR for now: #2125

#2374 is merged, which was a prerequisite for the changes I want to introduce to fix this issue.

I’ve got a branch that fixes this issue. I’m working on fixing my tests for it.

I created a repo that demonstrates that this error occurs reliably if you’re deleting old releases

https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors

The master branch of that repo gets the error described in this thread. It’s using puma 4.3.1.

Could not find nio4r-2.5.2 in any of the sources
Run `bundle install` to install missing gems.
[292] Early termination of worker

The puma-master branch of the repro repo is using puma at commit 8c9b3ebc7a86bdc5488704a0225b76409b9f555f, which includes changes from #1893.

Workers still fail on startup, but with a different error

/usr/src/releases/1582131641325997600/Gemfile not found

I opened #2407 which is a robust solution to the original problem described in this issue. It makes it so that the puma master process can use totally separate gems from the puma worker processes, so it’s fine to remove nio4r’s compiled native extensions on disk and perform phased restarts. It also supports doing the same for any gem with native extensions loaded by the puma master process, such as ffi if you’re using puma_worker_killer. The problem with that PR, though, is that it increases the overall memory footprint of a puma cluster since worker processes can’t leverage COW memory by forking from the master process. We ultimately decided to close it.

I opened #2427 which fixes the nio4r issue with regard to phased restarts. It does not allow one to remove from disk compiled native extensions for other gems that are loaded by the puma master process (like ffi), but I suspect this doesn’t matter to most people. That PR will fix this problem for most folks.

For anyone else, I opened rubygems/rubygems#4004 which I think is the root cause of the issue in Bundler. If that issue is resolved, then it will be possible to perform a phased restart after removing compiled native extensions for any gem loaded by the puma master process, including ffi.

@cjlarose Thanks for the writeup. I’ll take a more detailed look this week.

Is that a problem Puma should solve?

@dentarg I’m not certain about this either. The bug might not even be with Puma itself–it might be a bug with bundler, for example. It just so happens that puma phased restarts combined with a deployment strategy that removes old releases exposes the bug. It’s also a little frustrating, too, because if you were running puma 3.x (before the introduction of nio4r in 4.x), and you were removing old releases, that was probably fine since puma 3.x had no runtime dependencies on gems that had native extensions.

I believe this should be solved by Puma as soon as possible. At least a workaround.

@gencer Puma doesn’t owe you anything. Maintainers are often unpaid volunteers. If you can do phased restarts using other web servers, then you should use those other web servers.

I think the problem is that some users have the same issue as Travis has, so ‘we’ need to determine what that is…

@cjlarose Neither puma nor its developers and also volunteers doesn’t owe me anything. I am deeply sorry if I offended anyone. I didn’t mean that way. You misunderstood me. I was just trying to say that it should be fixed on puma-gem (that’s what i believe because other web servers doesnt have this issue and puma 3.x also don’t have this.) not by puma developers. This could be of course “me” or someone who knows these things better than me.

Please accept my apologies on this matter.

Thanks @dentarg very interesting. I’ll try to find some time to try out #1893 and see. Reproducing the problem in itself was a challenge, and I was only seeing it in our staging environment (which is shared, so I can’t experiment there for too long). Hope I can still reproduce it, and then try again with #1893 and report back here.

Otherwise, is there a workaround perhaps? maybe moving away from prune_bundler? what’s the recommended approach for zero-downtime deploys with the now “old-school” symlinked structure?

@gingerlime The bug might have been there since before Puma 4, see https://github.com/puma/puma/issues/1593 and https://github.com/bundler/bundler/issues/6667, looks like the same problem you have but under different circumstances

There’s a PR trying to fix it https://github.com/puma/puma/pull/1893, perhaps you could try it out?

Puma v4.3.0 is still using Bundler.with_clean_env

https://github.com/puma/puma/blob/395337df4a3b27cc14eeab048016fb1ee85d2f83/lib/puma/launcher.rb#L267

https://github.com/puma/puma/pull/1893 switches to Bundler.with_original_env which reportedly solved the issue for one user in https://github.com/bundler/bundler/issues/6667

Puma 3 had no dependent gems. Puma 4 added nio4r as a dependency. I’m not sure but maybe bundler tries to reload nio4r? Where as before it didn’t need to reload anything?

@alx75 I think your intuition is correct here. Since the ffi gem would not be activated in the puma master process in your case, then the phased restart should complete successfully.

I was able to test this reliably here: https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/ffi

https://github.com/puma/puma/issues/1875 tracks the progress of a new architecture in puma that’ll hopefully resolve this issue.

Everyone having simlar issues should try master branch, as https://github.com/puma/puma/pull/1893 was merged and may help.

@gingerlime I thinks this problem occurs because you vendor the ruby gems in /var/local/app/app.d/{version (int)}. See:

/var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'

If this version gets deleted the vendored gems also get deleted. You probably want to install gems in a shared persistent directory like: /var/local/app/app.d/shared

When I update a gem in Gemfile that puma uses like puma itself or as original poster said nio4r and try to do phased restart

Makes sense if you look at the code for how prune_bundler works. We’ll have to figure out how to change/reload Puma’s spec.

One small thing I saw in our staging logs, but I don’t see in my toy environment is:

Early termination of worker

Maybe it’s related to the problem?