puma: phased-restart: Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound)
Describe the bug
This seems to happen when doing phased-restarts using versioned deploy folders with symlinks.
We use a folder structure similar to this:
/app/app.d/{version (int)}
/app/app -> /app/app.d/{version} # symlink
/app/prev -> /app/app.d/{version-1} # symlink
When we deploy, we increment the version, run bundle etc, then switch the symlink and do a phased-restart of puma.
All seems to work fine, but after a few deploys, we also start pruning older versions, and then we see this error in the puma.log
[12357] - Starting phased worker restart, phase: 4
[12357] - Starting phased worker restart, phase: 4
[12357] + Changing to /var/local/app/app
[12357] + Changing to /var/local/app/app
[12357] - Stopping 28107 for phased upgrade...
[12357] - Stopping 28107 for phased upgrade...
[12357] - TERM sent to 28107...
[12357] - TERM sent to 28107...
[7886] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7886] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7886] Early termination of worker
[7886] Early termination of worker
[7944] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7944] + Gemfile in context: /var/local/app/app.d/909/Gemfile
/usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:91:in `block in materialize': Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound/usr/lib/ruby/site_ruby/2.5.0
/bundler/spec_set.rb:91:in `block in materialize': Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound)
)
from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `map!'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `materialize'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:170:in `specs'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:237:in `specs_for'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:226:in `requested_specs'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:108:in `block in definition_method'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:20:in `setup'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `map!'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `materialize'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:170:in `specs'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:237:in `specs_for'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:226:in `requested_specs'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:108:in `block in definition_method'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:20:in `setup'
from /usr/lib/ruby/site_ruby/2.5.0/bundler.rb:107:in `setup'
from /usr/lib/ruby/site_ruby/2.5.0/bundler.rb:107:in `setup'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/setup.rb:20:in `<top (required)>'
from /usr/lib/ruby/site_ruby/2.5.0/bundler/setup.rb:20:in `<top (required)>'
from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:302:in `rack_builder'
from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:302:in `rack_builder'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:321:in `load_rackup'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:321:in `load_rackup'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:246:in `app'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:246:in `app'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:158:in `app'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:158:in `app'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:165:in `start_server'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:165:in `start_server'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:269:in `worker'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:269:in `worker'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block (2 levels) in spawn_workers'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block (2 levels) in spawn_workers'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `fork'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `fork'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block in spawn_workers'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `times'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block in spawn_workers'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `times'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `spawn_workers'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `spawn_workers'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:204:in `check_workers'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:204:in `check_workers'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:487:in `run'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:487:in `run'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cli.rb:80:in `run'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cli.rb:80:in `run'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/bin/puma-wild:31:in `<main>'
from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/bin/puma-wild:31:in `<main>'
[8048] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[8048] + Gemfile in context: /var/local/app/app.d/909/Gemfile
❗️ Note that the app now lives in /app/app.d/909
but the errors are pointing to puma in /app/app.d/906
❗️
Puma config:
#!/usr/bin/env puma
# The directory to operate out of.
#
# The default is the current directory.
#
directory "/var/local/app/app"
# Set the environment in which the rack's app will run. The value must be a string.
#
# The default is "development".
#
environment ENV["RAILS_ENV"] || "development"
# Store the pid of the server in the file at "path".
#
pidfile "/var/run/puma/puma.pid"
# Use "path" as the file to store the server info state. This is
# used by "pumactl" to query and control the server.
#
state_path "/var/run/puma/puma.state"
# Redirect STDOUT and STDERR to files specified. The 3rd parameter
# ("append") specifies whether the output is appended, the default is
# "false".
#
stdout_redirect "/var/log/puma.log", "/var/log/puma.log", true
# Configure "min" to be the minimum number of threads to use to answer
# requests and "max" the maximum.
#
# The default is "0, 16".
#
# threads 0, 16
threads 8, 8
# Bind the server to "url". "tcp://", "unix://" and "ssl://" are the only
# accepted protocols.
#
# The default is "tcp://0.0.0.0:9292".
#
bind 'unix:///var/run/puma/puma.sock'
# === Cluster mode ===
# How many worker processes to run. Typically this is set to
# to the number of available cores.
#
# The default is "0".
#
workers ENV["PUMA_WORKERS"] || 16
# Allow workers to reload bundler context when master process is issued
# a USR1 signal. This allows proper reloading of gems while the master
# is preserved across a phased-restart. (incompatible with preload_app)
# (off by default)
# see https://github.com/puma/puma/blob/master/docs/deployment.md#restarting
prune_bundler
If I do a full restart, then things are back to normal, until it happens again (we keep 3 deploys back, so after 4 deploys…)
To Reproduce
I was trying to reproduce this in a clean environment, but I can’t manage to do it so far… 😕
Expected behavior phased-restart should reload everything from the new folder… It does seem to point to the right folder, but there’s some mysterious left-overs pointing to something old somewhere??
Server:
- OS: Ubuntu 18.04 with systemd
- Puma Version 4.2.1 (it doesn’t happen with 3.12)
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 49 (42 by maintainers)
I dug into what’s happening on my repro repo. I’ve got some notes to share and some thoughts on how to address the problem.
What causes the
Could not find nio4r
errors?During the
Bundler.setup
process that is kicked off on a puma worker when it boots, Bundler tries to find all of the gems referenced by theGemfile.lock
. It searches for gems using an instance ofBundler::Index
that is constructed byBundler::Source::Rubygems#installed_specs
. InBundler::Source::Rubygems#installed_specs
, Bundler iterates over the collection of specs (instances ofGem::Specification
) corresponding to installed gems by callingBundler::RubygemsIntegration#all_specs
, which in turn callsGem::Specification.stubs
.Gem::Specification.stubs
returns a result that includes the return value fromGem.loaded_specs
, which is a collection of instances ofGem::Specification
that correspond to already-activated gems (this includesnio4r
since that gem was activated by the puma master–bypuma-wild
directly when you’re usingprune_bundler
).When
Bundler::Source::Rubygems#installed_specs
iterates over the collection of specs, it rejects specs for which native extensions are missing. It determines whether or not native extensions are missing by executing the method#missing_extensions?
on the correspondingGem::Specification
.Gem::Specification#missing_extensions
relies on the result fromGem::BasicSpecification#extension_dir
. When this method is called on thenio4r
spec in the worker, it’s called on the same instance that was first created when the puma master activated the gem and returns the extension directory of the gem at the time that the puma master activated it. So, if you remove that directory after the puma master boots up, workers will still try to find files in that directory when they boot. If they’re not there,Bundler::Source::Rubygems#installed_specs
excludes the gem from its index andBundler::SpecSet#materialize
is, in turn, unable to find the gem and raisesGemNotFound
.How to fix it
We’re fortunate that the puma master doesn’t actually need to have a runtime dependency on
nio4r
. Basically, we can make it so thatpuma-wild
never activates thenio4r
gem in the master process at all. As long as we neverrequire 'nio'
in the master process (basically what @MSP-Greg was getting at), we’re good. I’ve tested that change in my repro repo (by patching puma version 4.3.1) and everything works. But if I apply the same patch to puma master (after #1893 was merged), I get a different error that’s preventing me from verifying the fix against master. I think that might be an unrelated regression introduced by #1893Opening a PR for now: #2125
#2374 is merged, which was a prerequisite for the changes I want to introduce to fix this issue.
I’ve got a branch that fixes this issue. I’m working on fixing my tests for it.
I created a repo that demonstrates that this error occurs reliably if you’re deleting old releases
https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors
The
master
branch of that repo gets the error described in this thread. It’s using puma 4.3.1.The
puma-master
branch of the repro repo is usingpuma
at commit 8c9b3ebc7a86bdc5488704a0225b76409b9f555f, which includes changes from #1893.Workers still fail on startup, but with a different error
I opened #2407 which is a robust solution to the original problem described in this issue. It makes it so that the puma master process can use totally separate gems from the puma worker processes, so it’s fine to remove
nio4r
’s compiled native extensions on disk and perform phased restarts. It also supports doing the same for any gem with native extensions loaded by the puma master process, such asffi
if you’re usingpuma_worker_killer
. The problem with that PR, though, is that it increases the overall memory footprint of a puma cluster since worker processes can’t leverage COW memory byfork
ing from the master process. We ultimately decided to close it.I opened #2427 which fixes the
nio4r
issue with regard to phased restarts. It does not allow one to remove from disk compiled native extensions for other gems that are loaded by the puma master process (likeffi
), but I suspect this doesn’t matter to most people. That PR will fix this problem for most folks.For anyone else, I opened rubygems/rubygems#4004 which I think is the root cause of the issue in Bundler. If that issue is resolved, then it will be possible to perform a phased restart after removing compiled native extensions for any gem loaded by the puma master process, including
ffi
.@cjlarose Thanks for the writeup. I’ll take a more detailed look this week.
Seen on CI
@dentarg I’m not certain about this either. The bug might not even be with Puma itself–it might be a bug with bundler, for example. It just so happens that puma phased restarts combined with a deployment strategy that removes old releases exposes the bug. It’s also a little frustrating, too, because if you were running puma 3.x (before the introduction of nio4r in 4.x), and you were removing old releases, that was probably fine since puma 3.x had no runtime dependencies on gems that had native extensions.
@gencer Puma doesn’t owe you anything. Maintainers are often unpaid volunteers. If you can do phased restarts using other web servers, then you should use those other web servers.
I think the problem is that some users have the same issue as Travis has, so ‘we’ need to determine what that is…
@cjlarose Neither puma nor its developers and also volunteers doesn’t owe me anything. I am deeply sorry if I offended anyone. I didn’t mean that way. You misunderstood me. I was just trying to say that it should be fixed on puma-gem (that’s what i believe because other web servers doesnt have this issue and puma 3.x also don’t have this.) not by puma developers. This could be of course “me” or someone who knows these things better than me.
Please accept my apologies on this matter.
Thanks @dentarg very interesting. I’ll try to find some time to try out #1893 and see. Reproducing the problem in itself was a challenge, and I was only seeing it in our staging environment (which is shared, so I can’t experiment there for too long). Hope I can still reproduce it, and then try again with #1893 and report back here.
Otherwise, is there a workaround perhaps? maybe moving away from
prune_bundler
? what’s the recommended approach for zero-downtime deploys with the now “old-school” symlinked structure?@gingerlime The bug might have been there since before Puma 4, see https://github.com/puma/puma/issues/1593 and https://github.com/bundler/bundler/issues/6667, looks like the same problem you have but under different circumstances
There’s a PR trying to fix it https://github.com/puma/puma/pull/1893, perhaps you could try it out?
Puma v4.3.0 is still using
Bundler.with_clean_env
https://github.com/puma/puma/blob/395337df4a3b27cc14eeab048016fb1ee85d2f83/lib/puma/launcher.rb#L267
https://github.com/puma/puma/pull/1893 switches to
Bundler.with_original_env
which reportedly solved the issue for one user in https://github.com/bundler/bundler/issues/6667Puma 3 had no dependent gems. Puma 4 added nio4r as a dependency. I’m not sure but maybe bundler tries to reload nio4r? Where as before it didn’t need to reload anything?
@alx75 I think your intuition is correct here. Since the
ffi
gem would not be activated in the puma master process in your case, then the phased restart should complete successfully.I was able to test this reliably here: https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/ffi
see https://github.com/puma/puma/issues/2471
https://github.com/puma/puma/issues/1875 tracks the progress of a new architecture in puma that’ll hopefully resolve this issue.
Everyone having simlar issues should try master branch, as https://github.com/puma/puma/pull/1893 was merged and may help.
@gingerlime I thinks this problem occurs because you vendor the ruby gems in /var/local/app/app.d/{version (int)}. See:
If this version gets deleted the vendored gems also get deleted. You probably want to install gems in a shared persistent directory like:
/var/local/app/app.d/shared
Makes sense if you look at the code for how prune_bundler works. We’ll have to figure out how to change/reload Puma’s spec.
One small thing I saw in our staging logs, but I don’t see in my toy environment is:
Early termination of worker
Maybe it’s related to the problem?