heroku-buildpack-ruby: Can't deploy without purging build cache

Reported to Heroku support as ticket 1092887

Since today we have not been able to deploy our app without first clearing the build cache (https://devcenter.heroku.com/articles/slug-compiler#build-cache).
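
For reference, clearing the build cache means using the heroku-builds CLI plugin described in the linked article (replace <app> with the app name):

$ heroku plugins:install heroku-builds
$ heroku builds:cache:purge -a <app>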

This was our setup when the day started:

=== Buildpack URLs
1. https://github.com/heroku/heroku-buildpack-github-netrc.git
2. https://github.com/heroku/heroku-buildpack-nodejs.git
3. https://github.com/heroku/heroku-buildpack-ruby.git

You can see that we were running the master branch of the heroku/ruby buildpack, meaning we had all the latest changes: https://github.com/heroku/heroku-buildpack-ruby/commit/b7c8180bd9add46e179346f691ff1e339b87357c

It failed like this:

remote: -----> Installing dependencies using bundler 2.2.33
remote:        Running: BUNDLE_WITHOUT='development:test' BUNDLE_PATH=vendor/bundle BUNDLE_BIN=vendor/bundle/bin BUNDLE_DEPLOYMENT=1 bundle install -j4
remote:        [248, #<Thread:0x00007f549f6e3c78 run>, #<NameError: uninitialized constant Gem::Source
remote:
remote:              (defined?(@source) && @source) || Gem::Source::Installed.new
remote:                                                   ^^^^^^^^

Which has been reported in https://github.com/heroku/heroku-buildpack-ruby/issues/1280 / https://github.com/rubygems/rubygems/issues/5351

We then tried many different things to get deploys working reliably again:

  • changing ruby version (.ruby-version) from 3.1.1 to 3.1.0
  • updating bundler in Gemfile.lock from 2.3.7 to 2.3.10
  • deploying after doing heroku labs:enable build-in-app-dir for the app
  • using v237 of heroku/ruby buildpack (see the command sketch after this list)
  • using the latest released version of heroku/ruby buildpack
  • switching stack to heroku-18 (works the first time because the build cache is purged)
  • switching stack back to heroku-20 (works the first time because the build cache is purged)
  • switching stack back to heroku-18 (works the first time because the build cache is purged)
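
For completeness, the buildpack pinning and stack switching above were done with commands along these lines (a sketch; the -i 3 index assumes the heroku/ruby buildpack sits third, as in the list of buildpack URLs above):

$ heroku buildpacks:set https://github.com/heroku/heroku-buildpack-ruby.git#v237 -i 3 -a <app>
$ heroku stack:set heroku-18 -a <app>
$ heroku stack:set heroku-20 -a <app>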

We then changed the buildpack setup to the following (and ran heroku labs:disable build-in-app-dir)

=== Buildpack URLs
1. https://github.com/heroku/heroku-buildpack-github-netrc.git
2. heroku/nodejs
3. heroku/ruby

and updated .ruby-version to contain 3.1.1 again
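
Roughly the commands for that change, in case anyone else needs them (a sketch; the exact invocations we used may have differed):

$ heroku labs:disable build-in-app-dir -a <app>
$ heroku buildpacks:clear -a <app>
$ heroku buildpacks:add https://github.com/heroku/heroku-buildpack-github-netrc.git -a <app>
$ heroku buildpacks:add heroku/nodejs -a <app>
$ heroku buildpacks:add heroku/ruby -a <app>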

Deploys still fail with the uninitialized constant Gem::Source error (https://github.com/rubygems/rubygems/issues/5351) unless we purge the build cache before the deploy.

Yes, our Gemfile uses gems hosted in private git repos on GitHub.

How can we get out of this situation?

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 17
  • Comments: 29 (7 by maintainers)

Most upvoted comments

This change has been released (https://devcenter.heroku.com/changelog-items/2409). Please try deploying with the heroku/ruby buildpack now.

Even after purging the cache on Heroku, it still does not deploy.

That sounds like a different ticket. I recommend opening a Heroku support request.

I can confirm that replacing the buildpack with https://github.com/heroku/heroku-buildpack-ruby/pull/1296 worked for us and we can deploy again.

Awesome, thanks for the verification. I’ve looked into this more, and what seems to be happening is that the code path that calls Gem::Source is only hit by Bundler when it has found a file on disk, i.e. on the second deploy, when the cache is populated.

While it seems like this problem suddenly manifested itself, my current theory is that it affects all Ruby 3.1.1 apps, but only on the second deploy. (Though maybe with some other set of complicating factors, as 3.1.1 was released about two months ago and people have certainly deployed twice with it.) With that theory, I decided I could test it by deploying with v236 of the buildpack (https://github.com/heroku/heroku-buildpack-ruby#v236); it still had the same behavior.

With that theory blown, I hunted down logs on an app to see when it started failing. I saw Ruby 3.1.1 in use for many deploys, and I saw lots of those deploys using the cache. However, this stood out in the deploy right before it started failing:

       Installing net-protocol 0.1.3
       Using net-imap 0.2.3
       Using net-pop 0.1.1
       Using net-smtp 0.3.1
       Bundle complete! 31 Gemfile dependencies, 75 gems now installed.
       Gems in the groups 'development' and 'test' were not installed.
       Bundled gems are installed into `./vendor/bundle`
       Bundle completed (2.44s)
       Cleaning up the bundler cache.
       Removing bundler (2.2.33)

In all of the prior working deploys there were no changes to the Gemfile or installed gems (all gems in bundle install were reported as “Using” instead of “Installing”). I checked this net-protocol gem, and sure enough a new version was released on April 1st, right when you all started seeing this problem.

I added this to my reproduction case:

$ cat Gemfile | tail -n 1
gem "net-protocol", "0.1.2"

Then I did a bundle install locally and committed the results to git. I cleared the cache and deployed twice. The problem did not show up. Here’s the git diff:

$ git diff 3ea5a9f44834a61b3fbe10086ae7ae5edcc97d4b main
diff --git a/Gemfile b/Gemfile
index e473cff..8b87258 100644
--- a/Gemfile
+++ b/Gemfile
@@ -70,3 +70,5 @@ group :test do
   gem "selenium-webdriver"
   gem "webdrivers"
 end
+
+gem "net-protocol", "0.1.2"
diff --git a/Gemfile.lock b/Gemfile.lock
index 9c4f354..c630622 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -97,6 +97,7 @@ GEM
       actionpack (>= 6.0.0)
       railties (>= 6.0.0)
     io-console (0.5.11)
+    io-wait (0.2.1)
     irb (1.4.1)
       reline (>= 0.3.0)
     jbuilder (2.11.5)
@@ -121,7 +122,8 @@ GEM
       digest
       net-protocol
       timeout
-    net-protocol (0.1.3)
+    net-protocol (0.1.2)
+      io-wait
       timeout
     net-smtp (0.3.1)
       digest
@@ -220,6 +222,7 @@ DEPENDENCIES
   debug
   importmap-rails
   jbuilder
+  net-protocol (= 0.1.2)
   pg (~> 1.1)
   puma (~> 5.0)
   rails (~> 7.0.2, >= 7.0.2.2)

Problem explained

The stdlib net-http was moved to be a “default gem” some time ago, and it depends on this other gem, net-protocol (https://rubygems.org/gems/net-http). Ruby ships with a default version of net-http, which therefore also comes with a default version of net-protocol. What I think happened is that when net-protocol released a new version on April 1st, bundler saw that a more recent version of net-protocol was available and downloaded it (or perhaps developers got a dependabot update). What happens after that is: the next deploy works and installs this gem. But once the gem is installed, on the next bundle or ruby invocation its source code will be used instead of the default gem’s.
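
One way to see which copy is active is to ask RubyGems directly (a sketch, using the gem from this thread):

$ ruby -e 'spec = Gem::Specification.find_by_name("net-protocol"); puts "#{spec.version} default_gem=#{spec.default_gem?}"'

On a stock Ruby 3.1.1 this prints something like 0.1.2 default_gem=true; once a newer copy has been downloaded and installed, it reports that version with default_gem=false instead.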

So the bug is in net-protocol? Not quite. I dug into the diff of the releases and it’s pretty small https://github.com/ruby/net-protocol/compare/v0.1.2...v0.1.3

My best guess, without further digging, is that the bug is in an interplay between RubyGems versions and Bundler versions. There’s a support-matrix gap: not every RubyGems version works with every Bundler version and vice versa. Current theory: there’s some special-case code in Bundler that kicks in when a default gem is replaced by a downloaded gem, which exercises Gem::Source, and that code does not play well with the version of Bundler that ships with 3.1.1.

It’s important to note that even though the Heroku output says “using bundler 2.2.x”, due to the way Bundler works, a vendored version of Bundler that is higher than our version will be used. That’s one of the reasons why (I suspect) some of you report that upgrading Bundler (via my patch/branch) works: either it has a bugfix for this problem, or the Bundler version that ships with 3.1.1 had a bug in it.
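
The vendored version that will actually win is recorded at the bottom of Gemfile.lock; for the app in this thread it was (per an earlier comment) something like:

$ tail -n 2 Gemfile.lock
BUNDLED WITH
   2.3.7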

Next steps

I’m working to roll out the new version of Bundler. I will post back here after I deploy the fix. However, it might cause instability for other customers (as commonly happens with changes to Bundler versions across the platform). I’ll monitor it and report back here if I end up having to roll it back.

Thanks for the info from all of y’all; it turns out it was April Fools related, just not in the way we had previously guessed.

The most curious part of this for me is the timing. I got a flurry of tickets about this and I’m looking into it. However, I can’t figure out why it was working before.

We didn’t change anything on our end on Friday. We did merge some things 7 days ago, but that doesn’t coincide with when this started happening: https://github.com/heroku/heroku-buildpack-ruby/commits/main. Also, @dentarg mentions that using the older version of the buildpack didn’t fix the issue.

I’m pretty unimpressed with EST-only support hours for essential infrastructure.

I’m aware of this limitation. As a result, I don’t push changes in the evening or on Fridays (basically unless I can be in front of my computer for the next several hours). I would love to provide 24/7 support, but there’s only one of me (currently). We do have many tier 1 supporters, but only about one buildpack owner per language, which is what an in-depth investigation like this requires.

Under normal circumstances, the ability to roll back your own deploys (heroku rollback) and to pin to an older release of the buildpack (https://devcenter.heroku.com/articles/bundler-version#using-an-older-version-of-bundler) is enough of an escape valve to buy a day or two.

In this case, either something external that Heroku can’t control triggered this event, or many people started doing something different at about the same time. I’m glad that y’all found one another and were able to come up with a workaround in the short term. Even though I’m paid to work on this OSS, I can’t do it all and I rely on “many eyes” for help. Sorry you’ve had to wait.

You shouldn’t need to downgrade your Ruby version. You can bundle with an older version of bundler. Just install an older version (Heroku uses 2.2.33) and then bundle specifying that version:
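
Something along these lines (RubyGems’ underscore syntax selects a specific Bundler version):

$ gem install bundler -v 2.2.33
$ bundle _2.2.33_ install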

We (Heroku) don’t invoke the command this way; we use bundle install, which goes through the RubyGems lookup process and picks the latest version that satisfies. With Ruby 3.1.1, the vendored Bundler version is later than the one we’ve put on the platform for you. That’s an interesting idea, though. In the CNB re-write I was going the other direction, with the ability to use any arbitrary Bundler version you’ve got in your Gemfile.lock. That’s a tangent, though.

I’m going to keep debugging this and looking into the root cause. Also, since this seems to be bundler/rubygems related, one lever I’ve got is the ability to release a newer Bundler version. I wanted to do that before and had to roll back; I’m hoping those issues have been ironed out. I’ll go ahead and stage that on a branch and ask anyone who has this issue to try it out (and comment out your temporary bin/bundle fix).

In the meantime, if you’ve got any other theories about what changed or why this suddenly started happening, I’m all ears.

I added the following to one of my projects’ bin/bundle stub, as mentioned by @sebnjk here.

begin
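  # Pre-loading Gem::Source works around the "uninitialized constant Gem::Source" error (rubygems/rubygems#5351).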
  require "rubygems/source"
rescue
  # Do nothing
end

This has at least allowed me to do multiple deploys without having to purge the build cache each time. I’m not sure what the implications of doing this are, though.

Got some race conditions in that comment submission. I just updated my comment with a LOT of detail, check it out. I’ll be heads down deploying my patch for a bit. I’ll let you all know when it’s done.

Huge thanks for sharing this! I was down a completely wrong rathole trying to fix this on a dokku deployment.

For anybody else looking for a stopgap with dokku, this fixed the issue blocking our dokku 0.27.0 deployments today (thanks @dentarg).

dokku repo:purge-cache <app>

After running that we were able to push successfully.