heroku-buildpack-ruby: Can't deploy without purging build cache
Reported to Heroku support as ticket 1092887
Since today we have not been able to deploy our app without first clearing the build cache (https://devcenter.heroku.com/articles/slug-compiler#build-cache)
This was our setup when the day started:
=== Buildpack URLs
1. https://github.com/heroku/heroku-buildpack-github-netrc.git
2. https://github.com/heroku/heroku-buildpack-nodejs.git
3. https://github.com/heroku/heroku-buildpack-ruby.git
You can see that we ran using the master branch of the heroku/ruby buildpack, meaning we used all the latest changes: https://github.com/heroku/heroku-buildpack-ruby/commit/b7c8180bd9add46e179346f691ff1e339b87357c
It failed like this
remote: -----> Installing dependencies using bundler 2.2.33
remote: Running: BUNDLE_WITHOUT='development:test' BUNDLE_PATH=vendor/bundle BUNDLE_BIN=vendor/bundle/bin BUNDLE_DEPLOYMENT=1 bundle install -j4
remote: [248, #<Thread:0x00007f549f6e3c78 run>, #<NameError: uninitialized constant Gem::Source
remote:
remote: (defined?(@source) && @source) || Gem::Source::Installed.new
remote: ^^^^^^^^
Which has been reported in https://github.com/heroku/heroku-buildpack-ruby/issues/1280 / https://github.com/rubygems/rubygems/issues/5351
We then tried many different things to get deploys working reliably again
- changing ruby version (.ruby-version) from 3.1.1 to 3.1.0
- updating bundler in Gemfile.lock from 2.3.7 to 2.3.10
- deploying after doing heroku labs:enable build-in-app-dir for the app
- using v237 of heroku/ruby buildpack
- using the latest released version of heroku/ruby buildpack
- switching stack to heroku-18 (works the first time because the build cache is purged)
- switching stack back to heroku-20 (works the first time because the build cache is purged)
- switching stack back to heroku-18 (works the first time because the build cache is purged)
We then changed the buildpack setup to the following (and ran heroku labs:disable build-in-app-dir):
=== Buildpack URLs
1. https://github.com/heroku/heroku-buildpack-github-netrc.git
2. heroku/nodejs
3. heroku/ruby
and updated .ruby-version to contain 3.1.1 again
Deploys still fail with the uninitialized constant Gem::Source error (https://github.com/rubygems/rubygems/issues/5351) unless we purge the build cache before the deploy.
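For anyone following along, purging the build cache is done with the heroku-builds CLI plugin (as documented in the slug-compiler article linked above). A sketch of the workaround we have to run before every deploy; the app name is a placeholder:

```shell
# One-time setup: install the heroku-builds plugin.
heroku plugins:install heroku-builds

# Purge the build cache for the app (replace example-app with your app name).
heroku builds:cache:purge -a example-app
```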
Yes, our Gemfile uses gems hosted in private git repos on GitHub.
How can we get out of this situation?
About this issue
- State: closed
- Created 2 years ago
- Reactions: 17
- Comments: 29 (7 by maintainers)
Commits related to this issue
- Check heroku cache clear fix https://github.com/heroku/heroku-buildpack-ruby/issues/1294 — committed to Henryvw/stoic_penknife by Henryvw 2 years ago
- Purge heroku cache https://github.com/heroku/heroku-buildpack-ruby/issues/1294 — committed to Henryvw/stoic_penknife by Henryvw 2 years ago
- Check heroku cache clear fix https://github.com/heroku/heroku-buildpack-ruby/issues/1294 — committed to starter69/stoic by jinxiao69 2 years ago
- Purge heroku cache https://github.com/heroku/heroku-buildpack-ruby/issues/1294 — committed to starter69/stoic by jinxiao69 2 years ago
This change is released (https://devcenter.heroku.com/changelog-items/2409); please try to deploy using the heroku/ruby buildpack now.
That sounds like a different ticket. I recommend opening a Heroku support request.
Awesome, thanks for the verification. I’ve looked into this more and what seems to be happening is that the code that calls Gem::Source is found only in Bundler when bundler has found a file on disk. I.e. on the second deploy when the cache is populated.
While it seems like this problem suddenly manifested itself, my current theory is that it affects all Ruby 3.1.1 apps, but only on the second deploy. (Though maybe with some other set of complicating factors, as 3.1.1 was released about 2 months ago and people have surely deployed twice with it since.) With that theory I decided I could test it by deploying with v236 of the buildpack (https://github.com/heroku/heroku-buildpack-ruby#v236); it still had the same behavior.
With that theory blown, I hunted down logs on an app to see when it started failing. I saw Ruby 3.1.1 used for many deploys, and I saw lots of those deploys using the cache. However, this stood out from the deploy right before it started failing:
In all of the prior working deploys there were no changes to the Gemfile or installed gems (all gems in bundle install reported as “Using” instead of “Installing”). I checked this gem, net-protocol, and sure enough it was released on April first, right when you all started seeing this problem. I added this to my reproduction case:
Then did a bundle install locally and committed the results to git. I cleared the cache and deployed twice. The problem did not show up. Here’s the git diff:
Problem explained
The stdlib net-http was moved to be a “default gem” some time ago, and it depends on this other gem, net-protocol (https://rubygems.org/gems/net-http). Ruby ships with a default version of net-http, which therefore also comes with a default version of net-protocol. What I think happened is: when net-protocol released a new version on April first, bundler saw that a more recent version of net-protocol was available and downloaded it, or perhaps developers got a dependabot update. What happens after that is: the next deploy works, it installs this gem. But once the gem is installed, on the next bundle or ruby invocation its source code will be used instead of the default gem.
So the bug is in net-protocol? Not quite. I dug into the diff of the releases and it’s pretty small: https://github.com/ruby/net-protocol/compare/v0.1.2...v0.1.3. My best guess without further digging is that the bug is in an interplay between rubygems versions and bundler versions. There’s a support matrix gap: not every rubygems version works with every bundler version and vice-versa. Current theory: there’s some special case code in Bundler that kicks in when a default gem is replaced by a downloaded gem that exercises Gem::Source, and that code does not play well with the version of bundler that ships with 3.1.1.
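The shadowing behavior described above can be observed locally. A rough sketch (not from the thread) of inspecting which copy of net-protocol RubyGems would activate:

```ruby
# On a pristine Ruby 3.x install, net-protocol resolves to the default gem
# bundled with Ruby. After `gem install net-protocol` (or a newer pin in
# Gemfile.lock), the downloaded copy shadows it and default_gem? is false.
spec = Gem::Specification.find_by_name("net-protocol")

puts "version:     #{spec.version}"
puts "default gem: #{spec.default_gem?}"
```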
It’s important to note that even though we say “using bundler 2.2.x” in your Heroku output, due to the way bundler works, a vendored version of bundler that is higher than our version will be used. That’s one of the reasons why (I suspect) some of you report upgrading bundler (via my patch/branch) as working: either it has a bugfix for this problem, or the bundler version that ships with 3.1.1 had a bug in it.
Next steps
I’m working to roll out the new version of Bundler. I will post back here after I deploy the fix. However it might cause other instability for other customers (as commonly happens with changes in Bundler versions across the platform). I’ll monitor it and report back here if I end up having to roll it back.
Thanks for the info from all of y’all; turns out it was April Fools related, just not in the way we previously guessed.
The most curious part of this for me is the timing. I got a flurry of tickets about this and I’m looking into it. However, I can’t figure out why it was working before.
We’ve not changed anything on our end on Friday. We did merge some stuff 7 days ago but that doesn’t coincide with this starting to happen. https://github.com/heroku/heroku-buildpack-ruby/commits/main. Also @dentarg mentions that using the older version of the buildpack didn’t fix the issue.
I’m aware of this limitation. As a result, I don’t push changes in the evening or on Fridays (basically unless I can be in front of my computer for the next several hours). I would love to provide 24/7 support, but there’s only one of me (currently). We do have many tier 1 supporters, but only ~one buildpack owner per language which would be required for such an in-depth investigation.
Under normal circumstances, the ability to roll back your own deploys (heroku rollback) and to pin to an older release of the buildpack (https://devcenter.heroku.com/articles/bundler-version#using-an-older-version-of-bundler) are enough of an escape valve to buy a day or two.
In this case, either something external triggered this event that Heroku can’t control, or many people started doing something different at about the same time. I’m glad that y’all found one another and were able to come up with a workaround in the short term. Even though I’m paid to work on this OSS, I can’t do it all and rely on “many eyes” for help. Sorry you’ve had to wait.
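The two escape valves mentioned above look roughly like this; the app name is a placeholder, and the buildpack index assumes the three-buildpack order shown earlier in this thread:

```shell
# 1. Roll the app back to the previous release:
heroku rollback -a example-app

# 2. Pin the Ruby buildpack to an older tagged release instead of master
#    (index 3 because heroku/ruby is third in the buildpack list above):
heroku buildpacks:set --index 3 \
  https://github.com/heroku/heroku-buildpack-ruby#v236 -a example-app
```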
We (Heroku) don’t invoke the command this way; we use bundle install, which goes through the rubygems lookup process that picks the latest version that satisfies. With Ruby 3.1.1, the vendored Bundler version is later than the one we’ve put on the platform for you.
That’s an interesting idea. In the CNB re-write I was going the other direction, with the ability to use any arbitrary bundler version that you’ve got in your Gemfile.lock. That’s a tangent though. I’m going to keep debugging this and looking into the root cause. Also, since this seems to be bundler/rubygems related, one lever I’ve got is to release a newer bundler version. I wanted to do it before and had to roll back. I’m hoping those issues have been ironed out. I’ll go ahead and stage that on a branch and ask anyone who has this issue to try it out (and comment out your temp bin/bundle fix).
In the meantime, if you’ve got any other theories about what changed or why this suddenly started happening, I’m all ears.
Added the following to one of my project’s bin/bundle stub, as mentioned by @sebnjk here. This has at least allowed me to do multiple deploys without having to purge the build cache each time. Not sure what the implications are for doing this, though.
Got some race conditions in that comment submission. I just updated my comment with a LOT of detail, check it out. I’ll be heads down deploying my patch for a bit. I’ll let you all know when it’s done.
Huge thanks for sharing this! I was down a completely wrong rathole trying to fix this on a dokku deployment.
For anybody else looking for a stopgap with dokku, this fixed the issue blocking our dokku 0.27.0 deployments today (thanks @dentarg).
After running that we were able to push successfully.