rails: Rails development mode memory leak and slowdown, also on new apps

TL;DR; My decently sized Rails app (173 models, 208 controllers) is suffering from a continuously increasing slowdown in development mode that makes it hard to work on it. Each reload of the code takes longer than the previous one, with delays increasing to minutes after just a few reloads. I have identified a probable cause and it’s also present on new apps (rails new leakyapp, no added gems). I haven’t found a solution yet.

The issue: path and url helpers are packaged into a module and included in controllers and other classes. When you reload! an app, the helper methods in this module are removed and then re-added. While doing this, ruby needs to remove the methods from each class that includes the module. However each reload of the code leaves the old class instances (e.g. the controller from the previous request, etc) in memory and thus ruby needs to clean route helpers from an ever increasing amount of classes. This gets slow after a while.

Background

Hello,

For the past years I’ve been coping with an issue that Rails feels extremely slow when developing. I start the server, wait for it to boot up and open the site. It takes a minute or two before it loads.

Then I have to make a change to a controller, model or other .rb file and hit reload in the browser. It takes some seconds and the changed page opens just fine. However after each following change, the reload takes longer and longer. After a few rounds, it’s faster to restart the server than to wait for Rails to reload the code. I’m often waiting minutes before anything shows up again.

At first I thought the issue has something to do with activeadmin, a properly slow gem in its own right. Since it always takes forever to reload if I make any change in the admin area, I thought the increasing slowness was just part of the package and I tolerated it for longer than I should have. Lately though it got so unbearable that I needed to find a solution.

So I started profiling. Opening on page in admin from a cold boot, I get this beautiful firegraph:

leak1

(using rack-mini-profiler with ?pp=flamegraph&flamegraph_sample_rate=20).

I guess this is a relatively normal startup. I need to see if I can somehow cache sprockets and check why is Thor still there, but other than that the app loads nicely.

Let’s hit reload:

leak2

The graph follows the same shape as the previous one, just the part with sprockets is much smaller.

So far so good. Now let’s make a change in a .rb file and then hit reload:

leak3

It takes a while for the page to load. Well over 30 seconds. That’s because ActiveAdmin needs to re-load all 100 or so pages just to draw a menu and generate all the paths. Oh well. However that’s not the issue here.

The issue is that this part starts to get longer:

leak4

… and longer:

leak5

… and even longer:

leak6

… and after about 6 reloads (I skipped a few images), it’s half the size of the ActiveAdmin reload time…

leak7

After 10+ reloads I will be waiting minutes before anything opens. That’s enough time to check HN and forget what you were doing.

So who is this culprit? It’s this guy: ActionDispatch::Routing::RouteSet::NamedRouteCollection#clear!

The full stacktrace is here.

        def clear!
          @path_helpers.each do |helper|
            @path_helpers_module.send :remove_method, helper
          end

          @url_helpers.each do |helper|
            @url_helpers_module.send  :remove_method, helper
          end

          @routes.clear
          @path_helpers.clear
          @url_helpers.clear
        end

Huh. It takes an increasingly longer time to clear path helpers?

As far as I know, path and url helpers are modules, which are automatically included in all controllers and views. To get *_path methods into other classes (e.g. models), one should add include Rails.application.routes.url_helpers to the class.

So it’s taking longer and longer to remove the *_path and *_url methods from @path_helpers_module.

Does this mean that this module is included in more and more classes, which don’t get garbage collected? Let’s find out.

Throwing caution to the wind (don’t do this at home), I opened and edited the route_set.rb file in the gem’s folder and added some debug code.

First just to measure what gets slower (it’s the @*_helpers.each blocks, not the .clear calls)

!! @path_helpers.remove_method(@url_helpers.each) in 0.1322 sec
!! @url_helpers_module.remove_method(@url_helpers.each) in 0.1383 sec
!! @routes.clear in 0.0 sec
!! @path_helpers.clear in 0.0 sec
!! @url_helpers.clear in 0.0 sec

# next times:
!! @path_helpers.remove_method(@url_helpers.each) in 0.2533 sec
!! @path_helpers.remove_method(@url_helpers.each) in 0.3978 sec
... etc

Then to measure if there are just more @path_helpers and @url_helpers. Nope, their number remains constant:

!! @path_helpers.count == 822
!! @path_helpers.remove_method(@url_helpers.each) in 0.1371 sec

!! @path_helpers.count == 822
!! @path_helpers.remove_method(@url_helpers.each) in 0.2666 sec

So it must be that the module is included in an ever increasing number of classes:

ObjectSpace.each_object(Object).select{|o| o.class.ancestors.include?(@url_helpers_module) }.count

=> 275 # first time
=> 474 # second...
=> 602

Bingo!

It appears that on each reload, Rails reloads new versions all the classes, yet doesn’t dispose of the old versions.

Who are all these classes?

 > ObjectSpace.each_object(Object).select{|o| o.class.ancestors.include?(@url_helpers_module) }.map { |o| o.class.to_s }.sort.join(' ')
=> "#<Class:0x00007fda669591b8> ... cut for brevity ... #<Class:0x00007fda85692668> #<Class:0x00007fda85711f08> #<Class:0x00007fda857b5d88> Admin::RequestsController Admin::RequestsController"

Every reload adds a bunch of classes that look like this: #<#<Class:0x00007fda668c8d48>:0x00007fda668c3690 @_routes=#<ActionDispatch::Routing::RouteSet:0x00007fda8015ad08>> and one controller of the page I’m accessing (Admin::RequestsController).

Digging around, it seems that those classes are instances of this proxy class that’s created when you call Rails.application.routes.url_helpers.

So it seems something still holds a reference to the controllers from previous requests… and all those proxy url helper modules that are included in so many places.

I installed the ruby-mass gem in the hopes of figuring it out. Unfortunately I didn’t get far. I couldn’t find if there was any gem that for whichever reason kept references to the old classes.

So I decided to see if I could replicate the issue with a smaller app. I didn’t have to look far.

Just a simple rails new bla, rails g model something, rake db:migrate, rails g controller something was enough to replicate this.

Hacking route_set.rb and adding these lines:

 count = ObjectSpace.each_object(Object).select{|o| o.class.ancestors.include?(@url_helpers_module) }.count
 classes = ObjectSpace.each_object(Object).select{|o| o.class.ancestors.include?(@url_helpers_module) }.map { |o| o.class.to_s }.sort.select { |i| !i.include?('#') }
 puts count
 p classes

gives the following ouptut, after making a small change in something_controller.rb and hitting reload:

# no reload
1
[]
!! @url_helpers.count == 12
!! @url_helpers_module.remove_method(@url_helpers.each) in 0.0001 sec

# after the first reload
5
["SomethingController"]
!! @url_helpers.count == 12
!! @url_helpers_module.remove_method(@url_helpers.each) in 0.0 sec

# second reload
9
["SomethingController", "SomethingController"]
!! @url_helpers.count == 12
!! @url_helpers_module.remove_method(@url_helpers.each) in 0.0001 sec

# third reload
13
["SomethingController", "SomethingController", "SomethingController"]
!! @url_helpers.count == 12
!! @url_helpers_module.remove_method(@url_helpers.each) in 0.0001 sec

# fourth reload
17
["SomethingController", "SomethingController", "SomethingController", "SomethingController"]
!! @url_helpers.count == 12
!! @url_helpers_module.remove_method(@url_helpers.each) in 0.0001 sec

The number of classes, from which the path helpers need to be cleared from, keeps increasing by 4 per reload. In an app as small as this, you don’t notice it. However if it keeps increasing by an extra 200 classes each reload, it becomes a problem.

Let’s select all the SomethingControllers and see who has references to them:

cs = ObjectSpace.each_object(Object).select{ |o| o.class.ancestors.include?(@url_helpers_module) }.select { |o| !o.class.to_s.include?('#') }
cs.map(&:to_s)
=> ["#<SomethingController:0x00007f93d15ad630>", "#<SomethingController:0x00007f93d15d65d0>", "#<SomethingController:0x00007f93d160e520>"]

ids = cs.map { |c| Mass.references(c) }.map{ |k| k.keys }.flatten.select { |k| k !~ /Pry/ }.map{ |k| k.split('#').last.to_i }
=> ["ActionDispatch::Request#70136424506820",
 "ActionDispatch::Request#70136424506980",
 "ActionDispatch::Request#70136424869360",
 "ActionDispatch::Request#70136424572740",
 "ActionDispatch::Request#70136424572900",
 "ActionDispatch::Request#70136430985940",
 "ActionDispatch::Request#70136424704460",
 "ActionDispatch::Request#70136424704620",
 "ActionDispatch::Request#70136424801680"]

The SomethingControllers are only referenced by some ActionDispatch::Requests that are hanging around.

Looking at who is referencing those requests:

ids.map { |id| [id,Mass.references(Mass[id])] }.to_h
=> {70136424506820=>{"SomethingController#70136424655640"=>["@_request"], "ActionDispatch::Response#70136424655900"=>["@request"], "ActionDispatch::Http::Headers#70136424674980"=>["@req"]},
 70136424506980=>{"ActionDispatch::Request::Session#70136424506960"=>["@req"], "ActionDispatch::Cookies::CookieJar#70136424672740"=>["@request"]},
 70136424869360=>{"ActionDispatch::RemoteIp::GetIp#70136424869340"=>["@req"]},
 
 70136424572740=>{"ActionDispatch::Http::Headers#70136424736180"=>["@req"], "SomethingController#70136424739560"=>["@_request"], "ActionDispatch::Response#70136424739820"=>["@request"]},
 70136424572900=>{"ActionDispatch::Request::Session#70136424572880"=>["@req"], "ActionDispatch::Cookies::CookieJar#70136424758900"=>["@request"]},
 70136430985940=>{"ActionDispatch::RemoteIp::GetIp#70136430985920"=>["@req"]},
 
 70136424704460=>{"ActionDispatch::Http::Headers#70136424850800"=>["@req"], "SomethingController#70136424854160"=>["@_request"], "ActionDispatch::Response#70136424854420"=>["@request"]},
 70136424704620=>{"ActionDispatch::Request::Session#70136424704600"=>["@req"], "ActionDispatch::Cookies::CookieJar#70136424873540"=>["@request"]},
 70136424801680=>{"ActionDispatch::RemoteIp::GetIp#70136424801660"=>["@req"]}}

I investigated all of them in depth (via Mass.references) and couldn’t find any top node that held a reference to any object from all the 3 different requests. The objects from each request however seem to have slightly cyclical relationships (e.g. a request holds a session, a session holds an env, the env holds the session and the request… or something like that). I’m not sure if this changes anything - they could and should be removed in bulk after a reload.

Starting the GC manually does nothing.

And now I’m stuck.

Does anyone have an idea what is happening, if it’s perhaps a feature not a bug, and how can I fix this so that my reload times go back to around 30 seconds? 😄

Thanks for reading! 🍻

Steps to reproduce

  1. Create a new Rails app. Add a few models and controllers.
  2. Add this to any controller:
def index
  bla = 2
  render plain: ObjectSpace.each_object(Object).select{|o| o.class.ancestors.to_s.include?('ApplicationController') }.map(&:to_s).inspect
end
  1. rails server and open the endpoint from step 2
  2. Make any change in any “.rb” code and reload. E.g. increment the bla counter.
  3. See the number of loaded instances of the controller go up and up.
  4. Wonder what can be done about it?

Expected behavior

Rails should just keep the latest instance of the controller class in memory, not all of them.

Actual behavior

Every reload of the code adds a new controller instance in memory and cleaning url helper methods from all these old controllers takes longer after each reload.

System configuration

Rails version: 5.2.0

Ruby version: 2.5.0

Disclaimer

When I say “Rails does this” or “Rails does that” above, I’m using “Rails” as a shorthand for Object. The issue could easily come from any included gem, e.g. puma or I might be doing something wrong myself.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 78
  • Comments: 24 (8 by maintainers)

Commits related to this issue

Most upvoted comments

I’ve been investigating a similar issue in one large application I am working on. I tried to look into it a couple of times before, but never had enough time. This ticket has been an incredible source of ideas to further my research and I took last weekend to go deeper into it.

The issue was quite severe in my project. It can be easily replicated by calling reload! in the console. A single call would begin at approximately 6 seconds the first time and take more than a minute after 10 reloads. After some benchmarks, most of the slowness was caused by creating/clearing routes during reloading, even though the number of routes/helpers remained stable. The memory footprint would grow linearly with each reload.

I’ve used a very simple brute force approach inspired by http://www.be9.io/2015/09/21/memory-leak/, that is:

  • setup ObjectSpace with tracing: require 'objspace'; ObjectSpace.trace_object_allocations_start
  • generate (sorted) dumps interleaved with reloads: 3.times {|n| reload!; GC.start; sleep 1; f="dump#{n}"; File.open(f, 'w'){|io| ObjectSpace.dump_all(output: io)}; %x(sort #{f} > #{f}s)}
  • get what has potentially leaked using the command line: comm -13 dump0s dump1s > dump01 && comm -12 dump01 dump2s > leak

From there, I would look into the resulting file for anything out of the ordinary, especially duplicated class names. I wasted a lot of time at first, but I ended up getting better at spotting the culprits. Once I had an interesting memory address, I used @tenderlove’s script (slightly modified) from a previous comment to look at the memory tree to see what was happening. Here’s some zoomed in screenshots of what I found:

leak1

The one above show a leaked class from translation models created by Globalize.

leak2

That one is an example of a leak coming from models using HABTM association.

There was also another leak originating from ActiveAdmin controller classes. The commonality between all these leaks was various anonymous classes referenced by the ActiveSupport::DescendantsTracker module.

After looking into the module code and searching for known issues, I’ve found there was an issue about this problem #31395 and a pull request #31442. I then proceeded to extract the fixed DescendantsTracker module and monkey patch it in development and voilà, no more leak!

I’ll see if I can help with #31442 in the coming weeks.

Hi everyone! As the author of the issue I just want to report that ever since we upgraded to Rails 6 and started using the Zeitwerk loader, these reload slowdowns have vanished. So the issue is solved for me! Others in my team also report similar results.

Thus I’m closing the issue! 🎉🎉🎉

(Note: I didn’t test Rails 6 without the Zeitwerk loader, so I can’t say if that configuration would also fix the issue or not… Probably though because AFAIK the dependents tracker patch that’s linked above made it into the core of Rails and not Zeitwerk explicitly…)

We run into that issue too:

First reload: screen shot 2018-09-14 at 16 03 54 After 10 or so code changes and reloads: screen shot 2018-09-14 at 16 07 58

Is there someone who has tried the one-liner above and is not seeing this buildup of classes?

  • We tried and yes, there are a buildup of classes similar to @mariusandra

I tried and could go up to 8 instances of my controller in memory before breaking down from the slowness of it all. No GC happened.

  • In our case, we have 20+ instances of the same controller in memory and after that, they’ve got GC’ed.

@kaspth thanks for confirming you’re also seeing this!

It’s been a month and there are no other replies. So I would like to ask whoever is still reading this: do you have the same problem?

It’s easy to test in two ways:

  1. When you make changes in a .rb file and your app reloads in development mode, does the memory usage keep going up forever?
  2. If you put …
  puts "!! " + ObjectSpace.each_object(Object).select{|o| o.class.ancestors.to_s.include?('ApplicationController') }.map(&:to_s).inspect

… into any controller, add a “!” and reload, does the list of things printed get longer and longer?

It would be great to know if this a) is a non-issue, b) is a standard issue that affects everyone, is usually not noticeable, but is made much much worse by unique my selection of gems and models… or c) my PC is just haunted 👻 😃

Thanks!

Thanks so much, this is exceptionally well researched and explained! We’re seeing this in our 4.2 app as well. I’ve never had the time to dig into out, but I’m really glad you did 😊

@pixeltrix would you happen to have any ideas about how to untie the kept references after the routes module unload?

@cmthakur You can use the patched DescendantsTracker by @ebeigarts here: https://github.com/rails/rails/blob/ea8f8ed7ac91bc397a6ac6cf2faad36e4a9a0616/activesupport/lib/active_support/descendants_tracker.rb

There’s some actions in the issue #31442 it might get merged soon.

I need to stop working on this today, but here is what I’ve found so far.

  1. You don’t need to modify the controller to see the “leaked” objects. Just making a request is enough.
  2. Some of these objects are clearly being held by the vm stack, others look like they are somehow being held by a reference from the worker thread.

The test app I linked to earlier can dump the heap if you pass the right parameter. I’ve just been curling a few times, then dumping the heap, then looking at the references. The following program will output a dot file that only contains the part of the heap that can reach a particular object:

require 'json'
require 'set'
require 'cgi'

class Slot
  attr_reader :info, :address_to_slot

  def initialize info, address_to_slot
    @info            = info
    @address_to_slot = address_to_slot
    @can_reach       = {}
  end

  def references
    @references ||= (info[__method__.to_s] || []).uniq.map { |a|
      address_to_slot[a]
    }.compact
  end

  def address; info[__method__.to_s]; end
  def root; info[__method__.to_s]; end
  def root?; info["type"] == "ROOT"; end

  def reachable? addr, seen = Set.new
    return if seen.include?(address)

    seen << address

    @can_reach.fetch(addr) do
      reachable = address == addr || references.any? { |ref|
        ref.reachable?(addr, seen)
      }
      @can_reach[addr] = reachable
    end
  end

  def marked?; @can_reach.values.any?; end
  def id;      root? ? root : address; end

  def label
    attributes = %w{ type file line }.map { |key| info[key] }.compact.map { |k|
      CGI.escape_html(k.to_s)
    }.join " | "
    extras = ''
    if @can_reach[address]
      extras = "style=bold fontsize=30 fontcolor=red color=red"
    end
    "#{id.inspect} [label=\"{ #{id} | #{attributes} }\"#{extras}];"
  end
end

address_to_slot = {}
roots           = []
all_nodes       = []

File.open(ARGV.shift, 'r') do |f|
  f.each_line do |line|
    info = JSON.parse line
    slot = Slot.new info, address_to_slot
    all_nodes << slot
    if slot.root?
      roots << slot
    else
      address_to_slot[slot.address] = slot
    end
  end
end

ARGV.each do |needle|
  needle = needle.gsub(/x[0]+/, 'x')

  roots.each do |root|
    root.reachable? needle
  end
end

puts "digraph g {"
puts "node [shape=record];"

all_nodes.each do |node|
  next unless node.marked?
  puts node.label
  node.references.each do |ref|
    next unless ref.marked?
    puts "#{node.id.inspect} -> #{ref.id.inspect}"
  end
end

puts "}"

I’ve been using this program to find what references the “leaked” objects. I just give it the heap JSON file along with the address of one of the leaked objects returned from the controller. Some of the references are pretty straight forward. The red node is the “leaked” object. This one is clearly just being held by the stack:

Stack Held

Some of these are not. This one is more complicated:

out2

The thing that’s confusing me about the second one is that there is a thread that’s holding a direct reference to the “env” hash and it has a reference to the object:

out2 dot 2018-09-20 18-12-07

I am having a hard time figuring out how it has this direct reference. It might still be a stack for that thread, but I would have expected that to be part of the machine context roots. I will look more tomorrow.

Also, the heap dump code uses the object’s mark function to determine references. The mark function for threads can be found here.

Also, each node in the graph shows the address, type, allocation file, and allocation line for that object. Unfortunately I don’t know why Threads are keeping a reference to that hash. 😞