rails: Slow performance when rendering collections of nested partials

Steps to reproduce

  • To start, you need two partials (_a.html.erb and _b.html.erb)
  • Partial _a.html.erb should contain a render call for partial _b.html.erb
  • Render partial _a.html.erb in a collection

Expected behavior

Ideally, the time to render should be on par with the aggregate time to render both _a and _b if _b was not nested inside of _a.

Actual behavior

I see about a 10x difference in render time between a nested example and a flat example. I’ve done some preliminary digging with stackprof. Keep in mind that I am not well versed in the internals of ActionView rendering so my understanding of what is not working may not line up with reality. It seems like whatever optimization is done that affords such a significant speed difference between these two cases:

Fast

<%= render partial: "a", collection: 100.times.to_a %>

Slow

<% 100.times do %>
  <%= render partial: "a" %>
<% end %>

…is not able to be applied to the partial that is nested inside of the outer partial. Based on my stackprof digging it looks like some of that optimization is happening in the conversion of the partial string ("a") into an actual path to a view file. In the fast case above, that conversion of "a" into "#{Rails.root}/app/views/application/_a.html.erb" happens one time, and in the slow case it happens 100 times. The same is true of the nested partials situation: the outer partial gets its view path calculated/looked-up one time, but the inner render call for the nested partial _b gets looked-up on each iteration.

I had some back and forth on Twitter with @tenderlove about this, and his intuition from the best description I could muster in 270 characters was that this was a bug and shouldn’t require a performance hit. But it is entirely possible I wasn’t explaining it well enough to him over the limited bandwidth communication channel of Twitter 😄

Tweet

https://twitter.com/tenderlove/status/1351223079805083648

Reproduction Repo

https://github.com/willcosgrove/partial-perf-repro

Server log images

Flat case (fast): Flat

Nested case (slow): slow_1 slow_2

I should also clarify about the logs: The default view logging in development definitely adds some additional overhead to the slow case, because it logs a line of output for each iteration. But even with view logging turned off completely, there is still a significant performance difference between the two cases, so I don’t believe the difference falls entirely at the feet of ActionView logging instrumentation.

Flamegraph:

Fast: fast_flame

Slow: slow_flame

System configuration

Rails version: Seems to be any version, but I tested specifically on 6.1.2.1 and main.

Ruby version: Also seems to be any version but I tested on both Ruby 2.7 and Ruby 3.0.

CC: @tenderlove

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Reactions: 5
  • Comments: 16 (14 by maintainers)

Commits related to this issue

Most upvoted comments

I added an example using View Component, just to see how it compares in general.

Here’s my PR with the additions: https://github.com/willcosgrove/partial-perf-repro/pull/2

Something to note: My PR increases the nesting, to where there are three layers of partials/components being used.

A page renders 10k tweets, each tweet rendering a button, each button rendering button text. (button text is a stretch of an example - as it’s literally only passing content into a partial/component and returning it, but I wanted to have another layer to emphasize speed differences).

Flat:

60ms | ~57k allocations

image


Nested Partials:

3015ms | ~1.74M allocations

image


Methods, using Tag Helpers:

1201ms | ~2.2M allocations

image


View Components:

558ms | ~487k allocations

image


Honestly, I’m quite surprised at the performance of ViewComponent here, as they are effectively just using erb partials under the hood.

Again, this example is extreme to emphasize what’s going on (10k items, 3 layers of nesting), but I think it really highlights some inherent performance constraints of using partials (or even tag helpers).

@rafaelfranca Thanks for chiming in!

I added a PR swapping slim for erb - the same issue persists:

https://github.com/willcosgrove/partial-perf-repro/pull/1

Based on @jhawthorn’s comments, I wanted to pair down the benchmark. Observations:

  • Looping over render ViewComponent is about the same as render :collection
  • Logging is the major performance slowdown for nested partials. Causes ~1.7-1.9x slower than ViewComponents, render :collection or inlining. Using ActionView::Base.logger = nil is a big speedup.
  • (Not-replicated) I didn’t see a speedup by using a full partial path.
  • (Subjective) Without logging, nested partials are 1.3x slower than inlining, but ViewComponents and render :collection are still 1.1x slower than inlining. Given that I used simplistic examples, and I assume that it’s a fixed-cost difference regardless of the complexity within the partial… it’s maybe a small difference in the big context of things, without logging.

Here’s my reproduction repo: https://github.com/bensheldon/benchmark_render

Benchmark Output (with logging)
❯   RAILS_VERSION=6.0.5 ./benchmark_render.rb \
  && RAILS_VERSION=7.0.3 ./benchmark_render.rb \
  && ./benchmark_render.rb

Rails version: 6.0.5
Warming up --------------------------------------
             /inline   313.000  i/100ms
        /nested_loop   172.000  i/100ms
   /nested_loop_path   174.000  i/100ms
  /nested_collection   272.000  i/100ms
            /vc_loop   298.000  i/100ms
Calculating -------------------------------------
             /inline      3.254k (± 1.3%) i/s -     16.276k in   5.003258s
        /nested_loop      1.737k (± 1.4%) i/s -      8.772k in   5.051732s
   /nested_loop_path      1.727k (± 1.2%) i/s -      8.700k in   5.039853s
  /nested_collection      2.754k (± 3.6%) i/s -     13.872k in   5.045295s
            /vc_loop      3.029k (± 3.4%) i/s -     15.198k in   5.023531s

Comparison:
             /inline:     3253.7 i/s
            /vc_loop:     3029.4 i/s - 1.07x  (± 0.00) slower
  /nested_collection:     2753.9 i/s - 1.18x  (± 0.00) slower
        /nested_loop:     1736.8 i/s - 1.87x  (± 0.00) slower
   /nested_loop_path:     1726.5 i/s - 1.88x  (± 0.00) slower

Rails version: 7.0.3
Warming up --------------------------------------
             /inline   308.000  i/100ms
        /nested_loop   173.000  i/100ms
   /nested_loop_path   174.000  i/100ms
  /nested_collection   268.000  i/100ms
            /vc_loop   281.000  i/100ms
Calculating -------------------------------------
             /inline      3.153k (± 2.5%) i/s -     16.016k in   5.083613s
        /nested_loop      1.732k (± 1.4%) i/s -      8.823k in   5.094727s
   /nested_loop_path      1.748k (± 2.5%) i/s -      8.874k in   5.081193s
  /nested_collection      2.686k (± 2.7%) i/s -     13.668k in   5.091552s
            /vc_loop      2.858k (± 1.7%) i/s -     14.331k in   5.016606s

Comparison:
             /inline:     3152.5 i/s
            /vc_loop:     2857.5 i/s - 1.10x  (± 0.00) slower
  /nested_collection:     2686.5 i/s - 1.17x  (± 0.00) slower
   /nested_loop_path:     1747.5 i/s - 1.80x  (± 0.00) slower
        /nested_loop:     1732.2 i/s - 1.82x  (± 0.00) slower

Rails version: 7.1.0.alpha
Warming up --------------------------------------
             /inline   307.000  i/100ms
        /nested_loop   181.000  i/100ms
   /nested_loop_path   180.000  i/100ms
  /nested_collection   258.000  i/100ms
            /vc_loop   280.000  i/100ms
Calculating -------------------------------------
             /inline      3.062k (± 3.4%) i/s -     15.350k in   5.020859s
        /nested_loop      1.784k (± 4.1%) i/s -      9.050k in   5.082636s
   /nested_loop_path      1.736k (± 2.0%) i/s -      8.820k in   5.083707s
  /nested_collection      2.647k (± 1.9%) i/s -     13.416k in   5.069750s
            /vc_loop      2.829k (± 2.5%) i/s -     14.280k in   5.051345s

Comparison:
             /inline:     3061.6 i/s
            /vc_loop:     2828.8 i/s - 1.08x  (± 0.00) slower
  /nested_collection:     2647.2 i/s - 1.16x  (± 0.00) slower
        /nested_loop:     1784.0 i/s - 1.72x  (± 0.00) slower
   /nested_loop_path:     1735.7 i/s - 1.76x  (± 0.00) slower
Benchmark Output (without logging)
❯ WITHOUT_LOGGER=1 RAILS_VERSION=6.0.5 ./benchmark_render.rb \
  && WITHOUT_LOGGER=1 RAILS_VERSION=7.0.3 ./benchmark_render.rb \
  && WITHOUT_LOGGER=1 ./benchmark_render.rb
Rails version: 6.0.5
Warming up --------------------------------------
             /inline   353.000  i/100ms
        /nested_loop   182.000  i/100ms
   /nested_loop_path   275.000  i/100ms
  /nested_collection   325.000  i/100ms
            /vc_loop   326.000  i/100ms
Calculating -------------------------------------
             /inline      3.518k (± 6.0%) i/s -     17.650k in   5.038094s
        /nested_loop      2.770k (± 1.8%) i/s -     14.014k in   5.060657s
   /nested_loop_path      2.715k (± 4.3%) i/s -     13.750k in   5.074845s
  /nested_collection      3.218k (± 4.3%) i/s -     16.250k in   5.061147s
            /vc_loop      3.310k (± 2.0%) i/s -     16.626k in   5.024399s

Comparison:
             /inline:     3517.7 i/s
            /vc_loop:     3310.4 i/s - same-ish: difference falls within error
  /nested_collection:     3217.8 i/s - same-ish: difference falls within error
        /nested_loop:     2770.1 i/s - 1.27x  (± 0.00) slower
   /nested_loop_path:     2715.4 i/s - 1.30x  (± 0.00) slower

Rails version: 7.0.3
Warming up --------------------------------------
             /inline   345.000  i/100ms
        /nested_loop   279.000  i/100ms
   /nested_loop_path   272.000  i/100ms
  /nested_collection   304.000  i/100ms
            /vc_loop   310.000  i/100ms
Calculating -------------------------------------
             /inline      3.452k (± 1.5%) i/s -     17.595k in   5.098560s
        /nested_loop      2.708k (± 3.5%) i/s -     13.671k in   5.054214s
   /nested_loop_path      2.652k (± 5.2%) i/s -     13.328k in   5.042542s
  /nested_collection      2.992k (± 3.9%) i/s -     15.200k in   5.089442s
            /vc_loop      3.082k (± 7.6%) i/s -     15.500k in   5.076108s

Comparison:
             /inline:     3451.8 i/s
            /vc_loop:     3082.2 i/s - 1.12x  (± 0.00) slower
  /nested_collection:     2991.5 i/s - 1.15x  (± 0.00) slower
        /nested_loop:     2708.5 i/s - 1.27x  (± 0.00) slower
   /nested_loop_path:     2651.9 i/s - 1.30x  (± 0.00) slower

Rails version: 7.1.0.alpha
Warming up --------------------------------------
             /inline   296.000  i/100ms
        /nested_loop   247.000  i/100ms
   /nested_loop_path   248.000  i/100ms
  /nested_collection   296.000  i/100ms
            /vc_loop   304.000  i/100ms
Calculating -------------------------------------
             /inline      3.281k (± 3.9%) i/s -     16.576k in   5.060233s
        /nested_loop      2.476k (± 5.1%) i/s -     12.350k in   5.005304s
   /nested_loop_path      2.500k (± 1.9%) i/s -     12.648k in   5.061520s
  /nested_collection      2.970k (± 2.1%) i/s -     15.096k in   5.085463s
            /vc_loop      3.051k (± 2.1%) i/s -     15.504k in   5.083644s

Comparison:
             /inline:     3281.4 i/s
            /vc_loop:     3051.1 i/s - 1.08x  (± 0.00) slower
  /nested_collection:     2969.8 i/s - 1.10x  (± 0.00) slower
   /nested_loop_path:     2499.8 i/s - 1.31x  (± 0.00) slower
        /nested_loop:     2475.6 i/s - 1.33x  (± 0.00) slower

@alipman88 That makes sense. Thanks for walking through that! It sounds like there’s probably nothing that can be done to improve this specific problem.

Our “fix” to this problem in our app has been to move away from partials towards helper methods that build markup through the use of the tag helper. We make extensive use of these little bits of markup to encapsulate our visual theme into reusable chunks. So where before we may have had something like this:

_fuzzy_option.html.slim

- html_class ||= ""

a.fuzzy-select__item(
  class=html_class
  data-action="touchdown->fuzzy-select#selectItem mousedown->fuzzy-select#selectItem"
  data-fuzzy-select-target="item"
  data-fuzzy-search-target="item"
  data-value=value
  role="button")

  - if block
    = capture(&block)

theme_helper.rb

def fuzzy_option(**opts, &block)
  render "components/fuzzy_select/fuzzy_option", **opts.merge(block: block)
end

We now put all of that markup directly in the helper method

theme_helper.rb

def fuzzy_option(**opts, &block)
  tag.a(class: "fuzzy-select__item #{opts[:html_class]}",
    data: {
      action: "touchdown->fuzzy-select#selectItem mousedown->fuzzy-select#selectItem",
      fuzzy_select_target: "item",
      fuzzy_search_target: "item",
      value: opts[:value],
    },
    role: "button",
    &block
  )
end

We would prefer to keep our markup out of our ruby files, but this seems to be the only way to sidestep the performance penalty of calling render.

I don’t suppose anyone knows if there is a way to bypass a majority of the overhead cost of render? Possibly by passing a full file path to it instead of using the "components/fuzzy_select/fuzzy_option" shorthand that requires it to traverse the view path to find the file?

And thank you to everyone who has spent time looking at this and thinking about it 🙏

I see that the app is using slim. Does this also happen if the app was only using ERBs?