rails: Slow performance when rendering collections of nested partials
Steps to reproduce
- To start, you need two partials (_a.html.erb and _b.html.erb)
- Partial _a.html.erb should contain a render call for partial _b.html.erb
- Render partial _a.html.erb in a collection
Expected behavior
Ideally, the time to render should be on par with the aggregate time to render both _a and _b if _b was not nested inside of _a.
Actual behavior
I see about a 10x difference in render time between a nested example and a flat example. I’ve done some preliminary digging with stackprof. Keep in mind that I am not well versed in the internals of ActionView rendering so my understanding of what is not working may not line up with reality. It seems like whatever optimization is done that affords such a significant speed difference between these two cases:
Fast
<%= render partial: "a", collection: 100.times.to_a %>
Slow
<% 100.times do %>
<%= render partial: "a" %>
<% end %>
…is not able to be applied to the partial that is nested inside of the outer partial. Based on my stackprof digging it looks like some of that optimization is happening in the conversion of the partial string ("a") into an actual path to a view file. In the fast case above, that conversion of "a" into "#{Rails.root}/app/views/application/_a.html.erb" happens one time, and in the slow case it happens 100 times. The same is true of the nested partials situation: the outer partial gets its view path calculated/looked-up one time, but the inner render call for the nested partial _b gets looked-up on each iteration.
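To make the lookup-count argument concrete, here is a toy model in plain Ruby. It is only a sketch of the behavior described above: ToyLookup, render_collection and render_each are hypothetical names, not ActionView classes or methods, and the "rendering" is just string building. It counts how many times a partial name is resolved to a path in the collection case, the loop case, and the nested case.

```ruby
# Toy model only: ToyLookup is hypothetical and is NOT part of ActionView.
# It counts how often a partial name is resolved to a template path.
class ToyLookup
  attr_reader :lookups

  def initialize(root)
    @root = root
    @lookups = 0
  end

  # Stand-in for the expensive step: turning "a" into a template path.
  def resolve(name)
    @lookups += 1
    File.join(@root, "app/views/application/_#{name}.html.erb")
  end
end

# Collection-style rendering: resolve once, then reuse the path per item.
def render_collection(lookup, name, items)
  path = lookup.resolve(name)
  items.map { |item| "#{path}:#{item}" }
end

# Loop-style rendering: every call resolves the name again.
def render_each(lookup, name, items)
  items.map { |item| "#{lookup.resolve(name)}:#{item}" }
end

items = 100.times.to_a

fast = ToyLookup.new("/app")
render_collection(fast, "a", items)

slow = ToyLookup.new("/app")
render_each(slow, "a", items)

# Nested case: the outer partial is resolved once for the collection,
# but the inner partial "b" is resolved again inside every iteration.
nested = ToyLookup.new("/app")
outer_path = nested.resolve("a")
items.each { |item| "#{outer_path}:#{nested.resolve('b')}:#{item}" }

puts "collection lookups: #{fast.lookups}"
puts "loop lookups:       #{slow.lookups}"
puts "nested lookups:     #{nested.lookups}"
```

In this model the nested case pays for one outer lookup plus one inner lookup per item, which matches the pattern I think I am seeing in the profile.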
I had some back and forth on Twitter with @tenderlove about this, and his intuition from the best description I could muster in 270 characters was that this was a bug and shouldn’t require a performance hit. But it is entirely possible I wasn’t explaining it well enough to him over the limited bandwidth communication channel of Twitter 😄
Tweet
https://twitter.com/tenderlove/status/1351223079805083648
Reproduction Repo
https://github.com/willcosgrove/partial-perf-repro
Server log images: flat case (fast), nested case (slow)
I should also clarify about the logs: The default view logging in development definitely adds some additional overhead to the slow case, because it logs a line of output for each iteration. But even with view logging turned off completely, there is still a significant performance difference between the two cases, so I don’t believe the difference falls entirely at the feet of ActionView logging instrumentation.
Flamegraphs: fast, slow
System configuration
Rails version: Seems to be any version, but I tested specifically on 6.1.2.1 and main.
Ruby version: Also seems to be any version but I tested on both Ruby 2.7 and Ruby 3.0.
CC: @tenderlove
About this issue
- State: open
- Created 3 years ago
- Reactions: 5
- Comments: 16 (14 by maintainers)
I added an example using View Component, just to see how it compares in general.
Here’s my PR with the additions: https://github.com/willcosgrove/partial-perf-repro/pull/2
Something to note: My PR increases the nesting, to where there are three layers of partials/components being used.
A page renders 10k tweets, each tweet rendering a button, each button rendering button text. (button text is a stretch of an example, as it’s literally only passing content into a partial/component and returning it, but I wanted to have another layer to emphasize speed differences.)
- Flat: 60ms | ~57k allocations
- Nested Partials: 3015ms | ~1.74M allocations
- Methods, using Tag Helpers: 1201ms | ~2.2M allocations
- View Components: 558ms | ~487k allocations
Honestly, I’m quite surprised at the performance of ViewComponent here, as they are effectively just using erb partials under the hood.
Again, this example is extreme to emphasize what’s going on (10k items, 3 layers of nesting), but I think it really highlights some inherent performance constraints of using partials (or even tag helpers).
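For anyone who wants to collect comparable "ms | allocations" figures in their own app, here is a minimal sketch using only the Ruby standard library. The measure helper is a hypothetical name and is not taken from the repro repo; the block below builds strings as a stand-in, and in a real app it would wrap whatever render call is being compared.

```ruby
require "benchmark"

# Hypothetical helper (not from the repro repo): runs the block once and
# reports wall-clock milliseconds plus the number of objects allocated.
def measure
  GC.start # start from a freshly collected heap to reduce timing noise
  before = GC.stat(:total_allocated_objects)
  seconds = Benchmark.realtime { yield }
  after = GC.stat(:total_allocated_objects)
  { ms: (seconds * 1000).round(1), allocations: after - before }
end

# Stand-in workload: 10k small strings, roughly "10k buttons".
result = measure { 10_000.times.map { |i| "<button>#{i}</button>" } }
puts "#{result[:ms]}ms | #{result[:allocations]} allocations"
```

A single run like this is noisy, so it is probably worth repeating the measurement a few times before comparing approaches.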
@rafaelfranca Thanks for chiming in!
I added a PR swapping slim for erb - the same issue persists:
https://github.com/willcosgrove/partial-perf-repro/pull/1
Based on @jhawthorn’s comments, I wanted to pare down the benchmark. Observations:
- render ViewComponent is about the same as render :collection
- … render :collection or inlining. Using ActionView::Base.logger = nil is a big speedup.
- … render :collection are still 1.1x slower than inlining. Given that I used simplistic examples, and I assume that it’s a fixed-cost difference regardless of the complexity within the partial… it’s maybe a small difference in the big context of things, without logging.
Here’s my reproduction repo: https://github.com/bensheldon/benchmark_render
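For reference, the logging switch mentioned above is a one-line setting. The snippet below is a config fragment rather than a benchmark, and where it lives (an initializer, or the top of a benchmark script) is an assumption on my part. The defined? guard is only there so the fragment does nothing instead of raising when run outside a process that has Action View loaded.

```ruby
# Silence Action View's per-render log lines so they don't skew timings.
# Placement is an assumption: an initializer or the top of a benchmark
# script both work, as long as Action View is available.
# The guard makes this a no-op when Action View isn't loaded.
ActionView::Base.logger = nil if defined?(ActionView::Base)
```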
@alipman88 That makes sense. Thanks for walking through that! It sounds like there’s probably nothing that can be done to improve this specific problem.
Our “fix” to this problem in our app has been to move away from partials towards helper methods that build markup through the use of the tag helper. We make extensive use of these little bits of markup to encapsulate our visual theme into reusable chunks. So where before we may have had something like this:
_fuzzy_option.html.slim
theme_helper.rb
We now put all of that markup directly in the helper method
theme_helper.rb
We would prefer to keep our markup out of our ruby files, but this seems to be the only way to sidestep the performance penalty of calling render.
I don’t suppose anyone knows if there is a way to bypass a majority of the overhead cost of render? Possibly by passing a full file path to it instead of using the "components/fuzzy_select/fuzzy_option" shorthand that requires it to traverse the view path to find the file?
And thank you to everyone who has spent time looking at this and thinking about it 🙏
I see that the app is using slim. Does this also happen if the app was only using ERBs?