packwerk: [Bug Report] Max RSS is Significantly Higher in Version 2.0

Description

Memory usage in packwerk v2.0 is significantly higher than it was in v1.4, so much higher that we will need to increase the resource class of our CI nodes in order to run packwerk v2.0. The max RSS for the whole process tree has increased from ~850 MB to ~2.5 GB for our use case.

I ran GNU time -v on bin/packwerk check in our application’s working directory using both versions of packwerk and saw the following results:

Version 1.4

```
Command being timed: "bin/packwerk check"
User time (seconds): 79.86
System time (seconds): 12.22
Percent of CPU this job got: 719%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.79
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 83560
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 399579
Voluntary context switches: 10258
Involuntary context switches: 179007
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 36
Page size (bytes): 4096
Exit status: 0
```

Version 2.0

```
Command being timed: "bin/packwerk check"
User time (seconds): 81.06
System time (seconds): 11.77
Percent of CPU this job got: 381%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:24.34
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 334968
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 5862
Minor (reclaiming a frame) page faults: 1233804
Voluntary context switches: 35237
Involuntary context switches: 156020
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 16
Socket messages received: 16
Signals delivered: 40
Page size (bytes): 4096
Exit status: 0
```

To Reproduce

Run packwerk while watching memory usage and compare between v1.4 and v2.0.

Expected Behaviour

I expected memory usage not to increase so dramatically between v1.4 and v2.0.

Screenshots

The GNU time output only captures the max RSS of the root process, so I grabbed a couple of screenshots from Activity Monitor to demonstrate that memory usage is much higher in each of the child processes as well.

Packwerk v1.4 Memory Usage

Packwerk v2.0 Memory Usage

Version Information

  • Packwerk: v2.0
  • Ruby v3.0.2

Additional Context

I haven’t looked into the changes yet, but I assume this is because packwerk now loads our Rails application in each of its processes? Even so, I wouldn’t expect each child process’s max RSS to balloon up so much.

Do you all have any idea why this might be? And second, do you think it would be possible and reasonable to bring this number back down?

Thanks!

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (15 by maintainers)

Most upvoted comments

Rails is very good with copy-on-write; that is how Puma and Unicorn work in production without each process holding the entire Rails application in resident memory. So if we boot the application in the master process, before we fork the other processes, the resident memory of each of those processes would cover only newly allocated or modified memory.
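The copy-on-write pattern described above can be sketched in plain Ruby. This is a hypothetical illustration, not Packwerk’s actual code: the expensive state (standing in for a booted Rails app) is allocated once in the parent, and forked children read it without duplicating it, so only child-private allocations cost additional resident memory.

```ruby
# Stand-in for the expensive boot (e.g. loading the Rails application).
# Allocated once, before any fork, so children share these pages.
BIG_DATA = Array.new(100_000) { |i| "record-#{i}" }

results = 4.times.map do |n|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    # The child reads BIG_DATA without copying it; only new allocations
    # (like the string written below) occupy child-private memory.
    writer.puts(BIG_DATA.length * (n + 1))
    writer.close
  end
  writer.close
  value = reader.read.to_i
  Process.wait(pid)
  value
end

p results # => [100000, 200000, 300000, 400000]
```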

I haven’t had a chance to look at the proposals yet, but I can say that we were able to work around the issue by reducing parallelism for now, though I would still like to see this improved.

I think booting Rails in a side process and sending the load paths back to the main process before the fork sounds like a good implementation to me, but I need to look into it more closely. From a high level, though, I would support pursuing a more optimal solution over thrashing through a short-term fix.

Hey all – Thanks for this report and for the great discussion around it. I’ve been looking into this issue and seeing what it would look like to implement @ngan’s suggestion of extracting what we need from Rails in a separate isolated process. The results are here: https://github.com/Shopify/packwerk/pull/166

I’m pairing with @ngan tomorrow to chat through this and get his feedback, but I’d love your feedback as well.

I expect that if you pull this version and use it in your CI environment, it should resolve the memory issue. I also expect that bin/packwerk check and bin/packwerk update-deprecations should have the same behavior as the version on main.

Here are the results before and after this change on our codebase.

While it does resolve the memory issue, it slows down bin/packwerk check on a single file quite a bit, presumably because we are no longer using Spring to load the bundle in the main process.

Memory Usage Results

Before

ps aux | grep packwerk

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      5683 23.5  3.1 943504 509392 ?       Ssl  08:56   0:02 packwerk
root      5687 67.7  3.1 812508 508412 ?       R    08:56   0:04 packwerk
root      5688 60.4  3.1 812468 508164 ?       R    08:56   0:04 packwerk
root      5689 64.1  3.1 812508 508204 ?       R    08:56   0:04 packwerk
root      5690 61.5  3.1 812468 507968 ?       R    08:56   0:04 packwerk

After

ps aux | grep packwerk

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      5511  8.6  0.7 381564 113580 pts/3   Sl+  08:53   0:07 ruby bin/packwerk check
root      5535 76.6  0.5 350820 95408 pts/3    R+   08:54   0:46 ruby bin/packwerk check
root      5536 75.8  0.5 343832 86344 pts/3    R+   08:54   0:46 ruby bin/packwerk check
root      5537 77.3  0.5 351060 96540 pts/3    R+   08:54   0:47 ruby bin/packwerk check
root      5538 77.0  0.5 345856 89456 pts/3    R+   08:54   0:47 ruby bin/packwerk check

Speed Results

Before

DevSpace(development) ./ $ spring stop
Spring stopped.
DevSpace(development) ./ $ time bin/packwerk check packs/gusto_slack/app/services/slack/client.rb
[TEST PROF INFO] Spring detected
Running via Spring preloader in process 6551
📦 Packwerk is inspecting 1 file
.
📦 Finished in 1.26 seconds

No offenses detected
No stale violations detected

real	0m14.566s
user	0m0.153s
sys	0m0.016s
DevSpace(development) ./ $ time bin/packwerk check packs/gusto_slack/app/services/slack/client.rb
Running via Spring preloader in process 6566
📦 Packwerk is inspecting 1 file
.
📦 Finished in 1.25 seconds

No offenses detected
No stale violations detected

real	0m2.603s
user	0m0.146s
sys	0m0.012s

After

DevSpace(development) ./ $ spring stop
Spring stopped.
DevSpace(ae-use-local-packwerk) ./ $ time bin/packwerk check packs/gusto_slack/app/services/slack/client.rb
fatal: not a git repository (or any of the parent directories): .git
📦 Packwerk is inspecting 1 file
.
📦 Finished in 1.71 seconds

No offenses detected
No stale violations detected

real	0m29.934s
user	0m8.133s
sys	0m2.030s
DevSpace(ae-use-local-packwerk) ./ $ time bin/packwerk check packs/gusto_slack/app/services/slack/client.rb
fatal: not a git repository (or any of the parent directories): .git
📦 Packwerk is inspecting 1 file
.
📦 Finished in 1.58 seconds

No offenses detected
No stale violations detected

real	0m13.368s
user	0m8.576s
sys	0m1.975s

Thanks for making that distinction clearer, @exterm.

I think that makes sense. To reiterate what we chatted about over video call, I think that moving to a faster parser should give us a lot of wiggle room to move to a thread-based approach to address this memory issue more simply.

Thanks for the comments @exterm and @jordanstephens.

For now, based on the feedback I’ve received which all makes sense, I think I’ll close the proposal PR in favor of a better longer term solution.

Once everyone is back from the holidays, I’d love to (A) align on which tradeoffs feel like the right ones (complexity/maintainability, memory, performance, etc.) and (B) align on an implementation for delivering on those tradeoffs.

@ngan, why do you think we need to eager load the app?

Hm, that’s a good question… when I ran it with threads without eager loading, I got some errors. One of them, IIRC, was a “SyntaxError: dynamic constant assignment” in one of the files. There were many other errors as well. But looking through packwerk, I can’t see anything that actually references the app’s constants in a way that would cause it to autoload.

@mclark autoload paths are derived during framework boot, and a lot of other things can mutate the paths on boot as well, including gems and the app itself. It may not seem like it, but it’s actually very predictable for a Rails app to have its paths ready to go; it’s a public API. I really like booting up the app to find these things out, as it allows apps to do more custom things with their autoload paths. And as long as Packwerk reads them via the public API, we should be good.

I want to solve this problem as well because, right now, it’s forking 10x with 10x the memory usage.

Hmm this is an interesting problem. Great analysis @jordanstephens and @ngan!

I like the automated determination of load paths for Packwerk, but I don’t like that we need to load the entire Rails app just to get them. It’s like having a dependency on an abstract library: very unpredictable. I don’t know much about Rails internals, but I wonder if there is anything we can do to pull the load path logic into a (Ruby) file with minimal dependencies that Packwerk could run. This is obviously a longer-term solution, but it feels like it could be a good architectural change for Rails as well. The same might work for inflections.

In the near term, I wonder if it is worth introducing an interface inside Packwerk to provide the load paths and inflections, and have the default implementation read from the Rails app as we do now. An alternative implementation might read from the config file as we did before. A future implementation might take advantage of future interfaces in Rails that allow us to get this information without loading the entire app.

Sorry for the “Why not both?” proposal 😆
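To make the interface idea above concrete, here is a hypothetical sketch. The names (`ConfigFileProvider`, `RailsAppProvider`, `inspect_with`) are illustrative and not Packwerk’s API; the Rails-backed provider is stubbed rather than actually booting an app.

```ruby
# A provider that reads load paths from a static config hash,
# as a packwerk.yml-style file once did.
class ConfigFileProvider
  def initialize(config)
    @config = config
  end

  def load_paths
    @config.fetch("load_paths", [])
  end
end

# A provider that would boot (or query) the Rails app to derive
# load paths; stubbed here for illustration.
class RailsAppProvider
  def load_paths
    ["app/models", "app/services", "lib"]
  end
end

# The rest of the tool depends only on the provider interface.
def inspect_with(provider)
  provider.load_paths.sort
end

p inspect_with(ConfigFileProvider.new("load_paths" => ["lib", "app/models"]))
# => ["app/models", "lib"]
```

Either implementation can be swapped in without touching the code that consumes the paths, which is the point of the “Why not both?” proposal.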

I’m guessing this makes sense and is sort of expected.

https://github.com/Shopify/packwerk/blob/4da4215cb9d15aeb8020a6324edc014c6094c50a/lib/packwerk/parse_run.rb#L74

Parallel will default to forking once per CPU. And since we’re in the context of the Rails application, it multiplies the memory footprint dramatically (depending on the number of CPUs you have).

Off the top of my head, I can see two ways of resolving this issue: (1) switch to threads, or (2) load the Rails environment only to extract the information we need. I’ll outline the choices below.

Switch to threads

Switching to threads is a little tricky because we’ll have to deal with thread-safety issues. It’s not too bad, since we’re not actually serving requests or doing anything complicated (other than loading the app); the only threading issue we’ll need to deal with is autoloading. We’re at the mercy of the application as to what it does in the test environment (whether or not it eager loads). A solution would be to eager load the app (Rails.application.eager_load!) right before we make the Parallel.flat_map call. I tried this and it works. However, the performance is not as good as forking. For some data: our monolith (37,946 files scanned) does the check in 37.57 seconds using forks and 222.46 seconds using threads (6x slower!). It solves the memory issue, and we might be able to investigate the performance difference separately.
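A minimal, stdlib-only sketch of the thread-based variant described above (the real code uses the parallel gem’s thread mode; the eager-load step and the per-file check are simulated here). The key ordering is that all loading happens before any worker thread starts, so no thread triggers autoloading:

```ruby
require "etc"

files = (1..100).map { |i| "file_#{i}.rb" }

# Simulated Rails.application.eager_load! — do all constant loading
# up front so no worker thread races the autoloader.
eager_loaded = true

queue = Queue.new
files.each { |f| queue << f }
results = Queue.new

workers = Etc.nprocessors.times.map do
  Thread.new do
    loop do
      file = begin
        queue.pop(true) # non-blocking pop; raises when the queue is empty
      rescue ThreadError
        break
      end
      results << "#{file}: ok" # stand-in for parsing/checking one file
    end
  end
end
workers.each(&:join)

puts results.size # => 100
```

Because the threads share one address space, there is only one copy of the loaded application in memory, which is why this variant sidesteps the forking memory multiplication.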

Load rails and extract what we need

We could boot the Rails app and dump (or however we want to communicate between the Rails app and the check process) what we need, e.g. load paths, inflections, etc. This allows us to stay with forks, still utilize Spring to boot the application, and keep the packwerk scanner processes’ memory low.
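A hedged sketch of that side-process idea: the parent spawns a child Ruby process that, in a real version, would boot the Rails app and print the needed data as JSON for the parent to read. The child script below is a stand-in, not the actual change; the keys and values are invented for illustration.

```ruby
require "json"
require "open3"
require "rbconfig"

# In the real version this script would `require "config/environment"`
# and read the booted app's load paths and inflections; here the values
# are hard-coded stand-ins.
child_script = <<~RUBY
  require "json"
  info = {
    "load_paths"  => ["app/models", "lib"],
    "inflections" => { "api" => "API" }
  }
  puts JSON.dump(info)
RUBY

# Run the child with the same Ruby interpreter and parse its one-line
# JSON dump; the heavy process exits, leaving the parent lightweight.
stdout, status = Open3.capture2(RbConfig.ruby, "-e", child_script)
info = JSON.parse(stdout)

p info["load_paths"] # => ["app/models", "lib"]
p status.success?    # => true
```

The parent never loads the framework itself, so subsequent forks multiply only the small scanner process, not a full Rails application.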