Fossilize: [Dota2] fossilize eats all RAM and makes the system unresponsive as a result.

Specs:

OS: Gentoo Linux
CPU: i7-9700K
GPU: AMD 5700xt (mesa drivers)
RAM: 16GB

In the new Steam beta, after every update to Dota2 (it doesn’t seem to matter how small the update is), fossilize rebuilds the Vulkan shaders when launching Dota2.

While rebuilding, it eats all of my RAM (16GB) and makes my computer swap, leaving it unresponsive for around 1-2 minutes while the shader cache is being built.

Perhaps there could be some mechanism in place to make sure fossilize doesn’t go overboard with memory usage if it will go over the available amount of memory?

I don’t know if this is the correct place to report this. Sorry if it is not.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 17
  • Comments: 171 (46 by maintainers)

Most upvoted comments

Someone reported on Discord that the issue is resolved now, so closing.

Although my problem is with a different game, I can reproduce the issue at will. Last week friends and I were playing Deep Rock Galactic, and yesterday I had the issue without having updated either the kernel or the NVIDIA drivers in between. Whenever I pre-generate the Vulkan shaders, memory fills up very fast. When I opt in to background generation (Steam -> Settings -> Shader Pre-Caching -> Allow background processing of Vulkan shaders), Steam runs through the games and generates the shaders, displaying the percentage done and the game name in the Settings dialog under the “Allow background” option. It processes a lot of games before coming to Deep Rock Galactic. I had htop running to diagnose this, and as soon as it hits Deep Rock Galactic, physical memory (8GB free) fills up instantly; over the next 15-20 seconds the swap fills up to the maximum and the computer becomes unresponsive. I’ve tried waiting for 20 minutes, but the situation did not change.

I find it interesting that this happens for one specific game; maybe that helps in diagnosing the issue. I’m ready to help however I can.

Edit: After updating to kernel 5.9 and NVIDIA drivers 455.38, the situation persists.
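For anyone who wants to record the same htop observation as numbers, here is a minimal sketch that logs available memory and free swap once per second. The function names mem_snapshot and watch_memory are made up for this example; the only assumption is the standard MemAvailable and SwapFree fields of /proc/meminfo.

```shell
#!/bin/sh
# Print MemAvailable and SwapFree (in MiB) from a meminfo-style file.
# The argument defaults to /proc/meminfo; it is a parameter only so the
# parser can also be run on a saved copy.
mem_snapshot() {
    awk '
        $1 == "MemAvailable:" { avail = int($2 / 1024) }
        $1 == "SwapFree:"     { swap  = int($2 / 1024) }
        END { printf "avail_mib=%d swapfree_mib=%d\n", avail, swap }
    ' "${1:-/proc/meminfo}"
}

# Log one timestamped snapshot per second until interrupted with Ctrl-C.
watch_memory() {
    while :; do
        printf '%s %s\n' "$(date +%T)" "$(mem_snapshot)"
        sleep 1
    done
}
```

Running watch_memory in a terminal (or redirecting it to a file) before starting the shader processing should show how quickly memory and swap drain, even if the desktop later becomes too sluggish to read htop.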

@alexeysvrv In @jakogut’s screenshot the problem is easy to see: the system has 32 cores, and the current implementation duplicates the database metadata on every core. An update will follow soon that stops the memory duplication by using shared memory between all cores. The thread count itself should not matter too much then, because the update will also use optimized scheduling. The amount of write-back can still be a demanding issue with that many cores.
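As a back-of-the-envelope illustration of why per-core duplication hurts on a 32-core machine, consider the sketch below. The 512 MiB metadata size is a made-up figure, not a measurement from this thread, and estimate_metadata_mib is a hypothetical helper.

```shell
#!/bin/sh
# Estimate the memory taken by database metadata, in MiB.
# $1 = metadata size per worker (MiB)
# $2 = number of workers
# $3 = "duplicated" (one private copy per worker) or "shared" (one copy total)
estimate_metadata_mib() {
    case "$3" in
        duplicated) echo $(( $1 * $2 )) ;;
        shared)     echo "$1" ;;
        *)          echo "mode must be 'duplicated' or 'shared'" >&2; return 1 ;;
    esac
}

# Illustrative numbers only:
#   estimate_metadata_mib 512 32 duplicated   prints 16384
#   estimate_metadata_mib 512 32 shared       prints 512
```

With duplication the cost scales linearly with the core count, which is why machines with many cores are hit hardest; with a single shared copy it stays constant.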

The architecture has thus already been changed and optimized. The remaining write-back issues, caused by shader caching happening inside the drivers, cannot be properly fixed in fossilize; they need work from the driver makers. But let’s first see how it works once the updated code base has landed in Steam.

Thanks for the reports. The case where any fossilize_replay processes are running while a game is running is not intended. Is anyone able to reliably reproduce such a scenario?

@fmartins-andre It only takes a long time the first time; subsequent runs are fast, and a rebuild is only needed after some updates.

@simvux That’s not a bug; it’s normal and expected for it to use all threads so it can finish faster.

@MajorGonzo read the 2 above ^

If it only used up my physical RAM, I wouldn’t mind, but it begins filling up my swap (I’ve seen it fill all 10GB). So CPU, RAM, and swap are maxed out, and probably SSD I/O as well, which is what causes major slowness on Linux.

I don’t mind how long it takes; I just want some resources left over for web browsing while the game is updating/fossilizing.

OS: Manjaro

[Screenshots: 2020-08-12___20-22-22-edit, 2020-08-12___20-23-39-edit]
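One possible stopgap for keeping the desktop usable is to push the replay processes to the lowest CPU and IO priority by hand. This is only a sketch: deprioritize_pids and deprioritize are hypothetical helper names, and the process name fossilize_replay is the one mentioned in this thread. Lowering priority does not reduce memory use, so it will not prevent swapping.

```shell
#!/bin/sh
# Give the listed PIDs the lowest CPU priority and, where available,
# the idle IO class.
deprioritize_pids() {
    for pid in "$@"; do
        renice -n 19 -p "$pid" >/dev/null 2>&1 || true
        # ionice is part of util-linux; how much the idle class helps
        # depends on the IO scheduler in use.
        if command -v ionice >/dev/null 2>&1; then
            ionice -c 3 -p "$pid" >/dev/null 2>&1 || true
        fi
    done
    return 0
}

# Apply the above to every process whose command line matches a pattern.
deprioritize() {
    pattern="${1:-fossilize_replay}"
    # -f matches the full command line; the bare process name is limited
    # to 15 characters, one short of "fossilize_replay".
    # shellcheck disable=SC2046
    deprioritize_pids $(pgrep -f "$pattern")
}
```

Calling deprioritize while the shaders are being processed should at least let interactive programs win the competition for CPU time.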

A quick trace shows that NVIDIA does not issue sync syscalls, and fdinfo shows no sync flag either. But the driver does a lot of teeny-tiny reads and writes with random seeks all over the file. It should probably disable readahead on the file descriptors for toc and bin. It also calls fstat() constantly. I’m not sure whether that is a performance problem; it probably does it to find the end of the file for appending data, and the man page does not mention any performance issues. So the overhead may just come from random seeks, small IO, and useless readahead.

The tiny random reads and writes are problematic for CoW files (as used by btrfs), so Steam should create the shader cache folder with chattr +C. I’ll convert my fossilize cache folder once it has finished doing its job and see if it helps. If anyone wants to try: you cannot chattr +C existing non-empty files, so the folder has to be recreated and the files copied into it.

Here’s how (useful only on btrfs):

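# STOP STEAM FIRST
cd ~/.local/share/Steam/SteamApps
mv shadercache shadercache.bak
# New files inherit No_COW from the directory they are created in
mkdir shadercache
chattr +C shadercache
# Copy (not move) so every file is created anew inside the +C directory
rsync -av shadercache.bak/. shadercache/.
rm -Rf shadercache.bak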
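To verify afterwards that the attribute actually stuck, something along these lines should work. It is a sketch only: has_nocow_flag and check_nocow are hypothetical helper names, and the default path simply mirrors the one used above.

```shell
#!/bin/sh
# Succeed if an lsattr attribute string contains the C (No_COW) flag.
# The match is case-sensitive: lowercase c means "compressed" instead.
has_nocow_flag() {
    case "$1" in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Report whether a directory carries No_COW (only meaningful on btrfs).
check_nocow() {
    dir="${1:-$HOME/.local/share/Steam/SteamApps/shadercache}"
    # lsattr -d prints "<flags> <name>" for the directory itself.
    flags=$(lsattr -d "$dir" 2>/dev/null | awk '{print $1}')
    if has_nocow_flag "$flags"; then
        echo "No_COW is set on $dir"
    else
        echo "No_COW is NOT set on $dir"
    fi
}
```

Files created inside the directory after the chattr call should show the same C flag when checked individually with lsattr.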

These observations suggest that the drivers are indeed involved here. I think we are seeing problems at different layers:

  1. Fossilize isn’t exactly memory-, cache-, CPU-scheduler-, or IO-friendly - a fix is already merged.
  2. Steam triggers fossilize too often. To me it looks like it triggers fully walking all shader caches at least every second boot, with no real changes to the system: ValveSoftware/steam-for-linux#7306
  3. Drivers recompile and rewrite their complete cache on every fossilize run, as can be observed from the huge amount of data written. This dominates the cache, forces apps into swap, and adds high desktop latency. I’m seeing loadavg spike to over 130 sometimes. I was under the impression that shader pre-caching should only compile what’s missing. I don’t think that is fossilize’s job, though; the driver decides. This is not a CPU usage problem per se; redoing everything on every run is the problem.
  4. Drivers (at least NVIDIA) may allocate multiple gigabytes of memory during the process.
  5. Drivers create very inefficient write patterns (at least NVIDIA). See also number 3.

Each of these problems on its own is probably not very noticeable; it is the combination that causes trouble, and each party has to fix their part.
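The amount of data written (number 3) can be observed roughly through the per-process IO counters in /proc. The following is a sketch under assumptions: write_bytes_of and total_written_mib are hypothetical helper names, the process name fossilize_replay is the one used in this thread, and the counters are only readable for your own processes and only while they are still running.

```shell
#!/bin/sh
# Print the write_bytes counter from a /proc/<pid>/io style file.
# The exact match on the first field skips cancelled_write_bytes.
write_bytes_of() {
    awk '$1 == "write_bytes:" { print $2 }' "$1"
}

# Sum write_bytes, in MiB, over all processes matching a pattern.
total_written_mib() {
    pattern="${1:-fossilize_replay}"
    total=0
    for pid in $(pgrep -f "$pattern"); do
        bytes=$(write_bytes_of "/proc/$pid/io" 2>/dev/null) || bytes=0
        total=$(( total + ${bytes:-0} ))
    done
    echo $(( total / 1048576 ))
}
```

Sampling total_written_mib near the end of two consecutive runs without any game update in between would hint at whether the cache is being rewritten in full each time.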

Some people report problems since NVIDIA 455. I don’t think there is a bug in the driver which is causing number 1 or 2. The problems have been there, they are only made visible now. Other people report there’s no difference before and after 455 which suggests number 3 has also been there before. AMD users report similar problems, so number 4 and 5 may apply to other vendors, too.

@jakogut Oh I see you’re using zswap… You may want to disable that. At least for me it tends to fill the swap with trash and probably ends up with useless compressed data in RAM, in the end thrashing the page cache. Or at least use a much smaller memory pool: the default of 20% is just too high for modern systems. I’d go with something like 5%. Also, oomkiller should not be needed; modern kernels are quite good and should not touch interactive processes. You can verify that by adding the oomscore column to htop (oh, I saw you did that). Using uksmd may be more beneficial if you’re running a lot of processes with potentially duplicated memory.
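To put the 20% versus 5% pool size into numbers, here is a small sketch. zswap_status and zswap_pool_mib are hypothetical helper names; the sysfs directory /sys/module/zswap/parameters with its enabled and max_pool_percent files is where the kernel exposes these settings, assuming zswap is built into your kernel.

```shell
#!/bin/sh
# Print the zswap settings found in a parameters directory.
# Defaults to the sysfs location; the argument exists so the function
# can also be pointed at a copy of those files.
zswap_status() {
    dir="${1:-/sys/module/zswap/parameters}"
    for name in enabled max_pool_percent; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
    return 0
}

# Upper bound of the compressed pool in MiB.
# $1 = total RAM in MiB, $2 = max_pool_percent
zswap_pool_mib() {
    echo $(( $1 * $2 / 100 ))
}

# With 16 GiB of RAM:
#   zswap_pool_mib 16384 20   prints 3276
#   zswap_pool_mib 16384 5    prints 819
```

The value can be changed at runtime by writing to the same file as root, for example with `echo 5 | sudo tee /sys/module/zswap/parameters/max_pool_percent`; that change does not survive a reboot.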

@jakogut Latest commits have a memory optimization which should mostly eliminate it. If you know which game has shaders that cause 64 GB of memory usage, you might want to compile the current version, copy the shader database from Steam to a new location, and run fossilize-replay on this test set with --progress on cmdline (to force it into the same mode that Steam uses). I still need to test this but the changes are already merged so I think they’ll work.

It’ll still cause a lot of write-back pressure in the page cache, which may trigger the kernel to swap data out. But that’s a whole different kind of beast that cannot be properly solved currently without dirty hacks. If your IO subsystem is fast enough, it should be no problem. Once the updates have landed in the Steam distribution, you’ll probably see fossilize-replay child processes show up as “exe” instead of “fossilize-replay” in top/htop (this may depend on settings) - then you’ll know that the updates have landed.

The new updates will also change scheduling behavior and readahead behavior of the process which should further improve things.

Slightly tangentially: in my case, opening Steam itself doesn’t trigger the resource-devouring Fossilize processes. It only happens when I launch Borderlands 3, and then only roughly 60% of the time; in about half of those cases it takes over my system such that it’s nigh impossible to escape/cancel/skip/shut down.

In the attached screenshot, you can see that Fossilize processes haven’t entirely taken over my system, this time, but close. I can also confirm that hitting the skip button forces the entire procedure to run in the background, thus rendering the game unplayable for … well sometimes it seems permanent until an entire system reboot.

Screenshot from 2020-08-31 21-38-46

Disable Vulkan in Dota2.

cd <steamdir>/steamapps/shadercache
find . -type f -print0 | xargs -0 rm

https://www.reddit.com/r/DotA2/comments/c199kv/dota_2_crashes_on_launch_on_linux_with_shader/erc9r8t?utm_source=share&utm_medium=web2x&context=3

helps me.

I’m having the same problem with Path of Exile. I started skipping the shader processing before the game launched because it was taking 2-3 hours to finish (I wanted to play), and I think after that the game started to crash at least once per game session. Now I realize that may have been due to lack of RAM: when I looked at it today, my 16GB were full, plus half of my swap. Needless to say, my computer froze. I don’t remember that happening before I started to skip the shader processing, but I don’t have the time to test it today. My computer specs:

Arch Linux
Kernel version: 5.7.12-arch1-1
CPU: i5-7500
RAM: 16GB
GPU: Nvidia GTX 1050 Ti

It seems most people in this thread are using mesa, but the problem seems to be the same with Nvidia OpenGL. The version I’m currently running:

OpenGL core profile version string: 4.6.0 NVIDIA 450.57

Fossilize processes don’t terminate even after I close Steam and Path of Exile, unless I kill them manually. If PoE was crashing due to lack of memory, well… that’s rough.

@Samega7Cattac it’s not a bug, just a really bad overlooked flaw. It should most definitely not render the computer unusable just to get a 5% faster compilation time from using 12 instead of 11 threads. I tend to want to use my computer during this processing step.

I’m having the same issue but for seemingly completely different reasons. RAM-wise I’m fine, but the processing step maxes out all 12 of my threads, not leaving one for other tasks.

This feels like a terrible idea for something that can kick in in the background automatically after game updates…

OS: Void Linux
Kernel: 5.7.13_1
CPU: Ryzen 5 2600X
RAM: 16GB

Maybe this should be a separate issue?