dxvk: Games crash on Nvidia due to memory allocation failures
For reasons that are not yet clear, DXVK's device memory allocation strategy does not work reliably on Nvidia GPUs. This leads to game crashes with the characteristic `DxvkMemoryAllocator: Memory allocation failed` error in the log files.
This issue has been reported in the following games: #1099 (Bloodstained: Ritual of the Night) #1087 (World of Warcraft)
If you run into this problem, please do not open a new issue. Instead, post a comment here, including the full DXVK logs, your hardware and driver information, and information about the game you’re having problems with.
Update: Please check https://github.com/doitsujin/dxvk/issues/1100#issuecomment-509484527 for further information on how to get useful debugging info. Update 2: Please also see https://github.com/doitsujin/dxvk/issues/1100#issuecomment-515083534. Update 3: Please update to driver version 440.59.
About this issue
- State: closed
- Created 5 years ago
- Comments: 244 (96 by maintainers)
To gather some additional logging information, could people try adding the following kernel module option to nvidia.ko:
NVreg_ResmanDebugLevel=0
You can add this option with modprobe via the command-line at module-load time, or by creating a modprobe configuration file. Here’s a sample command-line for loading the nvidia.ko module with this option:
modprobe nvidia NVreg_ResmanDebugLevel=0
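For a persistent setting, the same option can go into a modprobe configuration file instead; a minimal sketch (the file name under /etc/modprobe.d is arbitrary):

# /etc/modprobe.d/nvidia-debug.conf
# Applies the option every time nvidia.ko is loaded.
options nvidia NVreg_ResmanDebugLevel=0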
You can verify that this option is set by running the following command:
grep ResmanDebugLevel /proc/driver/nvidia/params
Note: The kernel module must be unloaded before running modprobe via the command line in order for this option to be set. If you run modprobe when the module is already loaded, it will return an exit code of 0 without presenting any warning message, even though no change has taken place.
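A minimal reload sequence, assuming nothing (such as a running X server) is still holding the module:

# Unload first (this fails while the GPU is in use, e.g. by an X server),
# then load the module with the debug option set.
sudo modprobe -r nvidia
sudo modprobe nvidia NVreg_ResmanDebugLevel=0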
This will help us track information at the system memory page allocation level, and will be extremely verbose. If you enable this option you’ll want to be mindful of your physical storage device usage, and disable this option after you’ve gotten a reproduction.
This will log to dmesg, so in addition to the normal d3d11 and dxgi logs, please send us an nvidia-bug-report.log.gz file, which can be generated using the nvidia-bug-report.sh script (normally placed in /usr/bin). If you're unable to attach the bug report log to this GitHub thread, please send an email to linux-bugs [at] nvidia.com and put "DXVK Memory Crash" in the Subject field.

I pushed some more memory allocation tweaks. Most importantly, DXVK will now try to allocate smaller chunks if a large allocation fails. This will not solve the underlying issue, but might help in some cases.
Additionally, 32-bit games (i.e. all D3D9 games) will use a smaller chunk size for host-visible memory types, which should hopefully help a bit with games running out of 32-bit address space, but that’s a different issue.
This build includes D9VK as well with the patches applied, so it might be worth testing there too: dxvk-memory.tar.gz
NVIDIA Vulkan Beta Driver 435.19.03 fixed alt-tabbing in GTA V. It was crashing/hanging because I was running out of VRAM.
@alligatorshoes I remember hearing that D9VK increases memory usage on Windows compared to the native implementation (e.g. 1GB higher for A Hat In Time). You could try filing an issue with D9VK about memory usage. You would probably want to do comparisons with native Windows before filing it. I completely understand how the need to test on Windows before filing that issue could be a problem.
That said, it would be best to report only problems with 64-bit games here. This issue focuses on the nvidia driver reporting out of memory when there is sufficient address space and system memory. 32-bit games getting out of memory almost always involve running out of address space, which is a different issue entirely.
For the people affected, please try it. That would be of great help!
As the context provided in the changelog entry implies, this fixes a different issue with different symptoms. That was #1169.
For the issue here, there has been a patch floating around that we were waiting for feedback on, and we didn’t really get a lot of testing data from end users. It has now been added to our trunk, and will show up in the next release in our Vulkan beta sidebranch, as well as in an unspecified future official release.
Getting the same crash on my work computer, with a Radeon HD 8570 / R7 240/340 OEM.
@ryao OK, no problem. Everything you’ve said makes sense. Thanks to you and @SveSop for the assistance, and my apologies for cluttering up the GitHub issue! For the record, the game doesn’t crash at all using WineD3D but naturally performance is worse 😄 so I’ll post on D9VK and see if there’s any luck. Have a great weekend.
I did some experimentation with Unigine Superposition, and a rather… uhm… insane setting.
Allocation > 10GB VRAM (according to the DXVK HUD)
Now, doing this ended up in some nice flashbacks to coming home from vacation in the '70s… i.e. a nasty slideshow.
Other than that, I was only able to cause a DXVK memory allocation failure when creating a ramdisk with files filling RAM beyond swap while Superposition was loading, and tbh using a graphics setting that is 2GB > VRAM + filling swap to the brim is NOT what is happening when I regularly play. I also started Chrome at the same time this horrible slideshow happened, and that caused something…
PID 1115: /usr/bin/nvidia-persistenced --user nvidia-persistenced
(The PID usually points to this, but disabling persistenced just made it point to something else when it was triggered, so I dunno if it is just some sort of "pick whatever nVidia process it finds with the lowest PID" kind of thing?)

It did not matter if I used 0, 1 or 2 for `/proc/sys/vm/overcommit_memory`, other than not being able to create any large files in the ramdisk when using "2" (to NOT allow any kind of overcommit). Superposition still loaded with > 10GB VRAM allocated (as I have 10-ish GB of free sysmem). Probably not really helpful, but somewhat "proof" that it is as @kakra says above: probably not related to the kernel `overcommit_memory` setting, at least not directly.
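For reference, the overcommit mode being toggled in the experiment above can be inspected and switched at runtime; a minimal sketch (0 = heuristic, 1 = always allow, 2 = never overcommit, per the kernel's vm documentation):

# Show the current overcommit mode.
cat /proc/sys/vm/overcommit_memory

# Switch modes at runtime (reverts on reboot); 0 is the kernel default.
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
# Equivalent via sysctl:
sudo sysctl vm.overcommit_memory=1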
The heuristic of `vm.overcommit_memory=0` (the default) is simple: it will reject only obvious over-allocations. Since allocation requests from the graphics driver / DXVK are usually small (or kind of smallish, in any case far from obvious over-allocations), this makes no difference. The problem only happens later, because memory is not really allocated upon the allocation request but only lazily, when data is written to that memory. If, on a write request, memory cannot be allocated, it is already too late to deny the allocation: the process will be OOM-killed. That clearly does not happen here. So changing this setting is not the root problem and doesn't solve it.

It also makes no sense that you would see fewer problems with `vm.overcommit_memory=1`, because then memory availability is never checked but simply granted. The graphics stack would eventually fill it fast and suffer an OOM kill. But this doesn't happen here either. I think NVIDIA already found a memory allocation problem in its Vulkan driver and a fix is in the works: it can fail some allocations when it shouldn't.
Also, I’m not sure how the driver internals work but the driver probably needs some invariable mappings between sysmem and VRAM so data can be transferred via PCIe bus direct memory access. This memory also probably has to be contiguous. And I think here’s one problem: With huge memory pages enabled, it’s exponentially harder for the kernel to find such regions without defragmenting memory. And since the NVIDIA driver doesn’t support page table mappings and page-fault handling, the kernel also is not able to move (or swap) memory around allocated by the graphics driver: There’s just no interface that could notify the driver or kernel of such events. In contrast, the AMD driver does support this, and thus it can do real overcommits.
But `vm.overcommit_memory` is different from overcommitting VRAM. Those are two different pairs of shoes (although it's basically the same idea).

That's at least why disabling huge pages and closing browser windows helped a lot for me: Chrome is a VRAM memory hog in Xorg (1.5G+ VRAM allocated most of the time). I now have 24G of RAM installed (and also upgraded to a 6G VRAM graphics card), and it has become much harder to force this problem to show up, even with the browser windows still open and huge pages enabled (but in a less intrusive mode than the default). I'm still experimenting with this.
So I suggest trying to lower your memory footprint: stop services, stop programs, drop caches (so there's a higher chance of having a lot of free contiguous memory) and then try again. I can only conclude that changing the VM overcommit settings changes how your system arranges memory; maybe it swaps a little more or less? The setting itself should have absolutely no influence on NVIDIA memory allocation behavior.
If the OOM killer kicks in, it will say so in `dmesg`.

You could also try `Alt+SysRq+F` (you may need to enable it first, as it is disabled in some distributions) prior to starting the game: it will trigger the OOM killer and free memory, maybe even kill a process in that effort. Then see if the game still fails. Alternatively: `echo f > /proc/sysrq-trigger`.
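On distributions where the magic SysRq key is restricted, it can be enabled at runtime; a minimal sketch (1 enables all functions; see the kernel's sysrq documentation for the finer-grained bitmask values):

# Check the current setting (0 = disabled, 1 = all functions enabled).
cat /proc/sys/kernel/sysrq

# Enable all SysRq functions until reboot.
echo 1 | sudo tee /proc/sys/kernel/sysrq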
There was no context, unless you're privy to something else. Nowhere was Squad even mentioned, let alone the bug.
There are several reports of memory allocations failing. #1169 may be triggered by some other specific code path within the driver, but the end result is the same as this one: memory is not allocated. The crashes are likely a result of where the allocation happened.
When the errors are being reported by the hardware, it's pretty far outside anything end users would understand. What else is NVIDIA expecting? It's literally impossible to diagnose a binary blob. Maybe it's a specific BIOS vendor or some combination of hardware.
Maybe related? https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-Generic-Allocator-2019
I have never experienced a kernel OOM kill in these cases. I have never "run out of GPU mem" either, as games like WoW rarely use more than 2-3GB out of my 8GB, which has never actually been filled.
So IMO there is no actual case of low memory causing this. There has been talk about fragmentation, and that is possibly a valid culprit: you could have plenty of free memory, just not in a contiguous block, and that triggers a memory allocation problem.
One stupid question I have been trying to ask elsewhere: the `KHR_dedicated_allocation` extension used by DXVK seems to be constantly allocating and freeing chunks. Could this be causing problems when gaming for a prolonged time? Can it cause fragmentation of sorts?

@SveSop Switching "sync modes" should really not affect how the driver handles memory, or even other stuff. It just changes how the wine source waits on events. By default, it does that quite inefficiently, with higher latency. "esync" has improved on that. And "fsync" is a step further, moving the waiting on groups of events right into the kernel. This probably reduces context switches by a big factor. Context switches are normal (and needed) but can be quite bad for CPU performance (because switching context between threads flushes various caches and registers inside the CPU).
I’m also not sure why recreating the “.nv” cache has any other side-effects than reduced performance for you. I’ve never seen that problem here.
Both your observations may be a net effect of other problems you’re experiencing. One of those you might have found: Enabling “Above 4G decoding” allows the kernel to use your GPU without allocating bounce buffers (buffers that are bouncing their mapping between below 4G 32-bit address boundary and what the driver expects). With below-4G mapping, it is not possible for your GPU to present its VRAM to the CPU as a whole. It has to constantly swap mappings. Under certain conditions, the kernel may have a hard time finding address space below 4G to map to the CPU. This could well explain why you’re still seeing memory related errors. And extensively bouncing buffers could have yet undetected side-effects in the driver when multiple threads require different mappings at the same time, also constantly changing. This may explain why switching the sync algorithm affects you: Without fsync or esync, it’s much less likely that stuff happens at the same time. But this probably needs fixing at multiple layers, not only the driver.
32-bit games aren't really affected by above-4G mappings: they only see a 32-bit address space anyway, counting from 0, no matter where the 64-bit OS mapped their address space into memory. But the kernel would have a hard time constantly finding address space below 4G if the BIOS doesn't allow above 4G. A 32-bit OS will always allocate from below 4G, so it shouldn't be affected by the setting. This is probably just a setting to fix bugs with drivers or hardware that claim to properly support 64-bit addressing but in reality don't. Only in such cases can and should you use below-4G mappings. BIOS settings tend to default to the most compatible and conservative values, even if that means reduced performance. You should change them if you know your hardware can do better (this excludes "overclocking" as a recommendation).
This BIOS setting may come in different flavors, be it “64-bit OS support”, “high memory DMA”, “4G IO limit”… If other users are still affected they may want to check their BIOS settings. @SveSop good find. 😃
I had a similar experience to yours. I renamed the cache folder to force a rebuild of the shader cache, and I was able to play for a night without any crashes (though I had to suffer through the rebuild like you). Eventually the problem came back, though. I'm not an expert in these things, but since my issues seemed memory-related, perhaps having to rebuild shaders prevented the memory-fallback driver bug from emerging as quickly as if everything were built and ready to load into VRAM.
I've been stalking this thread for some time, as it seemed the only place close to providing an answer to the crashes. I've been having constant video freezes after 5-30 minutes of play with a GTX 760 2GB on Kubuntu using the 430 drivers in Path of Exile and Overwatch. XID errors 69 and 31.
I saw the new beta driver 435.24.02 came out, touting fixes to memory allocation crashes (specifically using system memory as a fallback for full vram) leading to XID 31 and installed it. So far no crashes since.
@alligatorshoes That is likely out of scope for this issue. I have had similar problems with Company of Heroes. It supports both Direct3D 9 and Direct3D 10, but uses a 32-bit binary. The only thing that I can say is to try using WineD3D. It seems to have somewhat lower address space usage than DXVK and D9VK.
Other than that, higher memory usage than native Windows on 32-bit causing out-of-memory failures can be considered a wine bug, stemming from the memory overhead of emulating the NT kernel in userspace. I have heard that Codeweavers might have a solution for this in the future, due to the work they are doing to support 32-bit on future versions of Mac OS X. Perhaps things will be better in the future.
@SveSop Thanks so much for the information! I’ll go ahead and do some testing and report back.
@alligatorshoes I have not done this on Arch, but I would assume DKMS works the same. You need to find your driver source (for Ubuntu this is `/usr/src/nvidia-435.19.03`), and you can just edit the `/usr/src/nvidia-435.19.03/nvidia/nv-vm.c` file directly (remember to use sudo), or use the patch. Arch might have its sources in a different folder than /usr/src/nvidia-xxx though. Kernel headers, gcc and various compile tools are of course needed if you have not installed them.
Once that is done, you rebuild your kernel module like this:
sudo dkms remove nvidia/435.19.03 -k $(uname -r)
(This removes your running kernel module.)
sudo update-initramfs -u
(Dunno if this is actually needed… I just tend to do it.)
sudo dkms autoinstall -k $(uname -r)
(This installs the kernel module from your modified source.)
sudo update-initramfs -u
(This updates your initramfs with the new kernel module.)
Reboot, and you should be using the fix. Good luck.
@pchome
Well, after the system was in use for a while and the caches were filled in a "natural" way, I did ten Unigine Superposition launches in a row. I didn't test the whole benchmark, only the fact that it launches.
The results (for 2GB VRAM, 8GB RAM, custom Superposition profile: 1080p windowed, high shaders/textures):
P.S. patched 435.19.03
Go on…
…oh.
Anyway:
This is #1169. Exactly the same Xid & details I get when Squad croaks on me.
This also matches what I see. The PID is either `SquadGame.exe` or `Xorg`, but the freeze only happens when playing Squad. Looks like the nvidia driver just pins the fault on whatever process happened to win the bork lottery that time by touching a VRAM address range or something (I'm a moron regarding GPU stuff, please interpret my speculation as a form of poetry).

Or just `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`.

Regarding transparent huge pages, I'm running without issues since I've changed some settings:
One trick seems to be lowering `max_ptes_none` (see https://www.kernel.org/doc/Documentation/vm/transhuge.txt). Essentially, this tells THP how many extra small pages it may allocate from memory to fill a huge page for combining. I.e., a value of 128 allows the kernel to allocate an additional 512k of memory to make 1.5M of allocated memory into a full 2M huge page. So it allows wasting up to 25% of memory in my case. The default value seems to be much higher, thus allowing huge amounts of memory to be wasted when THP kicks in. The system will take it from swap instead; in the case of the video driver, that's not possible.

I recommend completely turning the THP feature off if you're using less than 16G of memory and running games. For other applications it may still be worth having it on, especially if most memory is going to be allocated by one or two single services.

If you still want to try, I recommend using `max_ptes_none` with 128 (with high system memory) or 64 (with low system memory, e.g. below 16G), maybe even smaller values. However, the less memory you allow the kernel to allocate in addition to the existing allocation, the lower the chance of gaining performance from THP. I think the default is somewhat near the full huge page size (maybe even as near as 511 pages). That default is much too high for applications that tend to allocate less than 2M of memory a lot.

If you're seeing bad swap performance, you may also want to adjust `max_ptes_swap` (defaults to 64 pages = 256kB): this value allows the kernel to swap back in that many pages to create a huge page.

Also, I recommend going with the `defer+madvise` parameter: this makes THP mostly useless for non-THP-aware applications (unless free memory is not fragmented), but the THP defrag will only kick in for applications that explicitly ask for THP, and only in deferred mode to reduce allocation latency/stalls.

The `within_size` parameter is interesting for shared memory allocations. There's a similar parameter for tmpfs that you may want to apply especially on systemd systems (since those mount `/tmp` with tmpfs). This parameter makes SHM only use THP if at least 2M are allocated. Pulseaudio users may benefit from that for some games.
So, does this happen with other Vulkan applications or with DXVK only?
One thing to note is that 32-bit games are not particularly interesting for this issue since those are more likely to just run out of address space. This especially applies to D9VK since 64-bit D3D9 games don’t really exist.
They might still run out of memory, but there’s a good chance that this has nothing to do with the driver issue being discussed here.
It is not unclear if you actually look at the code in `memory.cpp`: if your heap is too small to fit at least 16 chunks, it will fall back to a smaller chunk size instead. This is to provide at least 16 chunks (because there are different types of memory allocated from chunks, but each chunk will only be used for one type of memory allocation). That's why I said previously that bigger chunks may reduce the problem: it's likely that all types of memory will be allocated very early during game initialization, and later on there's a higher chance that no additional chunk is needed for a specific small allocation type. But it will also increase the chance of failing to allocate a chunk later if one is needed, because the system may struggle to find contiguous memory for it (external vs. internal fragmentation).

If your heap is too small (below 2048 MB), it will instead allocate smaller chunks (`heapSize` divided by 16). @doitsujin I wonder: are there any alignment constraints? Because let's say we have some uncommon heap size of 1024+512=1536M and divide that by 16: it would allocate 96M chunks. This is not a power of 2. Does this matter?
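To make the arithmetic concrete, here is a hypothetical shell sketch of the fallback described above; it paraphrases the logic, it is not DXVK's actual code:

# Hypothetical paraphrase: heaps below 2048 MB use heapSize/16 as the
# chunk size instead of the fixed default.
heap_mb=1536
chunk_mb=128
if [ "$heap_mb" -lt 2048 ]; then
    chunk_mb=$(( heap_mb / 16 ))   # 1536 / 16 = 96 MB, not a power of two
fi
echo "chunk size: ${chunk_mb} MB"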
3b1376b2feba0bed66fd3581766bf8c357a33ecc increases the chunk size to 128 MB (from 64), please test if this changes anything. Here's a build: dxvk-master.tar.gz
Can't say I'm surprised. I used to use automatic huge pages for tmpfs (with huge=within_size) and it frequently led to my PC fully freezing (randomly) when doing things like building software on tmpfs (it took me a while to realize it was the problem). That made me lose faith in the thing, and I disabled huge pages completely. The idea behind it isn't bad, though, but I'd rather stay away for a while (it could be fixed; I know huge pages are actively being worked on). Transparent huge pages are however enabled by default on a lot of distributions, so I'd assume they "usually" work fine, but wine and games perhaps lead to more unusual use cases.
This doesn’t sound like it’s related to this issue though.
I had the same problem with Fallout 4 + Proton 4.2-7 + GTX 960. The PC froze every 30-40 min; however, the problem got fixed after disabling transparent huge pages. Try putting transparent_hugepage=never into the Linux kernel options (grub.cfg).
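One way to make that kernel parameter persistent on GRUB-based distributions; a sketch (keep your existing options when editing):

# In /etc/default/grub, append the parameter to the existing line, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash transparent_hugepage=never"
# Then regenerate the config:
sudo update-grub
# or, on distributions without the update-grub wrapper:
sudo grub-mkconfig -o /boot/grub/grub.cfg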
Since I am an incredibly slow learner, and a n00b… let me just ask this to TRY to get my head around this "allocated" thing. The CUDA app I posted above "allocates" VRAM from "actual" VRAM. If I have 7800MB of free VRAM, I can allocate 7800MB, but if I try to allocate 7900MB I get "Error, could not…" So, when I open e.g. Firefox, it uses (according to nvidia-smi) 79MB. When I play WoW at my current resolution/settings, the app uses 1880-ish MB. This does not vary much, but may vary with spell effects, and possibly when changing "worlds" (ref. expansions and different texture details and whatnot). Simple math, according again to nvidia-smi: 1880 (WoW) + 79 (Firefox) = 1959MB. This means I can allocate 6GB (well… I could allocate 5960MB with the CUDA app).
Reading from the DXVK HUD, the "allocation" is 4500+ MB. What is this "allocation", and is it "unlimited"? Is the allocation limited by VRAM + system RAM? (In my case 8 + 16 = 24GB.) From the little testing I have done, it is at least clear that the "allocated" and "used" listed on the DXVK HUD do not in any way limit me from allocating VRAM with the CUDA app, or starting Chrome or whatnot. The only thing that actually spews an error message is if I try to use the CUDA app to allocate > available VRAM.
What I don't know is what is supposed to happen with this "DXVK allocation" when physical VRAM is full. From the tests, it SEEMS as if it will happily use system RAM (as I guess is the intended function). The "allocation" and "used" do not change, but WoW (according to nvidia-smi) uses less physical VRAM if the game is started in a VRAM-starved situation vs. not. What was rather clear, though, is that it can seem as if once any actual data (textures and whatnot) is put in system RAM, it stays there for some reason. The tests with really starved VRAM push GPU usage to 99%, and fps… a LOT lower, even after I kill the CUDA app and get 5GB of free physical VRAM back. Would it not be ideal if allocation blocks could be freed or moved to VRAM once VRAM is free? Or is that not a feature available to Vulkan… or perhaps a driver thing, so things don't get "transferred"?
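For anyone wanting to reproduce these readings: the "allocated" and "used" numbers come from DXVK's built-in HUD, which is enabled via an environment variable (the game binary here is just a placeholder):

# Show device info and memory statistics in DXVK's overlay;
# "game.exe" stands in for whatever you launch through wine/Proton.
DXVK_HUD=devinfo,memory wine game.exe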
https://github.com/doitsujin/dxvk/commit/138dde6c3d4458a1d262093b93773b6a90090c40 seems to be an improvement so far.
Doing the same test as above with 7GB of memory allocated with "gpufill", WoW loaded and had a much higher fps, although with some stuttering and frame spikes… closing "gpufill" to release the 7GB of VRAM brought the frame times down and the fps up. Fairly playable, but I noticed GPU load was still 90%+, whereas normally, where I was standing, it is usually 45-50% with 30+ more fps.
So for the little testing I did, https://github.com/doitsujin/dxvk/commit/138dde6c3d4458a1d262093b93773b6a90090c40 did help performance in an out-of-VRAM situation.
EDIT: Clearing the .nv/GLCache folder and the WoW/retail/Cache folder brought back the same "issues" as https://github.com/doitsujin/dxvk/issues/1100#issuecomment-504068676, it seems… One other thing I noticed was that nvidia-smi seemed to indicate less VRAM usage from WoW. Is this due to "reusing chunks", so that "actual" VRAM usage is lower?
Well, I found a little snippet to allocate VRAM via CUDA: https://devtalk.nvidia.com/default/topic/726765/need-a-little-tool-to-adjust-the-vram-size/
Needs the CUDA dev kit from nVidia (or your distro). Compile with:
nvcc gpufill.cu -o gpufill
That way you can allocate and "spend" VRAM without actually spending it… What happened when I spent 6GB of VRAM was that WoW started as normal and did not crash, even though after running around a bit and zoning, VRAM topped out at 7.9GB+ on my 8GB card. It did not crash and I did not notice any huge issues, but I did not test for more than maybe 10-15 minutes.
However, using "gpufill" to fill 7GB of VRAM, i.e. `./gpufill 7000`, BEFORE starting WoW, something was clearly pushed out to system RAM instead, because the performance was horrible. But I still did not crash from that. Screenshot:
Closing "gpufill" by pressing enter did release the 7GB of VRAM according to nvidia-smi, but there was no change in WoW performance. This at least indicates that VRAM data spilled to system RAM does not "transfer" back to actual VRAM even if VRAM is freed later. That may well be intended, though, but from what I gather even this experiment did not immediately crash WoW, so the crashes might not REALLY be actual memory allocation failures due to memory starvation.
The "shared memory" thing between VRAM <-> sysram probably does not work the same way swap does, I guess? I.e. in a memory-starved situation things get put to swap on disk, but once memory gets freed, it does not continue to be used from swap. I have no clue what is supposed to happen in a situation like that, though.
Will do some more testing with this, and with the latest https://github.com/doitsujin/dxvk/commit/138dde6c3d4458a1d262093b93773b6a90090c40