dxvk: Games crash on Nvidia due to memory allocation failures
For reasons that are not yet clear, DXVK's device memory allocation strategy does not work reliably on Nvidia GPUs. This leads to game crashes with the characteristic `DxvkMemoryAllocator: Memory allocation failed` error in the log files.
This issue has been reported in the following games: #1099 (Bloodstained: Ritual of the Night) #1087 (World of Warcraft)
If you run into this problem, please do not open a new issue. Instead, post a comment here, including the full DXVK logs, your hardware and driver information, and information about the game you’re having problems with.
Update: Please check https://github.com/doitsujin/dxvk/issues/1100#issuecomment-509484527 for further information on how to get useful debugging info. Update 2: Please also see https://github.com/doitsujin/dxvk/issues/1100#issuecomment-515083534. Update 3: Please update to driver version 440.59.
About this issue
- State: closed
- Created 5 years ago
- Comments: 244 (96 by maintainers)
To gather some additional logging information, could people try adding the following kernel module option to nvidia.ko:
NVreg_ResmanDebugLevel=0
You can add this option with modprobe via the command-line at module-load time, or by creating a modprobe configuration file. Here’s a sample command-line for loading the nvidia.ko module with this option:
modprobe nvidia NVreg_ResmanDebugLevel=0
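For a persistent setting, the same option can go into a modprobe configuration file instead; a minimal sketch (the file name under /etc/modprobe.d is arbitrary):

# /etc/modprobe.d/nvidia-debug.conf
# Applies the option every time nvidia.ko is loaded.
options nvidia NVreg_ResmanDebugLevel=0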
You can verify that this option is set by running the following command:
grep ResmanDebugLevel /proc/driver/nvidia/params
Note: The kernel module must be unloaded before running modprobe via the command line in order for this option to be set. If you run modprobe when the module is already loaded, it will return an exit code of 0 without presenting any warning message, even though no change has taken place.
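A minimal reload sequence, assuming nothing (such as a running X server) is still holding the module:

# Unload first (this fails while the GPU is in use, e.g. by an X server),
# then load the module with the debug option set.
sudo modprobe -r nvidia
sudo modprobe nvidia NVreg_ResmanDebugLevel=0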
This will help us track information at the system memory page allocation level, and will be extremely verbose. If you enable this option you’ll want to be mindful of your physical storage device usage, and disable this option after you’ve gotten a reproduction.
This will log to dmesg, so in addition to the normal d3d11 and dxgi logs, please send us an nvidia-bug-report.log.gz file, which can be generated using the nvidia-bug-report.sh script (normally placed in /usr/bin). If you're unable to attach the bug report log to this GitHub thread, please send an email to linux-bugs [at] nvidia.com and put "DXVK Memory Crash" in the Subject field.

I pushed some more memory allocation tweaks. Most importantly, DXVK will now try to allocate smaller chunks if a large allocation fails. This will not solve the underlying issue, but might help in some cases.
Additionally, 32-bit games (i.e. all D3D9 games) will use a smaller chunk size for host-visible memory types, which should hopefully help a bit with games running out of 32-bit address space, but that’s a different issue.
This build includes D9VK as well with the patches applied, so it might be worth testing there too: dxvk-memory.tar.gz
NVIDIA Vulkan Beta Driver 435.19.03 fixed alt-tabbing in GTA V. It was crashing/hanging because I was running out of VRAM.
@alligatorshoes I remember hearing that D9VK increases memory usage on Windows compared to the native implementation (e.g. 1GB higher for A Hat In Time). You could try filing an issue with D9VK about memory usage. You would probably want to do comparisons with native Windows before filing it. I completely understand how the need to test on Windows before filing that issue could be a problem.
That said, it would be best to report only problems with 64-bit games here. This issue focuses on the nvidia driver reporting out of memory when there is sufficient address space and system memory. 32-bit games getting out of memory almost always involve running out of address space, which is a different issue entirely.
For the people affected, please try it. That would be of great help!
As the context provided in the changelog entry implies, this fixes a different issue with different symptoms. That was #1169.
For the issue here, there has been a patch floating around that we were waiting for feedback on, and we didn’t really get a lot of testing data from end users. It has now been added to our trunk, and will show up in the next release in our Vulkan beta sidebranch, as well as in an unspecified future official release.
Getting the same crash on my work computer, with a Radeon HD 8570 / R7 240/340 OEM.
@ryao OK, no problem. Everything you’ve said makes sense. Thanks to you and @SveSop for the assistance, and my apologies for cluttering up the GitHub issue! For the record, the game doesn’t crash at all using WineD3D but naturally performance is worse 😄 so I’ll post on D9VK and see if there’s any luck. Have a great weekend.
I did some experimentation with Unigine Superposition, and a rather… uhm… insane setting.
Allocation > 10GB VRAM (according to the DXVK HUD)
Now, doing this ended up in some nice flashbacks to coming home from vacation in the '70s… i.e. a nasty slideshow.
Other than that, I was only able to cause a DXVK memory allocation failure when creating a ramdisk with files filling RAM beyond swap while Superposition was loading, and tbh using a graphics setting that is 2GB > VRAM + filling swap to the brim is NOT what is happening when I regularly play. I also started Chrome at the same time this horrible slideshow happened, and that caused something…
PID 1115: /usr/bin/nvidia-persistenced --user nvidia-persistenced
(The PID usually points to this, but disabling persistenced just made it point to something else when it was triggered, so I dunno if it is just some sort of "pick whatever nVidia process it finds with the lowest PID" kind of thing?)

It did not matter if I used 0, 1 or 2 for `/proc/sys/vm/overcommit_memory`, other than not being able to create any large files in the ramdisk when using "2" (to NOT allow any kind of overcommit). Superposition still loaded with > 10GB VRAM allocated (as I have 10-ish GB of free sysmem). Probably not really helpful, but somewhat "proof" that it is as @kakra says above: probably not related to the kernel `overcommit_memory` setting, at least not directly.
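For reference, the overcommit mode being toggled in the experiment above can be inspected and switched at runtime; a minimal sketch (0 = heuristic, 1 = always allow, 2 = never overcommit, per the kernel's vm documentation):

# Show the current overcommit mode.
cat /proc/sys/vm/overcommit_memory

# Switch modes at runtime (reverts on reboot); 0 is the kernel default.
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
# Equivalent via sysctl:
sudo sysctl vm.overcommit_memory=1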
The heuristic of `vm.overcommit_memory=0` (the default) is simple: it will reject only obvious over-allocations. Since allocation requests from the graphics driver / DXVK are usually small (or kind of smallish, in any case far from obvious over-allocations), this makes no difference. The problem only happens later, because memory is not really allocated upon the allocation request but only lazily, when data is written to that memory. If, on a write request, memory cannot be allocated, it is already too late to deny the allocation: the process will be OOM-killed. That clearly does not happen here. So changing this setting is not the root problem and doesn't solve it.

It also makes no sense that you would see fewer problems with `vm.overcommit_memory=1`, because then memory availability is never checked but simply granted. The graphics stack would eventually fill it fast and suffer an OOM kill. But this doesn't happen here either. I think NVIDIA already found a memory allocation problem in its Vulkan driver and a fix is in the works: it can fail some allocations when it shouldn't.
Also, I’m not sure how the driver internals work but the driver probably needs some invariable mappings between sysmem and VRAM so data can be transferred via PCIe bus direct memory access. This memory also probably has to be contiguous. And I think here’s one problem: With huge memory pages enabled, it’s exponentially harder for the kernel to find such regions without defragmenting memory. And since the NVIDIA driver doesn’t support page table mappings and page-fault handling, the kernel also is not able to move (or swap) memory around allocated by the graphics driver: There’s just no interface that could notify the driver or kernel of such events. In contrast, the AMD driver does support this, and thus it can do real overcommits.
But `vm.overcommit_memory` is different from overcommitting VRAM. Those are two different pairs of shoes (although it's basically the same idea).

That's at least why disabling huge pages and closing browser windows helped a lot for me: Chrome is a VRAM memory hog in Xorg (1.5G+ VRAM allocated most of the time). I now have 24G of RAM installed (and also upgraded to a 6G VRAM graphics card), and it has become much harder to force this problem to show up, even with the browser windows still open and huge pages enabled (but in a less intrusive mode than the default). I'm still experimenting with this.
So I suggest trying to lower your memory footprint: stop services, stop programs, drop caches (so there's a higher chance of having a lot of free contiguous memory) and then try again. I can only conclude that changing the VM overcommit settings changes how your system arranges memory; maybe it swaps a little more or less? The setting itself should have absolutely no influence on NVIDIA memory allocation behavior.
If the OOM killer kicks in, it will say so in `dmesg`.

You could also try `Alt+SysRq+F` (you may need to enable it first, as it is disabled in some distributions) prior to starting the game: it will trigger the OOM killer and free memory, maybe even kill a process in that effort. Then see if the game still fails. Alternatively: `echo f > /proc/sysrq-trigger`.
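On distributions where the magic SysRq key is restricted, it can be enabled at runtime; a minimal sketch (1 enables all functions; see the kernel's sysrq documentation for the finer-grained bitmask values):

# Check the current setting (0 = disabled, 1 = all functions enabled).
cat /proc/sys/kernel/sysrq

# Enable all SysRq functions until reboot.
echo 1 | sudo tee /proc/sys/kernel/sysrq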
There was no context, unless you're privy to something else. Nowhere was Squad even mentioned, let alone the bug.
There are several reports of memory allocations failing. #1169 may be triggered by some other specific code path within the driver, but the end result is the same as this one: memory is not allocated. The crashes are likely a result of where the allocation happened.
When the errors are being reported by the hardware, it's pretty far outside anything end users would understand. What else is NVIDIA expecting? It's literally impossible to diagnose a binary blob. Maybe it's a specific BIOS vendor or some combination of hardware.
Maybe related? https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-Generic-Allocator-2019
I have never experienced a kernel OOM kill in these cases. I have never "run out of GPU mem" either, as games like WoW rarely use more than 2-3GB out of my 8GB, which has never actually been filled.
So IMO there is no actual case of low memory causing this. There has been talk about fragmentation, and that is possibly a valid culprit: you could have plenty of free memory, just not in a contiguous block, and that triggers a memory allocation problem.
One stupid question I have been trying to ask elsewhere: the `KHR_dedicated_allocation` extension used by DXVK seems to be constantly allocating and freeing chunks. Could this be causing problems when gaming for a prolonged time? Can it cause fragmentation of sorts?

@SveSop Switching "sync modes" should really not affect how the driver handles memory, or even other stuff. It just changes how the wine source waits on events. By default, it does that quite inefficiently, with higher latency. "esync" has improved on that. And "fsync" is a step further, moving the waiting on groups of events right into the kernel. This probably reduces context switches by a big factor. Context switches are normal (and needed) but can be quite bad for CPU performance (because switching context between threads flushes various caches and registers inside the CPU).
I’m also not sure why recreating the “.nv” cache has any other side-effects than reduced performance for you. I’ve never seen that problem here.
Both your observations may be a net effect of other problems you’re experiencing. One of those you might have found: Enabling “Above 4G decoding” allows the kernel to use your GPU without allocating bounce buffers (buffers that are bouncing their mapping between below 4G 32-bit address boundary and what the driver expects). With below-4G mapping, it is not possible for your GPU to present its VRAM to the CPU as a whole. It has to constantly swap mappings. Under certain conditions, the kernel may have a hard time finding address space below 4G to map to the CPU. This could well explain why you’re still seeing memory related errors. And extensively bouncing buffers could have yet undetected side-effects in the driver when multiple threads require different mappings at the same time, also constantly changing. This may explain why switching the sync algorithm affects you: Without fsync or esync, it’s much less likely that stuff happens at the same time. But this probably needs fixing at multiple layers, not only the driver.
32-bit games aren't really affected by above-4G mappings: they only see a 32-bit address space anyway, counting from 0, no matter where the 64-bit OS mapped their address space into memory. But the kernel would have a hard time constantly finding address space below 4G if the BIOS doesn't allow above 4G. A 32-bit OS will always allocate from below 4G, so it shouldn't be affected by the setting. This is probably just a setting to fix bugs with drivers or hardware that claim to properly support 64-bit addressing but in reality don't. Only in such cases can and should you use below-4G mappings. BIOS settings tend to default to the most compatible and conservative values, even if that means reduced performance. You should change them if you know your hardware can do better (this excludes "overclocking" as a recommendation).
This BIOS setting may come in different flavors, be it “64-bit OS support”, “high memory DMA”, “4G IO limit”… If other users are still affected they may want to check their BIOS settings. @SveSop good find. 😃
I had a similar experience to yours. I renamed the cache folder to force a rebuild of the shader cache, and I was able to play for a night without any crashes (though I had to suffer through the rebuild like you). Eventually the problem came back, though. I'm not an expert in these things, but since my issues seemed memory-related, perhaps having to rebuild shaders prevented the memory-fallback driver bug from emerging as quickly as if everything were built and ready to load into VRAM.
I've been stalking this thread for some time, as it seemed the only place close to providing an answer to the crashes. I've been having constant video freezes after 5-30 minutes of play with a GTX 760 2GB on Kubuntu using the 430 drivers in Path of Exile and Overwatch. XID errors 69 and 31.
I saw the new beta driver 435.24.02 came out, touting fixes to memory allocation crashes (specifically using system memory as a fallback for full vram) leading to XID 31 and installed it. So far no crashes since.
@alligatorshoes That is likely out of scope for this issue. I have had similar problems with Company of Heroes. It supports both Direct3D 9 and Direct3D 10, but uses a 32-bit binary. The only thing that I can say is to try using WineD3D. It seems to have somewhat lower address space usage than DXVK and D9VK.
Other than that, higher memory usage than native Windows on 32-bit causing out-of-memory failures can be considered a wine bug, stemming from the memory overhead of emulating the NT kernel in userspace. I have heard that Codeweavers might have a solution for this in the future, due to the work they are doing to support 32-bit on future versions of Mac OS X. Perhaps things will be better in the future.
@SveSop Thanks so much for the information! I’ll go ahead and do some testing and report back.
@alligatorshoes I have not done this on Arch, but I would assume DKMS works the same. You need to find your driver source (for Ubuntu this is `/usr/src/nvidia-435.19.03`), and you can just edit the `/usr/src/nvidia-435.19.03/nvidia/nv-vm.c` file directly (remember to use sudo), or use the patch. Arch might have its sources in a different folder than /usr/src/nvidia-xxx though. Kernel headers, gcc and various compile tools are of course needed if you have not installed them.
Once that is done, you rebuild your kernel module like this:
sudo dkms remove nvidia/435.19.03 -k $(uname -r)
(This removes your running kernel module.)
sudo update-initramfs -u
(Dunno if this is actually needed… I just tend to do it.)
sudo dkms autoinstall -k $(uname -r)
(This installs the kernel module from your modified source.)
sudo update-initramfs -u
(This updates your initramfs with the new kernel module.)
Reboot, and you should be using the fix. Good luck.
@pchome
Well, after the system was in use for a while and the caches were filled in a "natural" way, I did ten Unigine Superposition launches in a row. I didn't test the whole benchmark, only the fact that it launches.
The results (for 2GB VRAM, 8GB RAM, custom Superposition profile: 1080p windowed, high shaders/textures):
P.S. patched 435.19.03
Go on…
…oh.
Anyway:
This is #1169. Exactly the same Xid & details I get when Squad croaks on me.
This also matches what I see. The PID is either `SquadGame.exe` or `Xorg`, but the freeze only happens when playing Squad. Looks like the nvidia driver just pins the fault on whatever process happened to win the bork lottery that time by touching a VRAM address range or something (I'm a moron regarding GPU stuff, please interpret my speculation as a form of poetry).

Or just `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`.

Regarding transparent huge pages, I'm running without issues since I've changed some settings:
One trick seems to be lowering `max_ptes_none` (see https://www.kernel.org/doc/Documentation/vm/transhuge.txt). Essentially, this tells THP how many extra small pages it may allocate from memory to fill a huge page for combining. I.e., a value of 128 allows the kernel to allocate an additional 512k of memory to make 1.5M of allocated memory into a full 2M huge page. So it allows wasting up to 25% of memory in my case. The default value seems to be much higher, thus allowing huge amounts of memory to be wasted when THP kicks in. The system will take it from swap instead; in the case of the video driver, that's not possible.

I recommend completely turning the THP feature off if you're using less than 16G of memory and running games. For other applications it may still be worth having it on, especially if most memory is going to be allocated by one or two single services.

If you still want to try, I recommend using `max_ptes_none` with 128 (with high system memory) or 64 (with low system memory, e.g. below 16G), maybe even smaller values. However, the less memory you allow the kernel to allocate in addition to the existing allocation, the lower the chance of gaining performance from THP. I think the default is somewhat near the full huge page size (maybe even as near as 511 pages). That default is much too high for applications that tend to allocate less than 2M of memory a lot.

If you're seeing bad swap performance, you may also want to adjust `max_ptes_swap` (defaults to 64 pages = 256kB): this value allows the kernel to swap back in that many pages to create a huge page.

Also, I recommend going with the `defer+madvise` parameter: this makes THP mostly useless for non-THP-aware applications (unless free memory is not fragmented), but the THP defrag will only kick in for applications that explicitly ask for THP, and only in deferred mode to reduce allocation latency/stalls.

The `within_size` parameter is interesting for shared memory allocations. There's a similar parameter for tmpfs that you may want to apply especially on systemd systems (since those mount `/tmp` with tmpfs). This parameter makes SHM only use THP if at least 2M are allocated. Pulseaudio users may benefit from that for some games.
So, does this happen with other Vulkan applications or with DXVK only?
One thing to note is that 32-bit games are not particularly interesting for this issue since those are more likely to just run out of address space. This especially applies to D9VK since 64-bit D3D9 games don’t really exist.
They might still run out of memory, but there’s a good chance that this has nothing to do with the driver issue being discussed here.
It is not unclear if you actually look at the code in `memory.cpp`: if your heap is too small to fit at least 16 chunks, it will fall back to a smaller chunk size instead. This is to provide at least 16 chunks (because there are different types of memory allocated from chunks, but each chunk will only be used for one type of memory allocation). That's why I said previously that bigger chunks may reduce the problem: it's likely that all types of memory will be allocated very early during game initialization, and later on there's a higher chance that no additional chunk is needed for a specific small allocation type. But it will also increase the chance of failing to allocate a chunk later if one is needed, because the system may struggle to find contiguous memory for it (external vs. internal fragmentation).

If your heap is too small (below 2048 MB), it will instead allocate smaller chunks (`heapSize` divided by 16). @doitsujin I wonder: are there any alignment constraints? Because let's say we have some uncommon heap size of 1024+512=1536M and divide that by 16: it would allocate 96M chunks. This is not a power of 2. Does this matter?
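To make the arithmetic concrete, here is a hypothetical shell sketch of the fallback described above; it paraphrases the logic, it is not DXVK's actual code:

# Hypothetical paraphrase: heaps below 2048 MB use heapSize/16 as the
# chunk size instead of the fixed default.
heap_mb=1536
chunk_mb=128
if [ "$heap_mb" -lt 2048 ]; then
    chunk_mb=$(( heap_mb / 16 ))   # 1536 / 16 = 96 MB, not a power of two
fi
echo "chunk size: ${chunk_mb} MB"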
3b1376b2feba0bed66fd3581766bf8c357a33ecc increases the chunk size to 128 MB (from 64), please test if this changes anything. Here's a build: dxvk-master.tar.gz
Can't say I'm surprised. I used to use automatic huge pages for tmpfs (with huge=within_size) and it frequently led to my PC fully freezing (randomly) when doing things like building software on tmpfs (it took me a while to realize it was the problem). That made me lose faith in the thing, and I disabled huge pages completely. The idea behind it isn't bad, though, but I'd rather stay away for a while (it could be fixed; I know huge pages are actively being worked on). Transparent huge pages are however enabled by default on a lot of distributions, so I'd assume they "usually" work fine, but wine and games perhaps lead to more unusual use cases.
This doesn’t sound like it’s related to this issue though.
I had the same problem with Fallout 4 + Proton 4.2-7 + GTX 960. The PC froze every 30-40 min; however, the problem got fixed after disabling transparent huge pages. Try putting transparent_hugepage=never into the Linux kernel options (grub.cfg).
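One way to make that kernel parameter persistent on GRUB-based distributions; a sketch (keep your existing options when editing):

# In /etc/default/grub, append the parameter to the existing line, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash transparent_hugepage=never"
# Then regenerate the config:
sudo update-grub
# or, on distributions without the update-grub wrapper:
sudo grub-mkconfig -o /boot/grub/grub.cfg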
Since I am an incredibly slow learner, and a n00b… let me just ask this to TRY to get my head around this "allocated" thing. The CUDA app I posted above "allocates" VRAM from "actual" VRAM. If I have 7800MB of free VRAM, I can allocate 7800MB, but if I try to allocate 7900MB I get "Error, could not…" So, when I open e.g. Firefox, it uses (according to nvidia-smi) 79MB. When I play WoW at my current resolution/settings, the app uses 1880-ish MB. This does not vary much, but may vary with spell effects, and possibly when changing "worlds" (ref. expansions and different texture details and whatnot). Simple math, according again to nvidia-smi: 1880 (WoW) + 79 (Firefox) = 1959MB. This means I can allocate 6GB (well… I could allocate 5960MB with the CUDA app).
Reading from the DXVK HUD, the "allocation" is 4500+ MB. What is this "allocation", and is it "unlimited"? Is the allocation limited by VRAM + system RAM? (In my case 8 + 16 = 24GB.) From the little testing I have done, it is at least clear that the "allocated" and "used" listed on the DXVK HUD do not in any way limit me from allocating VRAM with the CUDA app, or starting Chrome or whatnot. The only thing that actually spews an error message is if I try to use the CUDA app to allocate > available VRAM.
What I don't know is what is supposed to happen with this "DXVK allocation" when physical VRAM is full. From the tests, it SEEMS as if it will happily use system RAM (as I guess is the intended function). The "allocation" and "used" do not change, but WoW (according to nvidia-smi) uses less physical VRAM if the game is started in a VRAM-starved situation vs. not. What was rather clear, though, is that it can seem as if once any actual data (textures and whatnot) is put in system RAM, it stays there for some reason. The tests with really starved VRAM push GPU usage to 99%, and fps… a LOT lower, even after I kill the CUDA app and get 5GB of free physical VRAM back. Would it not be ideal if allocation blocks could be freed or moved to VRAM once VRAM is free? Or is that not a feature available to Vulkan… or perhaps a driver thing, so things don't get "transferred"?
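For anyone wanting to reproduce these readings: the "allocated" and "used" numbers come from DXVK's built-in HUD, which is enabled via an environment variable (the game binary here is just a placeholder):

# Show device info and memory statistics in DXVK's overlay;
# "game.exe" stands in for whatever you launch through wine/Proton.
DXVK_HUD=devinfo,memory wine game.exe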
https://github.com/doitsujin/dxvk/commit/138dde6c3d4458a1d262093b93773b6a90090c40 seems to be an improvement so far.
Doing the same test as above with 7GB of memory allocated with "gpufill", WoW loaded and had a much higher fps, although with some stuttering and frame spikes… closing "gpufill" to release the 7GB of VRAM brought the frame times down and the fps up. Fairly playable, but I noticed GPU load was still 90%+, whereas normally, where I was standing, it is usually 45-50% with 30+ more fps.
So for the little testing I did, https://github.com/doitsujin/dxvk/commit/138dde6c3d4458a1d262093b93773b6a90090c40 did help performance in an out-of-VRAM situation.
EDIT: Clearing the .nv/GLCache folder and the WoW/retail/Cache folder brought back the same "issues" as https://github.com/doitsujin/dxvk/issues/1100#issuecomment-504068676, it seems… One other thing I noticed was that nvidia-smi seemed to indicate less VRAM usage from WoW. Is this due to "reusing chunks", so that "actual" VRAM usage is lower?
Well, I found a little snippet to allocate VRAM via CUDA: https://devtalk.nvidia.com/default/topic/726765/need-a-little-tool-to-adjust-the-vram-size/
Needs the CUDA dev kit from nVidia (or your distro). Compile with:
nvcc gpufill.cu -o gpufill
That way you can allocate and "spend" VRAM without actually spending it… What happened when I spent 6GB of VRAM was that WoW started as normal and did not crash, even though after running around a bit and zoning, VRAM topped out at 7.9GB+ on my 8GB card. It did not crash and I did not notice any huge issues, but I did not test for more than maybe 10-15 minutes.
However, using "gpufill" to fill 7GB of VRAM, i.e. `./gpufill 7000`, BEFORE starting WoW, something was clearly pushed out to system RAM instead, because the performance was horrible. But I still did not crash from that. Screenshot:
Closing "gpufill" by pressing enter did release the 7GB of VRAM according to nvidia-smi, but there was no change in WoW performance. This at least indicates that VRAM data spilled to system RAM does not "transfer" back to actual VRAM even if VRAM is freed later. That may well be intended, though, but from what I gather even this experiment did not immediately crash WoW, so the crashes might not REALLY be actual memory allocation failures due to memory starvation.
The "shared memory" thing between VRAM <-> sysram probably does not work the same way swap does, I guess? I.e. in a memory-starved situation things get put to swap on disk, but once memory gets freed, it does not continue to be used from swap. I have no clue what is supposed to happen in a situation like that, though.
Will do some more testing with this, and with the latest https://github.com/doitsujin/dxvk/commit/138dde6c3d4458a1d262093b93773b6a90090c40