gpt4all: AMD GPU Misbehavior w/ some drivers (post GGUF update)
System Info
This issue specifically tracks problems that still happen after 2.5.0-pre1, which fixes at least some AMD device/driver combos that were reported broken in https://github.com/nomic-ai/gpt4all/issues/1422. Re-add them here if they persist after the GGUF update.
- Repeated same token
######
reported with 2.5.0-pre1 on:
- Radeon RX 6600 XT / Windows / driver 2.0.270 - https://vulkan.gpuinfo.org/displayreport.php?id=24844#device
- Radeon RX 6800 XT / Linux (AMDVLK) version ?? - same device reportedly works correctly with the RADV driver
- Radeon RX 6900 XT / Windows https://github.com/nomic-ai/gpt4all/issues/1437#issuecomment-1758337521
About this issue
- State: closed
- Created 9 months ago
- Comments: 42 (4 by maintainers)
FINALLY! It is a synchronization issue. When I define ‘record’ as ‘eval’ at the top of ggml-vulkan.cpp, I get correct generation, but of course it is too slow. Now we finally have the right clue!!!
Turning on validation produces a whole bunch of this:
VUID-vkCmdDispatch-groupCountX-00386(ERROR / SPEC): msgNum: -1903005642 - Validation Error: [ VUID-vkCmdDispatch-groupCountX-00386 ] Object 0: handle = 0x1864b7360c0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x8e927036 | vkCmdDispatch(): groupCountX (155520) exceeds device limit maxComputeWorkGroupCount[0] (65535). The Vulkan spec states: groupCountX must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0] (https://vulkan.lunarg.com/doc/view/1.3.261.1/windows/1.3-extensions/vkspec.html#VUID-vkCmdDispatch-groupCountX-00386)
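For anyone wondering what staying under that limit could look like: a common workaround is to fold an oversized 1D dispatch into a 2D grid so that neither dimension exceeds `maxComputeWorkGroupCount`. The sketch below is only an illustration of that idea, not the fix that went into ggml-vulkan.cpp; `DispatchGrid` and `split_dispatch` are hypothetical names, and the limit is passed in as a plain integer rather than queried from the device.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not taken from ggml-vulkan.cpp.
struct DispatchGrid {
    uint32_t x;  // would be passed as groupCountX
    uint32_t y;  // would be passed as groupCountY
};

// Fold `total` workgroups into an x*y grid where both x and y are <= `limit`.
// Assumption: `limit` is the smaller of maxComputeWorkGroupCount[0] and [1].
// The grid may contain more than `total` groups, so the shader has to
// bounds-check its linear group index.
DispatchGrid split_dispatch(uint32_t total, uint32_t limit) {
    assert(limit > 0);
    const uint64_t t = total;    // 64-bit math avoids overflow in the roundings
    const uint64_t lim = limit;
    assert(t <= lim * lim);      // anything larger would need a third dimension

    if (t <= lim) {
        return {total, 1};       // already fits in one dimension
    }
    const uint64_t y = (t + lim - 1) / lim;  // fewest rows that keep x <= limit
    const uint64_t x = (t + y - 1) / y;      // columns per row, rounded up
    return {static_cast<uint32_t>(x), static_cast<uint32_t>(y)};
}
```

With the numbers from the validation message, `split_dispatch(155520, 65535)` gives a 51840 x 3 grid. The shader side would then rebuild the linear index as `gl_WorkGroupID.y * gl_NumWorkGroups.x + gl_WorkGroupID.x` and return early when it is past the real group count.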
Interesting! This is actually observable with our current release 2.5.1 so it would seem this is a different bug. It is not the same as this one because the output is ;;;; and not #### consistently.
Also, the problem with the first characters appears to be a different bug as this occurs with NVIDIA drivers too.
Closing this bug as fixed and opening two new ones for ^^^
https://output.circle-artifacts.com/output/job/b7ff15c3-377d-4d27-9dc0-c6503ec5a2b0/artifacts/0/build/upload/gpt4all-installer-win64.exe
Here is a new offline installer that people can test to see if the recent bugfix resolves the issue. Please let me know!
I am able to select the GPU in the list, but it’s not being used and reports not enough VRAM when VRAM is actually not being used at all.
Gibberish when using Mistral with the Vulkan backend. I’m using a 6800M with the Adrenalin 23.10.2 driver set. Not surprising, since the 6700 and this are the same ISA.
Also a really bad question to the other folks here, do you also get a selection box like this:
or when you do vulkaninfo, do you get multiple devices? Note that this is a laptop with a gfx90c integrated (A)GPU and a discrete gfx1031 GPU:
I can confirm in my case the GPU is definitely being used. By watching task manager I see notable vram and GPU usage when testing a compatible model.
Also, even though the output is gibberish, it is generating much faster on the GPU, like 10x faster.
We received another report of this issue from Kongming on Discord with GPT4All v2.5.1: