gpt4all: AMD GPU Misbehavior w/ some drivers (post GGUF update)

System Info

This issue specifically tracks problems that still happen after 2.5.0-pre1, which fixes at least some AMD device/driver combos that were reported broken in https://github.com/nomic-ai/gpt4all/issues/1422. Re-add them here if they persist after the GGUF update.

The same token repeated (######) was reported with 2.5.0-pre1 on:
- Radeon RX 6600 XT / Windows / driver 2.0.270 - https://vulkan.gpuinfo.org/displayreport.php?id=24844#device
- Radeon RX 6800 XT / Linux (AMDVLK) version ?? - same device reportedly works correctly with the RADV driver
- Radeon RX 6900 XT / Windows https://github.com/nomic-ai/gpt4all/issues/1437#issuecomment-1758337521

About this issue

State: closed
Created 9 months ago
Comments: 42 (4 by maintainers)

Commits related to this issue

Fix synchronization problem for AMD Radeon with amdvlk driver or windows drivers. Does not have any performance or fidelity effect on other gpu/driver combos I've tested. FIXES: https://github.com/no... — committed to cebtenzzre/llama.cpp by manyoso 8 months ago

Most upvoted comments

FINALLY! It is a synchronization issue. When I define ‘record’ as ‘eval’ at the top of ggml-vulkan.cpp I get correct generation, but of course it is too slow. Now we finally have the right clue!!!

manyoso on Oct 27, 2023

Turning on validation produces a whole bunch of this:

VUID-vkCmdDispatch-groupCountX-00386(ERROR / SPEC): msgNum: -1903005642 - Validation Error: [ VUID-vkCmdDispatch-groupCountX-00386 ] Object 0: handle = 0x1864b7360c0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x8e927036 | vkCmdDispatch(): groupCountX (155520) exceeds device limit maxComputeWorkGroupCount[0] (65535). The Vulkan spec states: groupCountX must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0] (https://vulkan.lunarg.com/doc/view/1.3.261.1/windows/1.3-extensions/vkspec.html#VUID-vkCmdDispatch-groupCountX-00386)

manyoso on Oct 26, 2023
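
The validation error above reports a one-dimensional dispatch of 155520 workgroups against a per-dimension limit of 65535. One common way to stay inside such a limit is to fold the excess into the Y dimension and let the shader rebuild a linear index. The sketch below only illustrates that arithmetic in Python. `split_dispatch` is a hypothetical helper, not a function from ggml-vulkan.cpp, and nothing here claims this is how the issue was resolved (the related commit describes a synchronization fix).

```python
from typing import Tuple


def split_dispatch(total_groups: int, max_per_dim: int = 65535) -> Tuple[int, int]:
    """Return (group_count_x, group_count_y) that together cover total_groups.

    Hypothetical helper: both counts stay at or below max_per_dim, and
    x * y >= total_groups, so the shader must skip invocations whose linear
    index (gid_y * x + gid_x) is >= total_groups.
    """
    if total_groups <= 0:
        raise ValueError("total_groups must be positive")
    # Ceiling division with integers only, to avoid float rounding.
    y = (total_groups + max_per_dim - 1) // max_per_dim
    if y > max_per_dim:
        raise ValueError("dispatch too large to fold into two dimensions")
    x = (total_groups + y - 1) // y
    return x, y


if __name__ == "__main__":
    # The value from the validation message above.
    print(split_dispatch(155520))  # (51840, 3)
```

With the numbers from the message, the dispatch becomes 51840 x 3 workgroups, which covers all 155520 groups with no padding. For other sizes the product can exceed the requested count, which is why the bounds check in the shader matters.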

But not with the “GPT4All Falcon” model.

My driver version is 23.10.2.

Interesting! This is actually observable with our current release 2.5.1 so it would seem this is a different bug. It is not the same as this one because the output is ;;;; and not #### consistently.

Also, the problem with the first characters appears to be a different bug as this occurs with NVIDIA drivers too.

Closing this bug as fixed and opening two new ones for ^^^

manyoso on Oct 27, 2023

https://output.circle-artifacts.com/output/job/b7ff15c3-377d-4d27-9dc0-c6503ec5a2b0/artifacts/0/build/upload/gpt4all-installer-win64.exe

Here is a new offline installer that people can test to see if the recent bugfix resolves the issue. Please let me know!

manyoso on Oct 27, 2023

It seems like the GPU is not being used at all. Is it supported at all?

It should be supported. Is it available to select in the UI, and does it report use of the device in the bottom-right while generating output? #1425 is for unsupported GPUs. If you see the hashes, your GPU is being used but is running into a GPT4All bug.

I am able to select the GPU in the list, but it’s not being used and reports not enough VRAM when VRAM is actually not being used at all.

mrdevolver on Oct 31, 2023

I get gibberish when using Mistral with the Vulkan backend. I’m using a 6800M with the Adrenalin 23.10.2 driver set. Not surprising, since the 6700 and this are the same ISA.

Also a really bad question to the other folks here, do you also get a selection box like this:

or when you do vulkaninfo, do you get multiple devices? Note that this is a laptop with a gfx90c integrated (A)GPU and a discrete gfx1031 GPU:

vulkaninfo.exe --summary
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
==========
VULKANINFO
==========

Vulkan Instance Version: 1.3.261


Instance Extensions: count = 13
-------------------------------
VK_EXT_debug_report                    : extension revision 10
VK_EXT_debug_utils                     : extension revision 2
VK_EXT_swapchain_colorspace            : extension revision 4
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2       : extension revision 1
VK_KHR_portability_enumeration         : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_win32_surface                   : extension revision 6
VK_LUNARG_direct_driver_loading        : extension revision 1

Instance Layers: count = 4
--------------------------
VK_LAYER_AMD_switchable_graphics AMD switchable graphics layer 1.3.262  version 1
VK_LAYER_AMD_switchable_graphics AMD switchable graphics layer 1.3.260  version 1
VK_LAYER_VALVE_steam_fossilize   Steam Pipeline Caching Layer  1.3.207  version 1
VK_LAYER_VALVE_steam_overlay     Steam Overlay Layer           1.3.207  version 1

Devices:
========
GPU0:
        apiVersion         = 1.3.260
        driverVersion      = 2.0.279
        vendorID           = 0x1002
        deviceID           = 0x1638
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = AMD Radeon(TM) Graphics
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.9.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0700-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
GPU1:
        apiVersion         = 1.3.262
        driverVersion      = 2.0.283
        vendorID           = 0x1002
        deviceID           = 0x1638
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = AMD Radeon(TM) Graphics
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.9.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0700-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
GPU2:
        apiVersion         = 1.3.260
        driverVersion      = 2.0.279
        vendorID           = 0x1002
        deviceID           = 0x73df
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = AMD Radeon RX 6800M
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.10.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0300-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
GPU3:
        apiVersion         = 1.3.262
        driverVersion      = 2.0.283
        vendorID           = 0x1002
        deviceID           = 0x73df
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = AMD Radeon RX 6800M
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.10.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0300-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000

harish0201 on Oct 26, 2023
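
The summary above lists four entries, but each physical device appears twice with the same deviceUUID, once per reported apiVersion/driverVersion pair. A minimal sketch, assuming the `GPUn:` header and `key = value` layout shown above, that groups entries by `deviceUUID` to count distinct devices. `parse_devices` and `group_by_uuid` are hypothetical helper names, and `SAMPLE` is an abbreviated copy of the output above.

```python
import re
from typing import Dict, List

# Abbreviated copy of the `vulkaninfo --summary` output quoted above.
SAMPLE = """\
Devices:
========
GPU0:
        apiVersion         = 1.3.260
        deviceName         = AMD Radeon(TM) Graphics
        deviceUUID         = 00000000-0700-0000-0000-000000000000
GPU1:
        apiVersion         = 1.3.262
        deviceName         = AMD Radeon(TM) Graphics
        deviceUUID         = 00000000-0700-0000-0000-000000000000
GPU2:
        apiVersion         = 1.3.260
        deviceName         = AMD Radeon RX 6800M
        deviceUUID         = 00000000-0300-0000-0000-000000000000
GPU3:
        apiVersion         = 1.3.262
        deviceName         = AMD Radeon RX 6800M
        deviceUUID         = 00000000-0300-0000-0000-000000000000
"""

HEADER_RE = re.compile(r"^GPU(\d+):\s*$")
FIELD_RE = re.compile(r"^\s+(\w+)\s+=\s+(.*)$")


def parse_devices(text: str) -> List[Dict[str, str]]:
    """Parse the 'Devices' part of a summary into one dict per GPUn entry."""
    devices: List[Dict[str, str]] = []
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = {"index": header.group(1)}
            devices.append(current)
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            current[field.group(1)] = field.group(2).strip()
    return devices


def group_by_uuid(devices: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group parsed entries by deviceUUID; entries without one share 'unknown'."""
    groups: Dict[str, List[Dict[str, str]]] = {}
    for dev in devices:
        groups.setdefault(dev.get("deviceUUID", "unknown"), []).append(dev)
    return groups


if __name__ == "__main__":
    for uuid, entries in group_by_uuid(parse_devices(SAMPLE)).items():
        indices = ", ".join("GPU" + e["index"] for e in entries)
        print(f"{entries[0]['deviceName']}: {indices}")
```

On the sample, the four entries collapse to two distinct devices, which matches the laptop described in the comment (one integrated and one discrete GPU). The two AMD switchable graphics layer versions in the layer list may be related to the duplication, but that is a guess and the thread does not confirm it.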

I can confirm that in my case the GPU is definitely being used. Watching Task Manager, I see notable VRAM and GPU usage when testing a compatible model.

Also, even though the output is gibberish, it is generating much faster on the GPU, like 10x faster.

Dleewee on Oct 25, 2023

We received another report of this issue from Kongming on Discord with GPT4All v2.5.1:

RX 6600 XT / Windows 11 / Driver Version 23.10.24.03-230824a-395232C-AIB

cebtenzzre on Oct 24, 2023

AMD RX 7900XT, driver 23.10.1

mau777pirho on Oct 13, 2023