gpt4all: AMD GPU Misbehavior w/ some drivers (post GGUF update)

System Info

This issue specifically tracks problems that still occur after 2.5.0-pre1, which fixes at least some of the AMD device/driver combinations reported as broken in https://github.com/nomic-ai/gpt4all/issues/1422. Re-add them here if they persist after the GGUF update.

About this issue

  • State: closed
  • Created 9 months ago
  • Comments: 42 (4 by maintainers)

Most upvoted comments

FINALLY! It is a synchronization issue. When I define ‘record’ as ‘eval’ at the top of ggml-vulkan.cpp, I get correct generation, but of course it is too slow. Now we finally have the right clue!!!
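
A toy Python model of why that observation points at synchronization. It assumes that ‘record’ queues operations for one later batched submit, while ‘eval’ submits and waits after every single operation. The ToyDevice class, its methods, and the two ops are hypothetical names made up for illustration; they are not the ggml-vulkan.cpp or Kompute API. A real GPU would fail nondeterministically, whereas this sketch makes the stale read deterministic.

```python
class ToyDevice:
    """Toy model: a write is visible to later ops only after a barrier or submit."""

    def __init__(self, memory):
        self.memory = dict(memory)  # values that ops can currently read
        self.recorded = []          # ops queued for the next submit

    def record(self, op):
        """Queue an op (or the string "barrier") without running it."""
        self.recorded.append(op)

    def submit(self):
        """Run all queued ops; writes stay pending until a barrier or the end."""
        pending = {}
        for op in self.recorded:
            if op == "barrier":
                self.memory.update(pending)  # make earlier writes visible
                pending.clear()
            else:
                pending.update(op(self.memory))
        self.memory.update(pending)
        self.recorded.clear()

    def eval(self, op):
        """Record one op and submit it immediately (correct but slow)."""
        self.record(op)
        self.submit()


def add_one(mem):
    return {"b": mem["a"] + 1}   # writes b from a


def double(mem):
    return {"c": mem["b"] * 2}   # reads b, so it depends on add_one


def run_batched(with_barrier):
    dev = ToyDevice({"a": 3, "b": 0, "c": 0})
    dev.record(add_one)
    if with_barrier:
        dev.record("barrier")
    dev.record(double)
    dev.submit()
    return dev.memory["c"]


def run_eval():
    dev = ToyDevice({"a": 3, "b": 0, "c": 0})
    dev.eval(add_one)
    dev.eval(double)
    return dev.memory["c"]


if __name__ == "__main__":
    print("batched, no barrier:", run_batched(with_barrier=False))
    print("batched, barrier:   ", run_batched(with_barrier=True))
    print("eval per op:        ", run_eval())
```

In this model, submitting after every op hides the missing dependency at the cost of one round trip per op, which matches the "correct but too slow" symptom. A barrier between dependent ops gives the same result inside a single batch.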

Turning on validation produces a whole bunch of this:

VUID-vkCmdDispatch-groupCountX-00386(ERROR / SPEC): msgNum: -1903005642 - Validation Error: [ VUID-vkCmdDispatch-groupCountX-00386 ] Object 0: handle = 0x1864b7360c0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x8e927036 | vkCmdDispatch(): groupCountX (155520) exceeds device limit maxComputeWorkGroupCount[0] (65535). The Vulkan spec states: groupCountX must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0] (https://vulkan.lunarg.com/doc/view/1.3.261.1/windows/1.3-extensions/vkspec.html#VUID-vkCmdDispatch-groupCountX-00386)

But not with the “GPT4All Falcon” model.
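
For reference, the arithmetic behind the validation error can be sketched in Python. The log reports 155520 work groups requested in X against a limit of 65535. One common way to stay under such a limit is to spread the groups over X and Y and have the shader rebuild a flat index. This is only an illustration and not necessarily how GPT4All addressed it. The helper names are hypothetical, and the Y limit is an assumption, since the log only reports maxComputeWorkGroupCount[0].

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def split_dispatch(total_groups, max_x, max_y=None):
    """Return (x, y) with x <= max_x, y <= max_y and x * y >= total_groups."""
    if max_y is None:
        max_y = max_x  # assumption: same limit in Y as in X
    if total_groups <= 0:
        raise ValueError("total_groups must be positive")
    y = ceil_div(total_groups, max_x)
    if y > max_y:
        raise ValueError("work does not fit in two dimensions")
    x = ceil_div(total_groups, y)
    return x, y


def flat_index(gx, gy, x):
    """Flat group index a shader would compute from its 2D group ID."""
    return gy * x + gx


if __name__ == "__main__":
    total, limit = 155520, 65535  # values taken from the validation message
    x, y = split_dispatch(total, limit)
    print("dispatch", x, "x", y, "covers", x * y, "groups")
    # Groups whose flat index is >= total must exit early in the shader.
    last = flat_index(x - 1, y - 1, x)
    print("last flat index:", last, "in range:", last < total)
```

With the numbers from the log this yields a 51840 x 3 grid, which covers exactly 155520 groups. When the product overshoots, the extra groups need a bounds check on the flat index.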

My driver version is 23.10.2.

Interesting! This is actually observable with our current release, 2.5.1, so it would seem this is a different bug. It is not the same as this one because the output is ;;;; and not #### consistently.

Also, the problem with the first characters appears to be a different bug as this occurs with NVIDIA drivers too.

Closing this bug as fixed and opening two new ones for ^^^

https://output.circle-artifacts.com/output/job/b7ff15c3-377d-4d27-9dc0-c6503ec5a2b0/artifacts/0/build/upload/gpt4all-installer-win64.exe

Here is a new offline installer that people can test to see if the recent bugfix resolves the issue. Please let me know!

It seems like the GPU is not being used at all. Is it supported at all?

It should be supported. Is it available to select in the UI, and does it report use of the device in the bottom-right while generating output? #1425 is for unsupported GPUs. If you see the hashes, your GPU is being used but is running into a GPT4All bug.

I am able to select the GPU in the list, but it’s not being used and reports not enough VRAM when VRAM is actually not being used at all.

I get gibberish when using Mistral with the Vulkan backend. I’m using a 6800M with the Adrenalin 23.10.2 driver set. Not surprising, since the 6700 and this card share the same ISA.

Also, a really bad question for the other folks here: do you also get a selection box like this: image

or when you do vulkaninfo, do you get multiple devices? Note that this is a laptop with a gfx90c integrated (A)GPU and a discrete gfx1031 GPU:

vulkaninfo.exe --summary
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
==========
VULKANINFO
==========

Vulkan Instance Version: 1.3.261


Instance Extensions: count = 13
-------------------------------
VK_EXT_debug_report                    : extension revision 10
VK_EXT_debug_utils                     : extension revision 2
VK_EXT_swapchain_colorspace            : extension revision 4
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2       : extension revision 1
VK_KHR_portability_enumeration         : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_win32_surface                   : extension revision 6
VK_LUNARG_direct_driver_loading        : extension revision 1

Instance Layers: count = 4
--------------------------
VK_LAYER_AMD_switchable_graphics AMD switchable graphics layer 1.3.262  version 1
VK_LAYER_AMD_switchable_graphics AMD switchable graphics layer 1.3.260  version 1
VK_LAYER_VALVE_steam_fossilize   Steam Pipeline Caching Layer  1.3.207  version 1
VK_LAYER_VALVE_steam_overlay     Steam Overlay Layer           1.3.207  version 1

Devices:
========
GPU0:
        apiVersion         = 1.3.260
        driverVersion      = 2.0.279
        vendorID           = 0x1002
        deviceID           = 0x1638
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = AMD Radeon(TM) Graphics
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.9.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0700-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
GPU1:
        apiVersion         = 1.3.262
        driverVersion      = 2.0.283
        vendorID           = 0x1002
        deviceID           = 0x1638
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = AMD Radeon(TM) Graphics
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.9.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0700-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
GPU2:
        apiVersion         = 1.3.260
        driverVersion      = 2.0.279
        vendorID           = 0x1002
        deviceID           = 0x73df
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = AMD Radeon RX 6800M
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.10.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0300-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
GPU3:
        apiVersion         = 1.3.262
        driverVersion      = 2.0.283
        vendorID           = 0x1002
        deviceID           = 0x73df
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = AMD Radeon RX 6800M
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.10.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0300-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
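
In the summary above, GPU0/GPU1 and GPU2/GPU3 share the same deviceID and deviceUUID and differ only in apiVersion and driverVersion, so each physical GPU appears to be enumerated twice. A small Python sketch for checking this on other machines follows. It assumes the `vulkaninfo --summary` text has the same "GPUn:" / "key = value" layout as shown here; the function names are hypothetical, and the sample is trimmed from the output above.

```python
import re
from collections import defaultdict


def parse_devices(summary):
    """Parse "GPUn:" blocks of `vulkaninfo --summary` text into dicts."""
    devices = []
    current = None
    for line in summary.splitlines():
        header = re.match(r"^GPU(\d+):$", line.strip())
        if header:
            current = {"index": int(header.group(1))}
            devices.append(current)
            continue
        field = re.match(r"^\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if field and current is not None:
            current[field.group(1)] = field.group(2)
    return devices


def group_by_uuid(devices):
    """Group parsed devices by deviceUUID; duplicates land in one list."""
    groups = defaultdict(list)
    for dev in devices:
        groups[dev.get("deviceUUID", "unknown")].append(dev)
    return dict(groups)


# Trimmed from the summary above (only the fields needed here).
SAMPLE = """\
GPU0:
        apiVersion         = 1.3.260
        deviceName         = AMD Radeon(TM) Graphics
        deviceUUID         = 00000000-0700-0000-0000-000000000000
GPU1:
        apiVersion         = 1.3.262
        deviceName         = AMD Radeon(TM) Graphics
        deviceUUID         = 00000000-0700-0000-0000-000000000000
GPU2:
        apiVersion         = 1.3.260
        deviceName         = AMD Radeon RX 6800M
        deviceUUID         = 00000000-0300-0000-0000-000000000000
GPU3:
        apiVersion         = 1.3.262
        deviceName         = AMD Radeon RX 6800M
        deviceUUID         = 00000000-0300-0000-0000-000000000000
"""

if __name__ == "__main__":
    for uuid, devs in group_by_uuid(parse_devices(SAMPLE)).items():
        indices = [d["index"] for d in devs]
        versions = [d["apiVersion"] for d in devs]
        print(devs[0]["deviceName"], uuid, "GPU indices", indices, "apiVersions", versions)
```

If a UUID maps to more than one GPU index, the same hardware is being listed more than once, which may help explain a device selection box with duplicate entries.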

I can confirm that in my case the GPU is definitely being used. Watching Task Manager, I see notable VRAM and GPU usage when testing a compatible model.

Also, even though the output is gibberish, it is generating much faster on the GPU, like 10x faster.

We received another report of this issue from Kongming on Discord with GPT4All v2.5.1:

  • RX 6600 XT / Windows 11 / Driver Version 23.10.24.03-230824a-395232C-AIB

AMD RX 7900XT, driver 23.10.1