llama.cpp: Fails with ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
- I am using a Google Pixel 6 Pro with Vulkan, built with make and clang
  - clang version 17.0.6, Target: aarch64-unknown-linux-android24
- I am on commit 277fad30c60ef3559dc2d01b19d05e659d40a824 (b2059)
Here is a link to the output of vulkaninfo:
https://gist.github.com/alex4o/20f949910574295c22f951f64e1d421d
Here is a link to the output of main:
https://gist.github.com/alex4o/7809ed6597cb88c4f44fcbab03475d9e
I have not looked too deeply into this, but it can be seen that llama.cpp tries to allocate a bigger chunk of memory than it needs for some reason.
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Reactions: 1
- Comments: 21 (4 by maintainers)
Commits related to this issue
- vulkan: Find optimal memory type but with fallback Some memory properties are nice to have, but not critical. `eHostCached`, for instance, isn't essential, and yet we fail on devices where this memor... — committed to luciferous/llama.cpp by luciferous 5 months ago
I suppose I should do a check on device creation whether HostVisible, HostCoherent and HostCached memory is available, and if not, fall back to HostVisible and HostCoherent. HostVisible and HostCached would require me to manually manage the synchronization between CPU and GPU, which is currently not implemented. That's why you get bad results that way.
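A minimal sketch of the fallback approach described above, assuming a Vulkan-Hpp context; this is not the actual llama.cpp implementation, and the function name `find_memory_type` and its parameters are hypothetical. The idea is to first search for a memory type that has both the required and the nice-to-have flags (e.g. `eHostCached`), and only fail if even the required flags cannot be satisfied:

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: pick a memory type index, preferring optional flags but
// falling back to the required flags alone instead of erroring out.
static uint32_t find_memory_type(const vk::PhysicalDeviceMemoryProperties & mem_props,
                                 uint32_t type_bits,               // vk::MemoryRequirements::memoryTypeBits
                                 vk::MemoryPropertyFlags required, // e.g. eHostVisible | eHostCoherent
                                 vk::MemoryPropertyFlags preferred /* e.g. eHostCached */) {
    // First pass: required and preferred flags together.
    for (uint32_t i = 0; i < mem_props.memoryTypeCount; i++) {
        const auto flags = mem_props.memoryTypes[i].propertyFlags;
        if ((type_bits & (1u << i)) && (flags & (required | preferred)) == (required | preferred)) {
            return i;
        }
    }
    // Fallback: required flags only.
    for (uint32_t i = 0; i < mem_props.memoryTypeCount; i++) {
        const auto flags = mem_props.memoryTypes[i].propertyFlags;
        if ((type_bits & (1u << i)) && (flags & required) == required) {
            return i;
        }
    }
    throw std::runtime_error("No suitable memory type found");
}
```

With this kind of fallback, a device such as the Pixel 6 Pro that lacks a HostVisible | HostCoherent | HostCached memory type would still get a usable HostVisible | HostCoherent allocation rather than failing with ErrorOutOfDeviceMemory.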
Hi, please look here: https://github.com/ggerganov/llama.cpp/issues/5410
Sure, go ahead. Just be aware of #5321, which touches a lot of the code and will get merged soon. Maybe wait until it's done, or start building on top of it.