llama.cpp: Fails with ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
- I am using a Google Pixel 6 Pro with Vulkan, built with make and clang
  - clang version 17.0.6, Target: aarch64-unknown-linux-android24
- I am on commit 277fad30c60ef3559dc2d01b19d05e659d40a824 (b2059)
Here is a link to the output of vulkaninfo:
https://gist.github.com/alex4o/20f949910574295c22f951f64e1d421d
Here is a link to the output of main:
https://gist.github.com/alex4o/7809ed6597cb88c4f44fcbab03475d9e
I have not looked too deeply into this, but it can be seen that llama.cpp tries to allocate a bigger chunk of memory than it needs for some reason.
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Reactions: 1
- Comments: 21 (4 by maintainers)
Commits related to this issue
- vulkan: Find optimal memory type but with fallback Some memory properties are nice to have, but not critical. `eHostCached`, for instance, isn't essential, and yet we fail on devices where this memor... — committed to luciferous/llama.cpp by luciferous 5 months ago
I suppose I should do a check on device creation whether HostVisible, HostCoherent and HostCached memory is available, and if not, fall back to HostVisible and HostCoherent. HostVisible and HostCached would require me to manually manage the synchronization between CPU and GPU, which is currently not implemented. That's why you get bad results that way.
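A minimal sketch of the fallback approach described above, assuming a Vulkan-Hpp context; this is not the actual llama.cpp implementation, and the function name `find_memory_type` and its parameters are hypothetical. The idea is to first search for a memory type that has both the required and the nice-to-have flags (e.g. `eHostCached`), and only fail if even the required flags cannot be satisfied:

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: pick a memory type index, preferring optional flags but
// falling back to the required flags alone instead of erroring out.
static uint32_t find_memory_type(const vk::PhysicalDeviceMemoryProperties & mem_props,
                                 uint32_t type_bits,               // vk::MemoryRequirements::memoryTypeBits
                                 vk::MemoryPropertyFlags required, // e.g. eHostVisible | eHostCoherent
                                 vk::MemoryPropertyFlags preferred /* e.g. eHostCached */) {
    // First pass: required and preferred flags together.
    for (uint32_t i = 0; i < mem_props.memoryTypeCount; i++) {
        const auto flags = mem_props.memoryTypes[i].propertyFlags;
        if ((type_bits & (1u << i)) && (flags & (required | preferred)) == (required | preferred)) {
            return i;
        }
    }
    // Fallback: required flags only.
    for (uint32_t i = 0; i < mem_props.memoryTypeCount; i++) {
        const auto flags = mem_props.memoryTypes[i].propertyFlags;
        if ((type_bits & (1u << i)) && (flags & required) == required) {
            return i;
        }
    }
    throw std::runtime_error("No suitable memory type found");
}
```

With this kind of fallback, a device such as the Pixel 6 Pro that lacks a HostVisible | HostCoherent | HostCached memory type would still get a usable HostVisible | HostCoherent allocation rather than failing with ErrorOutOfDeviceMemory.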
Hi, please look here: https://github.com/ggerganov/llama.cpp/issues/5410
Sure, go ahead. Just be aware of #5321, which touches a lot of the code and will get merged soon. Maybe wait until it's done, or start building on top of it.