llama.cpp: Fails with ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.

  1. I am using a Google Pixel 6 Pro with the Vulkan backend, built with make and clang version 17.0.6 (Target: aarch64-unknown-linux-android24).
  2. I am on commit 277fad30c60ef3559dc2d01b19d05e659d40a824 (b2059).

Here is a link to the output of vulkaninfo: https://gist.github.com/alex4o/20f949910574295c22f951f64e1d421d
Here is a link to the output of main: https://gist.github.com/alex4o/7809ed6597cb88c4f44fcbab03475d9e

I have not looked too deeply into this yet, but it can be seen that llama.cpp tries to allocate a bigger chunk of memory than it needs for some reason.
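(For anyone reproducing this: below is a minimal standalone sketch, assuming only the Vulkan headers and loader, that dumps each device's memory heaps and memory-type flags. The driver's output can then be compared against the allocation size reported in the failing run. This is illustrative code, not llama.cpp's actual allocation path.)

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    // Minimal instance; no layers or extensions needed for a memory dump.
    VkApplicationInfo app = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo ici = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    ici.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) {
        fprintf(stderr, "failed to create Vulkan instance\n");
        return 1;
    }

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(dev, &props);
        printf("device: %s\n", props.deviceName);

        VkPhysicalDeviceMemoryProperties mem;
        vkGetPhysicalDeviceMemoryProperties(dev, &mem);
        for (uint32_t i = 0; i < mem.memoryHeapCount; i++) {
            printf("  heap %u: %llu MiB%s\n", i,
                   (unsigned long long)(mem.memoryHeaps[i].size >> 20),
                   (mem.memoryHeaps[i].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT)
                       ? " (device-local)" : "");
        }
        for (uint32_t i = 0; i < mem.memoryTypeCount; i++) {
            VkMemoryPropertyFlags f = mem.memoryTypes[i].propertyFlags;
            printf("  type %u (heap %u):%s%s%s%s\n", i, mem.memoryTypes[i].heapIndex,
                   (f & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT)  ? " DeviceLocal"  : "",
                   (f & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)  ? " HostVisible"  : "",
                   (f & VK_MEMORY_PROPERTY_HOST_COHERENT_BIT) ? " HostCoherent" : "",
                   (f & VK_MEMORY_PROPERTY_HOST_CACHED_BIT)   ? " HostCached"   : "");
        }
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}
```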

About this issue

  • State: closed
  • Created 5 months ago
  • Reactions: 1
  • Comments: 21 (4 by maintainers)

Most upvoted comments

I suppose I should add a check on device creation for whether HostVisible, HostCoherent, and HostCached memory is available, and if not, fall back to HostVisible and HostCoherent. HostVisible and HostCached without HostCoherent would require me to manually manage the synchronization between CPU and GPU, which is currently not implemented. That's why you get bad results that way.
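A minimal sketch of what such a check/fallback could look like, using the plain Vulkan C API (the backend itself uses vulkan.hpp, and `find_memory_type` / `find_host_memory_type` here are hypothetical helpers, not the backend's real functions):

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Hypothetical helper: return the index of a memory type that is allowed by
// `type_bits` (from VkMemoryRequirements::memoryTypeBits) and has all of the
// `required` property flags, or -1 if no such type exists.
static int32_t find_memory_type(VkPhysicalDevice phys_dev, uint32_t type_bits,
                                VkMemoryPropertyFlags required) {
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(phys_dev, &props);
    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        if ((type_bits & (1u << i)) &&
            (props.memoryTypes[i].propertyFlags & required) == required) {
            return (int32_t) i;
        }
    }
    return -1;
}

// Prefer HostVisible | HostCoherent | HostCached; if the device does not
// expose such a type (as apparently on this device), fall back to
// HostVisible | HostCoherent, which needs no manual flush/invalidate.
static int32_t find_host_memory_type(VkPhysicalDevice phys_dev, uint32_t type_bits) {
    int32_t idx = find_memory_type(phys_dev, type_bits,
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
        VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
        VK_MEMORY_PROPERTY_HOST_CACHED_BIT);
    if (idx < 0) {
        idx = find_memory_type(phys_dev, type_bits,
            VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
            VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
    }
    return idx;
}
```

If the cached-but-non-coherent combination were chosen instead, every CPU access to the mapped range would need explicit vkFlushMappedMemoryRanges / vkInvalidateMappedMemoryRanges calls, which is the manual synchronization mentioned above.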

I could give it a shot at writing a check/fallback for available memory types.

Sure, go ahead. Just be aware of #5321, which touches a lot of the code and will get merged soon. Maybe wait until it's done, or start building on top of it.