ollama: Ollama does not make use of GPU (T4 on Google Colab)
I was experimenting with serving an Ollama server over ngrok on Google Colab:
%%bash
sudo curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama
### ngrok commands to expose port 11434 as a public URL
ollama serve &                # `ollama serve` takes no model name; start it in the background
sleep 5                       # give the server a moment to come up
ollama run mistral-openorca   # then run the model against the running server
I was able to curl the server, but I noticed that it does not make use of the notebook's GPU.
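For reference, a request like this reaches the server fine (a minimal sketch; the prompt is just a placeholder, and the same call works against the ngrok URL):
%%bash
# Smoke test against the local Ollama API on port 11434
curl http://localhost:11434/api/generate -d '{
  "model": "mistral-openorca",
  "prompt": "Why is the sky blue?"
}'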
I’ve also tried installing llama-cpp-python with CUDA support, but the GPU remains unused:
%%bash
# Install the server with an OpenAI-compatible API, built with CUDA (cuBLAS) GPU support
# (extras are quoted so the brackets aren't treated as a shell glob)
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install -q "llama-cpp-python[server]"
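Even with a CUDA build, the server only touches the GPU when layers are offloaded at launch. This is a sketch of what I'd expect to need (the GGUF path is a placeholder):
%%bash
# Start the OpenAI-compatible server and offload all layers to the GPU
# (--n_gpu_layers -1 means "offload every layer"; the model path is hypothetical)
python3 -m llama_cpp.server \
  --model /content/models/mistral-openorca.Q4_K_M.gguf \
  --n_gpu_layers -1 \
  --host 0.0.0.0 --port 8000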
About this issue
- Original URL
- State: closed
- Created 8 months ago
- Reactions: 1
- Comments: 17
See #758.
CUDA drivers need to be updated in order for Ollama to use the GPU in Colab; update them before starting the server.
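A quick way to confirm what the runtime actually exposes before starting the server (a minimal check; exact versions vary by Colab image):
%%bash
# Verify a GPU is attached and inspect the driver / CUDA versions it reports
nvidia-smi
# Toolkit version bundled with the image (can differ from the driver's CUDA version)
nvcc --version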
This is the wrong place to discuss this. I suggest you ask on Reddit.
Free accounts are not guaranteed a GPU instance. Recently a lot of people have started using Colab to host Stable Diffusion models, which often causes a shortage of Colab GPUs. You should subscribe to Colab Pro if you want consistent GPU availability.
I tested on a Google Colab T4 but it runs very slowly; maybe it's not using the GPU 👎