ollama: Ollama does not make use of GPU (T4 on Google Colab)

I was experimenting with serving an Ollama server over ngrok on Google Colab:

%%bash
sudo curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/bin/ollama  
sudo chmod +x /usr/bin/ollama

### ngrok setup to expose port 11434 via a public URL (commands elided; see the sketch after this cell)

ollama serve &                 # start the server in the background
ollama run mistral-openorca    # `ollama serve` takes no model argument
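
For reference, the elided ngrok step might look something like this. This is only a sketch, assuming the ngrok v3 CLI is already installed and YOUR_AUTHTOKEN stands in for a real token:

%%bash
# hypothetical ngrok setup (not the original, elided commands)
ngrok config add-authtoken YOUR_AUTHTOKEN
ngrok http 11434 --log stdout > ngrok.log &   # tunnel the Ollama port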

I was able to curl the server, but I noticed that it does not make use of the notebook's GPU.
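
A request against the tunneled API looks roughly like this (the ngrok hostname below is a placeholder, not the actual URL):

curl https://example.ngrok.app/api/generate -d '{
  "model": "mistral-openorca",
  "prompt": "Why is the sky blue?"
}'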

I've also tried installing llama.cpp (via llama-cpp-python) with CUDA support, but the GPU remains unused:

%%bash
# Install Server with OpenAI Compatible API - with CUDA GPU support
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip -q install 'llama-cpp-python[server]'
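
Note that the cuBLAS build alone does not offload anything at install time; layers are only moved to the GPU when the model is loaded. A minimal sketch of starting the server with offload enabled (the model path and layer count are hypothetical examples):

%%bash
# serve a GGUF model with layers offloaded to the GPU
python3 -m llama_cpp.server \
  --model /content/mistral-7b-openorca.Q4_K_M.gguf \
  --n_gpu_layers 35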

About this issue

  • State: closed
  • Created 8 months ago
  • Reactions: 1
  • Comments: 17

Most upvoted comments

See #758.

CUDA drivers need to be updated for Ollama to use the GPU in Colab. Update them with:

!sudo apt-get update && sudo apt-get install -y cuda-drivers
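
To confirm the update took effect, check the driver version afterwards (a quick sanity check, not part of the original comment):

!nvidia-smi

While a model is loaded, the ollama process should appear in the nvidia-smi process list with GPU memory allocated.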

Wrong place to discuss this. I suggest you go to Reddit.

Free accounts are not guaranteed a GPU instance. Recently a lot of people have started using Colab to host Stable Diffusion models, which often causes a shortage of Colab GPUs. You should subscribe to Pro if you want consistent GPU availability.

I tested on a Google Colab T4 but it runs very slowly; maybe it's not using the GPU 👎