ollama: Ollama does not make use of GPU (T4 on Google Colab)

I was experimenting with serving an Ollama server over ngrok on Google Colab:

%%bash
sudo curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/bin/ollama  
sudo chmod +x /usr/bin/ollama

### ngrok setup to expose port 11434 via a public URL (commands elided; see the sketch after this cell)

ollama serve &                 # start the server in the background
ollama run mistral-openorca    # `ollama serve` takes no model argument
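
For reference, the elided ngrok step might look something like this. This is only a sketch, assuming the ngrok v3 CLI is already installed and YOUR_AUTHTOKEN stands in for a real token:

%%bash
# hypothetical ngrok setup (not the original, elided commands)
ngrok config add-authtoken YOUR_AUTHTOKEN
ngrok http 11434 --log stdout > ngrok.log &   # tunnel the Ollama port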

I was able to curl the server, but I noticed that it does not make use of the notebook's GPU.
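
A request against the tunneled API looks roughly like this (the ngrok hostname below is a placeholder, not the actual URL):

curl https://example.ngrok.app/api/generate -d '{
  "model": "mistral-openorca",
  "prompt": "Why is the sky blue?"
}'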

I've also tried installing llama.cpp (via llama-cpp-python) with CUDA support, but the GPU remains unused:

%%bash
# Install Server with OpenAI Compatible API - with CUDA GPU support
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip -q install 'llama-cpp-python[server]'
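
Note that the cuBLAS build alone does not offload anything at install time; layers are only moved to the GPU when the model is loaded. A minimal sketch of starting the server with offload enabled (the model path and layer count are hypothetical examples):

%%bash
# serve a GGUF model with layers offloaded to the GPU
python3 -m llama_cpp.server \
  --model /content/mistral-7b-openorca.Q4_K_M.gguf \
  --n_gpu_layers 35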

About this issue

  • State: closed
  • Created 8 months ago
  • Reactions: 1
  • Comments: 17

Most upvoted comments

See #758.

CUDA drivers need to be updated for Ollama to use the GPU in Colab. Update them with:

!sudo apt-get update && sudo apt-get install -y cuda-drivers
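
To confirm the update took effect, check the driver version afterwards (a quick sanity check, not part of the original comment):

!nvidia-smi

While a model is loaded, the ollama process should appear in the nvidia-smi process list with GPU memory allocated.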

Wrong place to discuss this. I suggest you go to Reddit.

Free accounts are not guaranteed a GPU instance. Recently a lot of people have started using Colab to host Stable Diffusion models, which often causes a shortage of Colab GPUs. You should subscribe to Pro if you want consistent GPU availability.

I tested on a Google Colab T4 but it runs very slowly; maybe it's not using the GPU 👎