cml: Can't use AWS Instance GPU on GITLAB CI and CML-RUNNER
I have this gitlab-ci.yml:
stages:
  - test
  - deploy
  - train

sast:
  stage: test

include:
  - template: Security/SAST.gitlab-ci.yml

deploy_job:
  stage: deploy
  when: always
  image: iterativeai/cml:0-dvc2-base1
  script:
    - cml-runner
      --cloud aws
      --cloud-region us-east-1
      --cloud-type g3.4xlarge
      --cloud-hdd-size 64
      --cloud-aws-security-group="cml-runners-sg"
      --labels=cml-runner-gpu
      --idle-timeout=120

train_job:
  stage: train
  when: on_success
  image: iterativeai/cml:0-dvc2-base1-gpu
  tags:
    - cml-runner-gpu
  before_script:
    - pip install poetry
    - poetry --version
    - poetry config virtualenvs.create false
    - poetry install -vv
    - nvdia-smi
  script:
    # DVC Stuff
    - dvc pull
    - dvc repro -m
    - dvc push
    # Report metrics
    - echo "## Metrics" >> report.md
    - echo "\`\`\`json" >> report.md
    - cat metrics/best-meta.json >> report.md
    - echo "\`\`\`" >> report.md
    # Report GPU details
    - echo "## GPU info" >> report.md
    - cat gpu_info.txt >> report.md
    # Send comment
    - cml-send-comment report.md
But the container can’t recognize the driver or the GPU. When the job reaches the nvidia-smi command, I get the following error:
/usr/bin/bash: line 133: nvdia-smi: command not found
I realized that iterativeai/cml:0-dvc2-base1-gpu can’t use the instance GPU. How can I install the NVIDIA drivers and nvidia-docker, and enable the --gpus option for this container?
Thank you
About this issue
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 24 (12 by maintainers)
I see nvdia-smi at bash line: 125? There looks to be a typo in your job?
If nvidia-smi works, these lines won’t run at all.
Having seen that nvidia-smi works, cml should have set up the runner with the nvidia executor automatically: https://github.com/iterative/cml/blob/e3382668396674d22390d8cfc3403ef1e67dd8eb/src/drivers/gitlab.js#L204
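One way to make a misspelled or missing command easier to spot is to guard the GPU check instead of calling it bare. The snippet below is only a sketch, not something taken from the CML docs: it assumes the job is free to write gpu_info.txt itself (the original pipeline reads that file but does not show where it is created), and the fallback message is made up for illustration.

```shell
#!/bin/sh
# Sketch: record GPU details if nvidia-smi is on PATH and runs successfully.
# Assumption: gpu_info.txt is written by this step; the original job only reads it.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi > gpu_info.txt 2>&1; then
  echo "GPU details written to gpu_info.txt"
else
  # Either the binary is missing (or misspelled) or the driver is not visible
  # inside the container; leave a note so the later report step still has a file.
  echo "nvidia-smi unavailable or failed in this container" >> gpu_info.txt
fi
```

With a guard like this, the report step that runs cat gpu_info.txt always has a file to read, and the note in the file points to a missing binary or driver rather than leaving only a bare "command not found" in the job log.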
@dacbd I managed to make it work by adding EOF to my pem file:
To be honest, I have no idea how this works. I just guessed it might be that from looking at what you did here: https://github.com/iterative/terraform-provider-iterative/pull/232#issuecomment-952375277
Maybe there is a more elegant way of doing this 😆
That would be amazing! You could create a PR in TPI
O.o’’ @dacbd I thank you so much… I can’t believe we couldn’t see it…
We still can’t make this work. Is there anything else we can try, or any other information (logs, etc.) that we can provide?