spotty: GCP: 30 minutes for runtimeconfig.v1beta1.waiter Timeout expired
On GCP, I am using a spotty.yaml that previously worked but currently does not. I suspect that, because the volume is large (2 TB), some sort of timeout is being hit.
spotty start runs for about 31 minutes and then fails with the following error:
Waiting for the stack to be created...
- launching the instance...
- running the Docker container...
Error:
------
Deployment "spotty-instance-hearpreprocess-hearpreprocess-i2-joseph" failed.
Error: {"ResourceType":"runtimeconfig.v1beta1.waiter","ResourceErrorCode":"504","ResourceErrorMessage":"Timeout expired."}
Here is my config:
project:
  name: hearpreprocess
  syncFilters:
    - exclude:
        - '*/__pycache__/*'
        - .git/*
        - .idea/*
        - .mypy_cache/*
        - _workdir/*
        - hear-2021*.tar.gz
        - hear-2021*/*
        - hearpreprocess.egg-info/*
        - tasks/*

containers:
  - projectDir: /workspace/project
    image: turian/hearpreprocess
    volumeMounts:
      - name: workspace
        mountPath: /workspace
    runtimeParameters: ['--shm-size', '20G']

instances:
  - name: hearpreprocess-i2-joseph
    provider: gcp
    parameters:
      zone: europe-west4-a
      machineType: n1-standard-8
      preemptibleInstance: False
      gpu:
        type: nvidia-tesla-v100
        count: 1
      imageUri: projects/ml-images/global/images/c0-deeplearning-common-cu110-v20210818-debian-10
      volumes:
        - name: workspace
          parameters:
            size: 2000
Well, using the docker ps -a command I checked that the container exits on its own with exit code 137, which usually means OOM. At first I suspected that the heavy Docker image requires a lot of memory, but then I tried the tensorflow/tensorflow image instead and got the same error, so clearly it was not OOM. I then searched for other causes and found an issue where people had the same problem a couple of years ago. They solved it by updating containerd to the latest version, so I tried a newer version as well and it worked 😃. A sketch of these checks and the fix follows below.

Glad it worked for you, I'm closing the issue then. Feel free to reopen if it happens again.
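As a rough sketch of the checks and the fix described above (the placeholder <container-id> and the containerd.io package name are assumptions; the exact package and upgrade path depend on how Docker was installed on the image):

# On the instance: confirm the container died with exit code 137
docker ps -a    # look for "Exited (137)" in the STATUS column

# Exit code 137 with OOMKilled=false suggests the kill did not come from the kernel OOM killer
docker inspect <container-id> --format '{{.State.ExitCode}} {{.State.OOMKilled}}'

# Upgrade containerd (assuming Docker's apt repository on this Debian 10 image)
sudo apt-get update
sudo apt-get install --only-upgrade containerd.io
sudo systemctl restart containerd docker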