skypilot: [Serve] GCP credential path error with docker image and replica
Can you help me launch a sky serve service with autoscaling using a docker image?
The launch command is:
sky serve up -n {service name} --env-file {env file path} service.yaml
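The `--env-file` flag is what supplies the `DOCKER_ID` and `DOCKER_PW` values that the `setup` command references, so it can help to confirm that the file actually defines them before launching. The sketch below assumes the env file uses simple dotenv-style `KEY=VALUE` lines; `parse_env_file` and `missing_vars` are hypothetical helpers for illustration, not SkyPilot's own parser.

```python
import re


def parse_env_file(text: str) -> dict[str, str]:
    """Hypothetical helper: parse simple KEY=VALUE lines, skipping blanks and comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes from the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_vars(command: str, env: dict[str, str]) -> list[str]:
    """Hypothetical helper: return ${VAR} names used in `command` but not defined in `env`."""
    referenced = re.findall(r"\$\{(\w+)\}", command)
    return sorted({name for name in referenced if name not in env})


if __name__ == "__main__":
    # Hypothetical env file contents; replace with the real file's text.
    sample = "DOCKER_ID=my-user\nDOCKER_PW=my-token\n"
    setup = "docker login -u ${DOCKER_ID} -p ${DOCKER_PW} {docker image repository}"
    print(missing_vars(setup, parse_env_file(sample)))  # [] means every variable is defined
```

An empty list means every `${...}` variable in the setup command has a value in the file.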
The service.yaml is:
# service.yaml
service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 1
    max_replicas: 4
    target_qps_per_replica: 3
    upscale_delay_seconds: 180
    downscale_delay_seconds: 900

# Fields below describe each replica.
resources:
  cloud: GCP
  ports: 8000
  accelerators: L4

workdir: .

setup: docker login -u ${DOCKER_ID} -p ${DOCKER_PW} {docker image repository}
run: docker run -v ~/models/:/usr/app/models -p 8000:8000 -e ENV=prod --runtime=nvidia --gpus all {docker image path}
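To make the `replica_policy` numbers concrete, here is a simplified model of how they could interact. It assumes the desired replica count is the observed QPS divided by `target_qps_per_replica`, rounded up and clamped to `[min_replicas, max_replicas]`, and that a change only takes effect after it has been requested continuously for `upscale_delay_seconds` or `downscale_delay_seconds`. This is an illustrative sketch, not SkyServe's actual autoscaler; `ReplicaPolicy`, `desired_replicas`, and `DelayedScaler` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaPolicy:
    # Defaults mirror the replica_policy values in service.yaml.
    min_replicas: int = 1
    max_replicas: int = 4
    target_qps_per_replica: float = 3.0
    upscale_delay_seconds: float = 180.0
    downscale_delay_seconds: float = 900.0


def desired_replicas(qps: float, policy: ReplicaPolicy) -> int:
    """Replicas needed so each serves at most the target QPS, clamped to [min, max]."""
    raw = math.ceil(qps / policy.target_qps_per_replica)
    return max(policy.min_replicas, min(policy.max_replicas, raw))


class DelayedScaler:
    """Simplified model: apply a new count only after it has been wanted for the full delay."""

    def __init__(self, policy: ReplicaPolicy) -> None:
        self.policy = policy
        self.current = policy.min_replicas
        self._pending_since: Optional[float] = None
        self._pending_up: Optional[bool] = None  # direction of the pending change

    def _clear_pending(self) -> None:
        self._pending_since = None
        self._pending_up = None

    def observe(self, now: float, qps: float) -> int:
        """Record a QPS sample at time `now` (seconds) and return the replica count in effect."""
        target = desired_replicas(qps, self.policy)
        if target == self.current:
            self._clear_pending()
            return self.current
        going_up = target > self.current
        if self._pending_up != going_up:
            # New request or direction flipped: restart the delay timer.
            self._pending_since = now
            self._pending_up = going_up
        delay = (self.policy.upscale_delay_seconds if going_up
                 else self.policy.downscale_delay_seconds)
        if now - self._pending_since >= delay:
            self.current = target
            self._clear_pending()
        return self.current
```

Under this model, a sustained 10 QPS gives ceil(10 / 3) = 4 replicas (already at `max_replicas`), applied once the load has persisted for 180 seconds; dropping back to 1 replica would require 900 seconds of low load.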
An error occurs when the replica is provisioned. It looks like the GCP credentials do not exist.
I 03-18 05:34:02 replica_managers.py:118] Failed to launch the sky serve replica cluster with error: subprocess.CalledProcessError: Command 'pushd /tmp &>/dev/null && { gcloud --help > /dev/null 2>&1 || { mkdir -p ~/.sky/logs && wget --quiet https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-424.0.0-linux-x86_64.tar.gz > ~/.sky/logs/gcloud_installation.log && tar xzf google-cloud-sdk-424.0.0-linux-x86_64.tar.gz >> ~/.sky/logs/gcloud_installation.log && rm -rf ~/google-cloud-sdk >> ~/.sky/logs/gcloud_installation.log && mv google-cloud-sdk ~/ && ~/google-cloud-sdk/install.sh -q >> ~/.sky/logs/gcloud_installation.log 2>&1 && echo "source ~/google-cloud-sdk/path.bash.inc > /dev/null 2>&1" >> ~/.bashrc && source ~/google-cloud-sdk/path.bash.inc >> ~/.sky/logs/gcloud_installation.log 2>&1; }; } && popd &>/dev/null && [[ "$(uname)" == "Darwin" ]] && skypilot_gsutil() { gsutil -m -o "GSUtil:parallel_process_count=1" "$@"; } || skypilot_gsutil() { gsutil -m "$@"; }; GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/application_default_credentials.json skypilot_gsutil ls -d gs://skypilot-workdir-namsangho-a20c2158' returned non-zero exit status 1.)
I 03-18 05:34:02 replica_managers.py:121] Traceback: Traceback (most recent call last):
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/serve/replica_managers.py", line 95, in launch_cluster
I 03-18 05:34:02 replica_managers.py:121] sky.launch(task,
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/utils/common_utils.py", line 370, in _record
I 03-18 05:34:02 replica_managers.py:121] return f(*args, **kwargs)
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/utils/common_utils.py", line 370, in _record
I 03-18 05:34:02 replica_managers.py:121] return f(*args, **kwargs)
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/execution.py", line 501, in launch
I 03-18 05:34:02 replica_managers.py:121] return _execute(
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/execution.py", line 334, in _execute
I 03-18 05:34:02 replica_managers.py:121] backend.sync_file_mounts(handle, task.file_mounts,
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/utils/common_utils.py", line 370, in _record
I 03-18 05:34:02 replica_managers.py:121] return f(*args, **kwargs)
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/utils/common_utils.py", line 349, in _record
I 03-18 05:34:02 replica_managers.py:121] return f(*args, **kwargs)
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/backends/backend.py", line 73, in sync_file_mounts
I 03-18 05:34:02 replica_managers.py:121] return self._sync_file_mounts(handle, all_file_mounts, storage_mounts)
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/backends/cloud_vm_ray_backend.py", line 2990, in _sync_file_mounts
I 03-18 05:34:02 replica_managers.py:121] self._execute_file_mounts(handle, all_file_mounts)
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/backends/cloud_vm_ray_backend.py", line 4341, in _execute_file_mounts
I 03-18 05:34:02 replica_managers.py:121] if storage.is_directory(src):
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/site-packages/sky/cloud_stores.py", line 116, in is_directory
I 03-18 05:34:02 replica_managers.py:121] p = subprocess.run(command,
I 03-18 05:34:02 replica_managers.py:121] File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
I 03-18 05:34:02 replica_managers.py:121] raise CalledProcessError(retcode, process.args,
I 03-18 05:34:02 replica_managers.py:121] subprocess.CalledProcessError: Command 'pushd /tmp &>/dev/null && { gcloud --help > /dev/null 2>&1 || { mkdir -p ~/.sky/logs && wget --quiet https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-424.0.0-linux-x86_64.tar.gz > ~/.sky/logs/gcloud_installation.log && tar xzf google-cloud-sdk-424.0.0-linux-x86_64.tar.gz >> ~/.sky/logs/gcloud_installation.log && rm -rf ~/google-cloud-sdk >> ~/.sky/logs/gcloud_installation.log && mv google-cloud-sdk ~/ && ~/google-cloud-sdk/install.sh -q >> ~/.sky/logs/gcloud_installation.log 2>&1 && echo "source ~/google-cloud-sdk/path.bash.inc > /dev/null 2>&1" >> ~/.bashrc && source ~/google-cloud-sdk/path.bash.inc >> ~/.sky/logs/gcloud_installation.log 2>&1; }; } && popd &>/dev/null && [[ "$(uname)" == "Darwin" ]] && skypilot_gsutil() { gsutil -m -o "GSUtil:parallel_process_count=1" "$@"; } || skypilot_gsutil() { gsutil -m "$@"; }; GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/application_default_credentials.json skypilot_gsutil ls -d gs://skypilot-workdir-namsangho-a20c2158' returned non-zero exit status 1.
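The failing command sets `GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/application_default_credentials.json` and then runs `gsutil -m ls -d` on the workdir bucket, so a quick first check is whether that credentials file exists on the machine running the command (likely the SkyServe controller, since the traceback comes from `sky/serve/replica_managers.py`). The sketch below re-creates that check with the standard library; `credentials_present` and `check_bucket_access` are hypothetical helpers, and the bucket name is copied from the log.

```python
import os
import shutil
import subprocess
from pathlib import Path

# Path taken from the GOOGLE_APPLICATION_CREDENTIALS value in the failing command.
ADC_PATH = "~/.config/gcloud/application_default_credentials.json"
# Bucket name taken from the error log.
WORKDIR_BUCKET = "gs://skypilot-workdir-namsangho-a20c2158"


def credentials_present(path: str = ADC_PATH) -> bool:
    """Return True if the application-default credentials file exists as a regular file."""
    return Path(path).expanduser().is_file()


def check_bucket_access(bucket: str = WORKDIR_BUCKET,
                        creds: str = ADC_PATH) -> subprocess.CompletedProcess:
    """Re-run the same gsutil listing with the same credentials variable, capturing output."""
    env = dict(os.environ, GOOGLE_APPLICATION_CREDENTIALS=os.path.expanduser(creds))
    return subprocess.run(["gsutil", "-m", "ls", "-d", bucket],
                          env=env, capture_output=True, text=True)


if __name__ == "__main__":
    print("credentials file present:", credentials_present())
    if shutil.which("gsutil"):
        result = check_bucket_access()
        print("gsutil exit code:", result.returncode)
        print(result.stderr.strip())
    else:
        print("gsutil not found on PATH")
```

If the file is missing, or `gsutil` reports an authentication error in its stderr, that points at the controller's GCP credential setup rather than at the service YAML.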
About this issue
- State: open
- Created 3 months ago
- Comments: 16
Hi @sean-styleai! Thanks for reporting the issue. Could you try running sky launch directly on this YAML to see whether the error persists? Also, could you share the output of sky status on your local laptop (for more information on the SkyServe controller spec)? cc @cblmemo