skypilot: AWS stuck on `Waiting for SSH access`

Hey! I’ve been able to successfully deploy on GCP, but when I try AWS, SkyPilot times out after 600s:

I 10-10 16:33:21 provisioner.py:73] Launching on AWS us-west-2 (us-west-2a,us-west-2b,us-west-2c)
⠋ Launching - Waiting for SSH access
E 10-10 16:43:32 provisioner.py:491] *** Failed setting up cluster. ***
RuntimeError: Failed to SSH to xxx.xxx.xxx after timeout 600s.

Is something wrong with my AWS config? `sky check` returns `AWS: enabled`.

About this issue

  • State: open
  • Created 9 months ago
  • Comments: 17

Most upvoted comments

@Michaelvll On my first run I didn’t have any VPC specified, and I didn’t have a ~/.sky/config.yaml file. I guess SkyPilot chose a VPC from the list of available VPCs on my AWS account; we have several. I believe I saw somewhere in the docs that in this case SkyPilot should choose the default VPC. In my case, SkyPilot picked a VPC that is not the default, and it had a subnet without a route table attached. As a result, the created instance didn’t have internet access.

To summarize, it looks like the problem is a bit on both sides. We have to put our AWS in order (or, preferably, stop using this crap completely), AND SkyPilot (maybe) did something it wasn’t supposed to, namely use a non-default VPC. Maybe this is something for you guys to look at.

Again, I am happy that we got it resolved. You have a very useful framework, which we will definitely keep using.

Cheers, Nick
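
For anyone hitting the same timeout: the cause described above (a subnet with no usable route to the internet) can be checked before launching. Below is a minimal sketch, not part of SkyPilot, that assumes boto3 is installed and AWS credentials are configured. The helper names (`effective_route_table`, `has_internet_route`, `subnets_without_internet`) are made up for illustration, and the check only looks for a default IPv4 route through an internet gateway; it ignores NAT gateways, security groups, and pagination.

```python
from typing import Any, Dict, List, Optional

RouteTable = Dict[str, Any]


def effective_route_table(route_tables: List[RouteTable],
                          subnet_id: str) -> Optional[RouteTable]:
    """Return the route table that applies to subnet_id.

    An explicit subnet association wins; otherwise fall back to the
    VPC's main route table (None if neither is present in the input).
    """
    main = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
            if assoc.get("Main"):
                main = table
    return main


def has_internet_route(table: Optional[RouteTable]) -> bool:
    """True if the table has an active 0.0.0.0/0 route via an internet gateway."""
    if table is None:
        return False
    for route in table.get("Routes", []):
        if (route.get("DestinationCidrBlock") == "0.0.0.0/0"
                and route.get("GatewayId", "").startswith("igw-")
                and route.get("State", "active") == "active"):
            return True
    return False


def subnets_without_internet(vpc_id: str, region: str = "us-west-2") -> List[str]:
    """List subnet IDs in vpc_id that lack a default route to an internet gateway."""
    import boto3  # imported here so the pure helpers above work without it

    ec2 = boto3.client("ec2", region_name=region)
    vpc_filter = [{"Name": "vpc-id", "Values": [vpc_id]}]
    tables = ec2.describe_route_tables(Filters=vpc_filter)["RouteTables"]
    subnets = ec2.describe_subnets(Filters=vpc_filter)["Subnets"]
    return [
        s["SubnetId"] for s in subnets
        if not has_internet_route(effective_route_table(tables, s["SubnetId"]))
    ]
```

Calling `subnets_without_internet("vpc-...")` with the ID of the VPC SkyPilot picked should return the subnets where a new instance would be unreachable over a public IP. An empty list does not guarantee SSH will work, since security groups and network ACLs can still block it.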