boto3: Waiter encountered a terminal failure state

When calling wait_until_running() on an instance, sometimes I receive this exception:

2015-07-13 11:44:42,583 INFO call Calling ec2:wait_until_running with {'InstanceIds': ['i-972ed75e']}
2015-07-13 11:45:43,687 ERROR decorated_function Waiter InstanceRunning failed: Waiter encountered a terminal failure state
Traceback (most recent call last):
  …
  File "…/lib/python3.4/site-packages/boto3/resources/factory.py", line 227, in do_waiter
    waiter(self, *args, **kwargs)
  File "…/lib/python3.4/site-packages/boto3/resources/action.py", line 194, in __call__
    response = waiter.wait(**params)
  File "…/lib/python3.4/site-packages/botocore/waiter.py", line 284, in wait
    reason='Waiter encountered a terminal failure state')
botocore.exceptions.WaiterError: Waiter InstanceRunning failed: Waiter encountered a terminal failure state

In the console, the instance does reach the running state. I have turned on boto3 debug logging but haven’t been able to reproduce the failure since.

OS X Yosemite 10.10.3
Python 3.4.2
boto3 1.1.0

Edit: I extracted the methods in our custom code to a script that will (hopefully) recreate the issue.

import logging, boto3, time

boto3.set_stream_logger('boto3', logging.DEBUG)
ec2 = boto3.resource('ec2', region_name='us-east-1')
instance = ec2.create_instances(
    ImageId='ami-b0210ed8',
    InstanceType='t2.micro',
    MinCount=1,
    MaxCount=1,
)[0]
print('Created instance:', instance.id)
instance.wait_until_running()
time.sleep(5)
instance.terminate()
instance.wait_until_terminated()
print('Terminated instance:', instance.id)

About this issue

State: closed
Created 9 years ago
Comments: 25 (1 by maintainers)

Most upvoted comments

After looking around, I’ve ended up using the waiters directly. Specifically, wherever I’d want to use:

Instance.wait_until_stopped()

I now use:

stopped_instance_waiter = ec2_client.get_waiter('instance_stopped')
stopped_instance_waiter.wait(InstanceIds=[Instance.id])

Yes, it’s an annoying amount of boilerplate, but it doesn’t produce the error above. Maybe the way boto3 implements the resource-level method causes occasional errors.

rirze on Apr 9, 2020
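
To cut down the boilerplate in the client-level approach above, a small helper can wrap the waiter lookup and the wait call. This is only a sketch: `wait_for_instances` is a hypothetical name, and the `WaiterConfig` argument (with `Delay` and `MaxAttempts`) is assumed to be available, which should hold on reasonably recent botocore releases but probably not on versions as old as the one in the original report.

```python
def wait_for_instances(ec2_client, waiter_name, instance_ids,
                       delay=15, max_attempts=40):
    """Wait on a client-level EC2 waiter for the given instance IDs.

    ec2_client is expected to be something like boto3.client('ec2').
    waiter_name is a waiter name such as 'instance_stopped'.
    """
    waiter = ec2_client.get_waiter(waiter_name)
    waiter.wait(
        InstanceIds=list(instance_ids),
        # Polling interval in seconds and the number of polls before giving up.
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
```

With a real client, a call would look like `wait_for_instances(boto3.client('ec2'), 'instance_stopped', [instance.id])`.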

I was hitting this issue and observed the same thing @turtlemonvh saw:

State transition reason Client.VolumeLimitExceeded: Volume limit exceeded

Deleting some unnecessary volumes cleared things up.

It would be great if the Waiter exception could provide something a little more informative. Even if it can’t detect whether a failure was because of a volume limit issue at runtime, a different string that recommends looking at the metadata for the failed instance would have pointed me in the right direction without as much internet searching.

shawnpg on Feb 7, 2017
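
Until the exception carries more detail, one way to surface the state reason described above is to read it out of a DescribeInstances-style response. The sketch below assumes the failing `WaiterError` exposes a `last_response` attribute (newer botocore versions do; the code falls back to an empty dict if it is missing). The helper names `describe_failure` and `wait_until_running_verbose` are made up for illustration.

```python
def describe_failure(response):
    """Summarize each instance's state and StateReason from a
    DescribeInstances-style response dict."""
    lines = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            state = inst.get("State", {}).get("Name", "unknown")
            reason = inst.get("StateReason", {})
            lines.append("{}: state={} code={} message={}".format(
                inst.get("InstanceId", "?"),
                state,
                reason.get("Code", "n/a"),
                reason.get("Message", "n/a"),
            ))
    return lines


def wait_until_running_verbose(ec2_client, instance_id):
    """Wait for an instance to run; on failure, print why before re-raising."""
    # Imported here so describe_failure() can be used without botocore installed.
    from botocore.exceptions import WaiterError

    waiter = ec2_client.get_waiter("instance_running")
    try:
        waiter.wait(InstanceIds=[instance_id])
    except WaiterError as err:
        # Assumption: last_response holds the final DescribeInstances result.
        last = getattr(err, "last_response", None) or {}
        for line in describe_failure(last):
            print(line)
        raise
```

For the volume limit case, the printed line would include the `Client.VolumeLimitExceeded` code, which is the hint that was otherwise only visible in the console.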

I was going to write almost exactly this ticket myself, except in my case it's instance.terminate() that is causing trouble. Are you sure this is not the case for you? When instances are being terminated they are first put into the stopped state. It appears this was not accounted for in instance.wait_until_terminated(), or it was assumed that users would work around it themselves by first using:

instance.stop()
instance.wait_until_stopped()

And then:

instance.terminate()
instance.wait_until_terminated()

I just think it is weird that instance.terminate() works on its own on a running instance, but not when used in conjunction with instance.wait_until_terminated().

How to recreate using python:

import boto3

# Credentials and endpoint are left blank here, as in the original report.
session = boto3.session.Session(aws_access_key_id="", aws_secret_access_key="", region_name='primary')
resource = session.resource('ec2', endpoint_url="")

# image_id must be set to a valid image ID before running.
instances = resource.create_instances(ImageId=image_id, MinCount=1, MaxCount=1)
instances[0].wait_until_running()
for i in instances:
    i.terminate()
    i.wait_until_terminated()

KlemenzF on Jul 15, 2015
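
The error message suggests the waiter gives up as soon as it sees a state its definition treats as a failure, even if the instance would later reach the target state. A hand-rolled polling loop makes that behaviour explicit and lets the caller decide which states are fatal. This is a rough sketch, not how boto3 implements waiters; `poll_until`, `TerminalStateError`, and `instance_state` are hypothetical names.

```python
import time


class TerminalStateError(Exception):
    """Raised when the polled state is one the caller chose to treat as fatal."""


def poll_until(get_state, target, failure_states=(), delay=15,
               max_attempts=40, sleep=time.sleep):
    """Call get_state() until it returns target; return the states seen."""
    seen = []
    for attempt in range(max_attempts):
        state = get_state()
        seen.append(state)
        if state == target:
            return seen
        if state in failure_states:
            raise TerminalStateError(
                "reached {!r} while waiting for {!r}".format(state, target))
        if attempt < max_attempts - 1:
            sleep(delay)
    raise TimeoutError(
        "gave up after {} attempts; last state {!r}".format(max_attempts, seen[-1]))


def instance_state(instance):
    """Refresh a boto3 ec2.Instance resource and return its state name."""
    instance.reload()
    return instance.state["Name"]
```

For example, `poll_until(lambda: instance_state(i), 'terminated')` with no failure states would keep polling through any intermediate state until the instance is terminated or the attempts run out.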

It would be really helpful if you were able to capture the debug logs from when it fails. That would show the response we get back from EC2 so we can see what caused the waiter to fail.

I’ll look into improving the error message we surface. We should be able to add the specific failure state we received to give more context about why it failed.

jamesls on Jul 14, 2015
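
Since the failure is intermittent, it may help to keep the debug output in a file rather than on the console. Note that the reproduction script enables the 'boto3' logger; the raw responses are, as far as I can tell, logged by botocore, so the sketch below attaches a handler to the 'botocore' logger using only the standard logging module. The function name and log format are arbitrary choices.

```python
import logging


def log_botocore_to_file(path):
    """Attach a DEBUG-level file handler to the 'botocore' logger and return it."""
    handler = logging.FileHandler(path)
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s [%(levelname)s] %(message)s"))

    logger = logging.getLogger("botocore")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return handler
```

Calling `log_botocore_to_file('botocore-debug.log')` before creating the resource should capture the responses leading up to the next failure. Debug logs can contain request details, so review them before attaching them to a public issue.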