aiobotocore: Apparently random NoCredentialsError after running for a while

Describe the bug

We have an aiohttp server that sends SQS messages as a result of certain actions. After running for a while, we get:

Traceback (most recent call last):
  File "/usr/local/skyscanner/app/services/sqs.py", line 21, in send_message
    await client.send_message(
  File "/usr/local/lib/python3.10/site-packages/aiobotocore/client.py", line 141, in _make_api_call
    http, parsed_response = await self._make_request(
  File "/usr/local/lib/python3.10/site-packages/aiobotocore/client.py", line 161, in _make_request
    return await self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.10/site-packages/aiobotocore/endpoint.py", line 77, in _send_request
    request = await self.create_request(request_dict, operation_model)
  File "/usr/local/lib/python3.10/site-packages/aiobotocore/endpoint.py", line 70, in create_request
    await self._event_emitter.emit(event_name, request=request,
  File "/usr/local/lib/python3.10/site-packages/aiobotocore/hooks.py", line 27, in _emit
    response = await handler(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/aiobotocore/signers.py", line 16, in handler
    return await self.sign(operation_name, request)
  File "/usr/local/lib/python3.10/site-packages/aiobotocore/signers.py", line 63, in sign
    auth.add_auth(request)
  File "/usr/local/lib/python3.10/site-packages/botocore/auth.py", line 378, in add_auth
    raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials

Our code that triggers the issue in production, where we use IAM roles:

from typing import Any, Dict, Optional

from aiobotocore.session import get_session

# json_dumps_extended and logger are defined elsewhere in our application.


class SQSService:
    def __init__(self, sqs_region: str, sqs_url: str):
        self.default_source = "unknown"
        self.sqs_region = sqs_region
        self.sqs_url = sqs_url

    async def send_message(self, pushed_data: Dict[str, Any], data_type: str, source: Optional[str]):
        try:
            # A new session and client are created on every call.
            session = get_session()
            async with session.create_client("sqs", region_name=self.sqs_region) as client:
                await client.send_message(
                    QueueUrl=self.sqs_url,
                    MessageBody=json_dumps_extended(
                        {"pushed_data": pushed_data, "data_type": data_type, "source": source or self.default_source}
                    ),
                )
        except Exception:
            logger.exception(f"Something went wrong in SQS upload of {pushed_data}")

We’ve tried multiple versions, including 2.0.0 and 2.5.0.

After many tests trying to find a way to reproduce the issue locally, we’ve managed to mitigate it using backoff. When we do, this is what we get: [image]

This leads me to believe there’s a race condition somewhere that only triggers after the service has been running for a while, leaving you temporarily without credentials.
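For readers who want to try the same mitigation, here is a minimal sketch of retrying with exponential backoff. It is not the code we run in production: it uses only the standard library rather than the backoff package, and the helper name `retry_with_backoff` and its parameters are made up for illustration.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    max_tries: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
) -> T:
    """Await operation(), retrying on the given exceptions (hypothetical helper)."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            return await operation()
        except retry_on:
            if attempt >= max_tries:
                raise  # out of attempts: surface the last error
            # Exponential delay capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))
```

In the service above this would wrap the call, for example `await retry_with_backoff(lambda: client.send_message(...), retry_on=(NoCredentialsError,))` with `NoCredentialsError` imported from `botocore.exceptions`. Retrying only hides the symptom; it does not explain why the credentials go missing.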

Checklist

  • I have reproduced in environment where pip check passes without errors
  • I have provided pip freeze results
  • I have provided sample code or detailed way to reproduce
  • I have tried the same code in botocore to ensure this is an aiobotocore specific issue
  • I have tried similar code in aiohttp to ensure this is an aiobotocore specific issue
  • I have checked the latest and older versions of aiobotocore/aiohttp/python to see if this is a regression / injection

pip freeze results

$ pip freeze
aiobotocore==2.0.0
aiocache==0.12.0
aiodns==3.0.0
aiohttp==3.8.1
aioitertools==0.11.0
aiosignal==1.3.1
aiotask-context==0.6.1
alembic==1.0.11
async-timeout==4.0.2
asyncpg==0.27.0
asyncpgsa==0.27.1
attrs==23.1.0
backoff==2.2.1
basictracer==3.2.0
black==23.3.0
boto3==1.19.8
botocore==1.22.8
Brotli==1.0.9
build==0.10.0
cachetools==5.3.0
cchardet==2.1.7
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
coverage==7.2.3
cryptography==3.4.8
Deprecated==1.2.13
exceptiongroup==1.1.1
flake8==6.0.0
frozenlist==1.3.3
googleapis-common-protos==1.59.0
grpcio==1.53.0
gunicorn==20.1.0
idna==3.4
importlib-metadata==6.4.1
iniconfig==2.0.0
isort==5.12.0
Jinja2==3.1.2
jmespath==0.10.0
jq==1.4.1
jsonpickle==3.0.1
lightstep==4.4.8
Mako==1.2.4
markdown-it-py==2.2.0
MarkupSafe==2.1.2
mccabe==0.7.0
mdurl==0.1.2
moto==4.1.7
multidict==6.0.4
mypy-extensions==1.0.0
object-pool==0.2
opentelemetry-api==1.15.0
opentelemetry-exporter-otlp==1.15.0
opentelemetry-exporter-otlp-proto-grpc==1.15.0
opentelemetry-exporter-otlp-proto-http==1.15.0
opentelemetry-instrumentation==0.36b0
opentelemetry-instrumentation-aiohttp-client==0.36b0
opentelemetry-instrumentation-logging==0.36b0
opentelemetry-opentracing-shim==0.36b0
opentelemetry-propagator-ot-trace==0.36b0
opentelemetry-proto==1.15.0
opentelemetry-sdk==1.15.0
opentelemetry-semantic-conventions==0.36b0
opentelemetry-util-http==0.36b0
opentracing==2.4.0
packaging==23.1
pathspec==0.11.1
pbr==5.11.1
pip-tools==6.13.0
platformdirs==3.2.0
pluggy==1.0.0
pprintpp==0.4.0
protobuf==3.20.3
psycopg2-binary==2.9.6
pycares==4.3.0
pycodestyle==2.10.0
pycparser==2.21
pydantic==1.10.7
pyflakes==3.0.1
pyformance==0.4
Pygments==2.15.0
pyproject_hooks==1.0.0
pytest==7.3.1
pytest-aiohttp==1.0.4
pytest-asyncio==0.21.0
pytest-clarity==1.0.1
pytest-cov==2.12.1
pytest-env==0.6.2
pytest-mock==1.12.1
python-dateutil==2.8.2
python-editor==1.0.4
PyYAML==6.0
requests==2.28.2
responses==0.23.1
rich==13.3.4
s3transfer==0.5.2
six==1.16.0
SQLAlchemy==1.3.24
statsd==3.3.0
thrift==0.16.0
toml==0.10.2
tomli==2.0.1
types-PyYAML==6.0.12.9
types-requests==2.28.11.17
types-urllib3==1.26.25.10
typing_extensions==4.5.0
urllib3==1.26.15
uvloop==0.17.0
Werkzeug==2.2.3
wrapt==1.15.0
xmltodict==0.13.0
yarl==1.8.2
zipp==3.15.0

Environment:

  • Python Version: [e.g. 3.9, 3.10, 3.11]
  • OS name and version: [e.g. linux(python-slim docker)]
  • We haven’t been able to reproduce with Python 3.9 and aiobotocore==1.2.2

Additional context

Happy to provide any further context to help resolve this.

Most upvoted comments

This is the kind of pattern I’d love to see documented. If there are certain ways of using the library that minimize load, or are generally best practices given how it operates internally, we should make them explicit in the docs so people can adopt these patterns.
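The pattern this comment refers to is not reproduced here. One candidate, offered as an assumption rather than a confirmed fix, is to create the client once at startup and reuse it, instead of building a new session and client for every message as the code in the report does. A minimal sketch follows; the class name `LongLivedClient` is hypothetical, and the session is passed in so that the only thing assumed about it is a `create_client(...)` method returning an async context manager, as aiobotocore sessions provide.

```python
from contextlib import AsyncExitStack
from typing import Any, Optional


class LongLivedClient:
    """Holds one client open for the lifetime of the application (hypothetical helper)."""

    def __init__(self, session: Any, service_name: str, region_name: str):
        self._session = session
        self._service_name = service_name
        self._region_name = region_name
        self._exit_stack = AsyncExitStack()
        self._client: Optional[Any] = None

    async def start(self) -> Any:
        # Enter the client's context manager once and keep it open.
        if self._client is None:
            self._client = await self._exit_stack.enter_async_context(
                self._session.create_client(
                    self._service_name, region_name=self._region_name
                )
            )
        return self._client

    async def close(self) -> None:
        # Exit the client's context manager, releasing its connections.
        await self._exit_stack.aclose()
        self._client = None
```

In an aiohttp application, `start()` would typically be awaited from a startup hook and `close()` from a cleanup hook, with `get_session()` supplying the session. Calling `start()` concurrently from several tasks before the first call finishes is not guarded against in this sketch.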