aws-sdk-pandas: Error when sending DataFrame to S3 using a boto3 Session

Hi.

I am trying to send a DataFrame to S3 by using a previously created boto3 session and get the following error:

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 908, in _data_to_s3_dataset_writer_remote
    isolated_dataframe=True))
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 853, in _data_to_s3_dataset_writer
    isolated_dataframe=isolated_dataframe)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 962, in _data_to_s3_object_writer
    extra_args=extra_args)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 999, in _write_csv_dataframe
    Pandas._write_csv_to_s3_retrying(fs=fs, path=path, buffer=csv_buffer)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 241, in wrapped_f
    return self.call(f, *args, **kw)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 330, in call
    start_time=start_time)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 279, in iter
    return fut.result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 333, in call
    result = fn(*args, **kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 1009, in _write_csv_to_s3_retrying
    f.write(buffer)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1245, in __exit__
    self.close()
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1213, in close
    self.flush(force=True)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1085, in flush
    self._initiate_upload()
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 1002, in _initiate_upload
    Bucket=self.bucket, Key=self.key, ACL=self.acl)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 991, in _call_s3
    **kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 184, in _call_s3
    return method(**additional_kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 276, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 573, in _make_api_call
    operation_model, request_dict, request_context)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 592, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 908, in _data_to_s3_dataset_writer_remote
    isolated_dataframe=True))
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 853, in _data_to_s3_dataset_writer
    isolated_dataframe=isolated_dataframe)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 962, in _data_to_s3_object_writer
    extra_args=extra_args)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 999, in _write_csv_dataframe
    Pandas._write_csv_to_s3_retrying(fs=fs, path=path, buffer=csv_buffer)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 241, in wrapped_f
    return self.call(f, *args, **kw)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 330, in call
    start_time=start_time)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 279, in iter
    return fut.result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 333, in call
    result = fn(*args, **kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 1009, in _write_csv_to_s3_retrying
    f.write(buffer)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1245, in __exit__
    self.close()
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1213, in close
    self.flush(force=True)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1085, in flush
    self._initiate_upload()
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 1002, in _initiate_upload
    Bucket=self.bucket, Key=self.key, ACL=self.acl)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 991, in _call_s3
    **kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 184, in _call_s3
    return method(**additional_kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 276, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 573, in _make_api_call
    operation_model, request_dict, request_context)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 592, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Traceback (most recent call last):
  File "awswrangler_test.py", line 14, in <module>
    wr_session.pandas.to_csv(df, "s3://blu-datalake/test/foobar")
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 592, in to_csv
    columns_comments=columns_comments)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 729, in to_s3
    extra_args=extra_args)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 805, in data_to_s3
    objects_paths += receive_pipes[i].recv()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

I believe it is trying to get the credentials from the default profile in ~/.aws/credentials.
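One way to test that guess is to check which ambient credential sources a brand-new process could see. The sketch below is a simplified, standard-library-only check. The function name `fallback_credential_sources` is hypothetical, and it looks at only two of the places botocore searches (the standard environment variables and the shared credentials file), so an empty result is a hint rather than proof.

```python
import configparser
import os


def fallback_credential_sources(env=None):
    """Return the ambient credential sources a fresh process could pick up.

    Simplified on purpose: only environment variables and the shared
    credentials file are checked, not the full botocore lookup chain.
    """
    env = os.environ if env is None else env
    found = []

    # Source 1: the standard AWS environment variables.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        found.append("environment variables")

    # Source 2: the shared credentials file (default: ~/.aws/credentials).
    path = env.get("AWS_SHARED_CREDENTIALS_FILE") or os.path.join(
        os.path.expanduser("~"), ".aws", "credentials"
    )
    profile = env.get("AWS_PROFILE", "default")
    parser = configparser.RawConfigParser()
    parser.read(path)  # a missing file is silently skipped
    if parser.has_section(profile) and parser.has_option(profile, "aws_access_key_id"):
        found.append("profile '%s' in %s" % (profile, path))

    return found
```

Calling `fallback_credential_sources()` with no arguments inspects the current environment. If it returns an empty list, any code path that ignores the keys given to the boto3 session would have nothing to fall back on, which would be consistent with the NoCredentialsError above.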

Here is an example that reproduces the error:

import pandas as pd
import awswrangler as wr
import boto3

boto3_session = boto3.Session(aws_access_key_id="****", aws_secret_access_key="****")

wr_session = wr.Session(boto3_session=boto3_session)

df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]))

wr_session.pandas.to_csv(df, "s3://foo/bar")

Is this the right way to use a boto3 Session?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (9 by maintainers)

Most upvoted comments

@gabraganca I could reproduce the first two stack traces, and we then decided to keep the focus on version 1.0.0. We will have specific test cases to cover all situations of this kind related to the boto3 session.

Thanks @gabraganca, I will keep trying to figure it out.

@igorborgest Oh, I see. I didn’t know that was possible. That makes everything easier.

I’ll try that and get back to you.

I just tried the following code and got the same error:

import pandas as pd
import awswrangler as wr
import boto3

boto3_session = boto3.Session(
    aws_access_key_id="...",
    aws_secret_access_key="...",
    region_name="us-east-1",
)

wr_session = wr.Session(boto3_session=boto3_session)

df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]))

wr_session.pandas.to_csv(df, "s3://foo/bar", procs_cpu_bound=1)

Traceback (most recent call last):
  File "awswrangler_test.py", line 15, in <module>
    wr_session.pandas.to_csv(df, "s3://blu-datalake/test/foobar", procs_cpu_bound=1)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 592, in to_csv
    columns_comments=columns_comments)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 729, in to_s3
    extra_args=extra_args)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 818, in data_to_s3
    isolated_dataframe=isolated_dataframe)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 853, in _data_to_s3_dataset_writer
    isolated_dataframe=isolated_dataframe)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 962, in _data_to_s3_object_writer
    extra_args=extra_args)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 999, in _write_csv_dataframe
    Pandas._write_csv_to_s3_retrying(fs=fs, path=path, buffer=csv_buffer)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 241, in wrapped_f
    return self.call(f, *args, **kw)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 330, in call
    start_time=start_time)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 279, in iter
    return fut.result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 333, in call
    result = fn(*args, **kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 1009, in _write_csv_to_s3_retrying
    f.write(buffer)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1245, in __exit__
    self.close()
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1213, in close
    self.flush(force=True)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1085, in flush
    self._initiate_upload()
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 1002, in _initiate_upload
    Bucket=self.bucket, Key=self.key, ACL=self.acl)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 991, in _call_s3
    **kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 184, in _call_s3
    return method(**additional_kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 613, in _make_api_call
    operation_model, request_dict, request_context)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 632, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials

Great, @igorborgest. Actually, I found this bug when running to_parquet, but the to_csv method returns the same error.

Hi @gabraganca, now it really seems like a bug. 🐛

I will start working to reproduce it here. In the meantime, I can think of three alternatives that could help you work around it:

1 - Pass your credentials directly to awswrangler.Session like:

import awswrangler as wr

wr_session = wr.Session(
    aws_access_key_id="...",
    aws_secret_access_key="...",
    region_name="us-east-1"
)

2 - Avoid serializing/deserializing the session across multiple processes by passing procs_cpu_bound=1 to the to_csv method, like:

wr_session.pandas.to_csv(df, "s3://foo/bar", procs_cpu_bound=1)

3 - 1 and 2 combined.

I will work on that right now, so please let me know of any updates on your side.
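For anyone writing their own multiprocessing code around boto3, the serialization concern raised in alternative 2 can be sidestepped with the rough sketch below: copy the resolved keys out of the session as a plain, picklable dict, then build a fresh session inside each worker. This is not how awswrangler works internally, and the helper names `snapshot_credentials` and `rebuild_session` are hypothetical. The only boto3 calls used are `Session.get_credentials()`, `get_frozen_credentials()` and the `boto3.Session(...)` constructor.

```python
import pickle


def snapshot_credentials(session):
    """Copy a boto3.Session's resolved credentials into a plain dict.

    The keys match the keyword arguments of boto3.Session, so the dict
    can be pickled, sent to a worker process, and unpacked there.
    """
    creds = session.get_credentials()
    if creds is None:
        raise ValueError("the session has no credentials to copy")
    frozen = creds.get_frozen_credentials()
    return {
        "aws_access_key_id": frozen.access_key,
        "aws_secret_access_key": frozen.secret_key,
        "aws_session_token": frozen.token,
        "region_name": session.region_name,
    }


def rebuild_session(snapshot):
    """Create a fresh boto3.Session from a snapshot, e.g. inside a worker."""
    import boto3  # imported lazily; only this step needs boto3 installed

    return boto3.Session(**snapshot)


def roundtrip(snapshot):
    """Simulate the trip to another process: pickle, then unpickle."""
    return pickle.loads(pickle.dumps(snapshot))
```

In the parent process you would call `snapshot_credentials(boto3_session)` and pass the result as an argument to the worker, which calls `rebuild_session(...)` before touching S3. Note that the snapshot holds secrets in plain form, so it should not be logged or written to disk.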