aws-sdk-pandas: Error when sending DataFrame to S3 using a boto3 Session
Hi.
I am trying to send a DataFrame to S3 by using a previously created boto3 session and get the following error:
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 908, in _data_to_s3_dataset_writer_remote
isolated_dataframe=True))
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 853, in _data_to_s3_dataset_writer
isolated_dataframe=isolated_dataframe)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 962, in _data_to_s3_object_writer
extra_args=extra_args)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 999, in _write_csv_dataframe
Pandas._write_csv_to_s3_retrying(fs=fs, path=path, buffer=csv_buffer)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 241, in wrapped_f
return self.call(f, *args, **kw)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 330, in call
start_time=start_time)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 279, in iter
return fut.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 333, in call
result = fn(*args, **kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 1009, in _write_csv_to_s3_retrying
f.write(buffer)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1245, in __exit__
self.close()
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1213, in close
self.flush(force=True)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1085, in flush
self._initiate_upload()
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 1002, in _initiate_upload
Bucket=self.bucket, Key=self.key, ACL=self.acl)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 991, in _call_s3
**kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 184, in _call_s3
return method(**additional_kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 276, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 573, in _make_api_call
operation_model, request_dict, request_context)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 592, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
request = self.create_request(request_dict, operation_model)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 116, in create_request
operation_name=operation_model.name)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 908, in _data_to_s3_dataset_writer_remote
isolated_dataframe=True))
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 853, in _data_to_s3_dataset_writer
isolated_dataframe=isolated_dataframe)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 962, in _data_to_s3_object_writer
extra_args=extra_args)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 999, in _write_csv_dataframe
Pandas._write_csv_to_s3_retrying(fs=fs, path=path, buffer=csv_buffer)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 241, in wrapped_f
return self.call(f, *args, **kw)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 330, in call
start_time=start_time)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 279, in iter
return fut.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 333, in call
result = fn(*args, **kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 1009, in _write_csv_to_s3_retrying
f.write(buffer)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1245, in __exit__
self.close()
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1213, in close
self.flush(force=True)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1085, in flush
self._initiate_upload()
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 1002, in _initiate_upload
Bucket=self.bucket, Key=self.key, ACL=self.acl)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 991, in _call_s3
**kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 184, in _call_s3
return method(**additional_kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 276, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 573, in _make_api_call
operation_model, request_dict, request_context)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 592, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
request = self.create_request(request_dict, operation_model)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 116, in create_request
operation_name=operation_model.name)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Traceback (most recent call last):
File "awswrangler_test.py", line 14, in <module>
wr_session.pandas.to_csv(df, "s3://blu-datalake/test/foobar")
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 592, in to_csv
columns_comments=columns_comments)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 729, in to_s3
extra_args=extra_args)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 805, in data_to_s3
objects_paths += receive_pipes[i].recv()
File "/usr/lib/python3.6/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
I believe it is trying to get the credentials from the default profile in ~/.aws/credentials.
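The failing frames above are raised inside Process-1 and Process-2, which are worker processes. Below is a minimal sketch of one way explicit credentials could get lost on the way to a worker. It assumes, purely for illustration, that a wrapper rebuilds its session in each worker from a reduced set of plain settings. ToySession, to_primitives, from_primitives and rebuild_in_worker are all made-up names, not awswrangler's actual code.

```python
import pickle


class ToySession:
    """Hypothetical session wrapper; not awswrangler's real class."""

    def __init__(self, access_key=None, secret_key=None, inner=None):
        self.access_key = access_key
        self.secret_key = secret_key
        # `inner` plays the role of a pre-built client session holding its own keys.
        self.inner = inner

    def to_primitives(self):
        # Assumption: only the wrapper's own fields are copied.
        # Keys that live inside `inner` are not carried along.
        return {"access_key": self.access_key, "secret_key": self.secret_key}

    @classmethod
    def from_primitives(cls, primitives):
        return cls(
            access_key=primitives["access_key"],
            secret_key=primitives["secret_key"],
        )

    def has_credentials(self):
        if self.access_key and self.secret_key:
            return True
        return bool(self.inner and self.inner.get("access_key"))


def rebuild_in_worker(session):
    """Simulate crossing a process boundary with a pickle round trip."""
    payload = pickle.dumps(session.to_primitives())
    return ToySession.from_primitives(pickle.loads(payload))


# Credentials held only by the wrapped inner session (like passing a pre-built session).
wrapped = ToySession(inner={"access_key": "AKIA-EXAMPLE", "secret_key": "example"})
# Credentials given to the wrapper itself.
direct = ToySession(access_key="AKIA-EXAMPLE", secret_key="example")

print(wrapped.has_credentials(), rebuild_in_worker(wrapped).has_credentials())
print(direct.has_credentials(), rebuild_in_worker(direct).has_credentials())
```

In this toy model the wrapped session has credentials in the parent but not after being rebuilt, while the direct one keeps them in both places. Whether the library behaves this way is not confirmed here; it is only meant to show why a worker might end up looking for default credentials.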
Here is an example that replicates the error:
import pandas as pd
import awswrangler as wr
import boto3
boto3_session = boto3.Session(aws_access_key_id="****", aws_secret_access_key="****",)
wr_session = wr.Session(boto3_session=boto3_session)
df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]))
wr_session.pandas.to_csv(df, "s3://foo/bar")
Is this the right way to use a boto3 Session?
About this issue
- State: closed
- Created 4 years ago
- Comments: 16 (9 by maintainers)
@gabraganca I could reproduce the first two stack traces, and then we decided to keep the focus on version 1.0.0. We will have specific test cases to cover all these kinds of situations related to the boto3 session.
Thanks @gabraganca, I will keep trying to figure it out.
@igorborgest Oh, I see. I didn’t know that was possible. That makes everything easier.
I’ll try that and get back to you.
I just tried the following code and got the same error:
Great, @igorborgest. Actually, I found this bug when running to_parquet, but the to_csv method returns the same error.
Hi @gabraganca, now it really seems like a bug. 🐛
I will start working to reproduce it here. In the meantime, I can think of three alternatives that could help you work around it:
1 - Pass your credentials directly to awswrangler.Session.
2 - Avoid the session serialization/deserialization across multiple processes by passing procs_cpu_bound=1 in the to_csv method.
3 - 1 and 2 combined.
I will work on that right now, so please let me know of any updates from your side.
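The alternatives above, combined as in option 3, might look roughly like the sketch below for the legacy 0.x API used in this thread. It assumes that wr.Session accepts aws_access_key_id and aws_secret_access_key keywords (mirroring boto3.Session) and that to_csv accepts procs_cpu_bound as described in option 2. The helper name and its parameters are made up for illustration.

```python
def to_csv_with_workarounds(df, path, access_key, secret_key):
    """Write df to S3 as CSV using alternatives 1 and 2 together (hypothetical helper)."""
    # Imported lazily so the helper can be defined without awswrangler installed.
    import awswrangler as wr

    # Alternative 1: give the credentials to awswrangler.Session directly
    # instead of wrapping a pre-built boto3 session.
    # Assumption: these keyword names match the 0.x Session signature.
    wr_session = wr.Session(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )

    # Alternative 2: procs_cpu_bound=1, which per the suggestion above avoids
    # serializing the session across multiple processes.
    return wr_session.pandas.to_csv(df, path, procs_cpu_bound=1)
```

Usage would be something like to_csv_with_workarounds(df, "s3://foo/bar", "****", "****"), with the same DataFrame and path as in the original example. Dropping either the explicit keys or procs_cpu_bound=1 gives options 1 and 2 on their own.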