aws-sdk-pandas: Error when sending DataFrame to S3 using a boto3 Session
Hi.
I am trying to send a DataFrame to S3 by using a previously created boto3 session and get the following error:
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 908, in _data_to_s3_dataset_writer_remote
isolated_dataframe=True))
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 853, in _data_to_s3_dataset_writer
isolated_dataframe=isolated_dataframe)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 962, in _data_to_s3_object_writer
extra_args=extra_args)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 999, in _write_csv_dataframe
Pandas._write_csv_to_s3_retrying(fs=fs, path=path, buffer=csv_buffer)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 241, in wrapped_f
return self.call(f, *args, **kw)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 330, in call
start_time=start_time)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 279, in iter
return fut.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 333, in call
result = fn(*args, **kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 1009, in _write_csv_to_s3_retrying
f.write(buffer)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1245, in __exit__
self.close()
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1213, in close
self.flush(force=True)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1085, in flush
self._initiate_upload()
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 1002, in _initiate_upload
Bucket=self.bucket, Key=self.key, ACL=self.acl)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 991, in _call_s3
**kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 184, in _call_s3
return method(**additional_kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 276, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 573, in _make_api_call
operation_model, request_dict, request_context)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 592, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
request = self.create_request(request_dict, operation_model)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 116, in create_request
operation_name=operation_model.name)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 908, in _data_to_s3_dataset_writer_remote
isolated_dataframe=True))
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 853, in _data_to_s3_dataset_writer
isolated_dataframe=isolated_dataframe)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 962, in _data_to_s3_object_writer
extra_args=extra_args)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 999, in _write_csv_dataframe
Pandas._write_csv_to_s3_retrying(fs=fs, path=path, buffer=csv_buffer)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 241, in wrapped_f
return self.call(f, *args, **kw)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 330, in call
start_time=start_time)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 279, in iter
return fut.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/tenacity/__init__.py", line 333, in call
result = fn(*args, **kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 1009, in _write_csv_to_s3_retrying
f.write(buffer)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1245, in __exit__
self.close()
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1213, in close
self.flush(force=True)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/fsspec/spec.py", line 1085, in flush
self._initiate_upload()
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 1002, in _initiate_upload
Bucket=self.bucket, Key=self.key, ACL=self.acl)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 991, in _call_s3
**kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/s3fs/core.py", line 184, in _call_s3
return method(**additional_kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 276, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 573, in _make_api_call
operation_model, request_dict, request_context)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/client.py", line 592, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
request = self.create_request(request_dict, operation_model)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/endpoint.py", line 116, in create_request
operation_name=operation_model.name)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Traceback (most recent call last):
File "awswrangler_test.py", line 14, in <module>
wr_session.pandas.to_csv(df, "s3://blu-datalake/test/foobar")
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 592, in to_csv
columns_comments=columns_comments)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 729, in to_s3
extra_args=extra_args)
File "/home/gbra/.virtualenvs/blu_airflow/lib/python3.6/site-packages/awswrangler/pandas.py", line 805, in data_to_s3
objects_paths += receive_pipes[i].recv()
File "/usr/lib/python3.6/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
I believe it is trying to get the credentials from the default profile in ~/.aws/credentials.
Here is an example that replicates the error:
import pandas as pd
import awswrangler as wr
import boto3
boto3_session = boto3.Session(aws_access_key_id="****", aws_secret_access_key="****",)
wr_session = wr.Session(boto3_session=boto3_session)
df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]))
wr_session.pandas.to_csv(df, "s3://foo/bar")
Is this the right way to use a boto3 Session?
About this issue
- State: closed
- Created 4 years ago
- Comments: 16 (9 by maintainers)
@gabraganca I could reproduce the first two stack traces, and then we decided to keep the focus on version 1.0.0. We will have specific test cases to cover all situations of this kind related to the boto3 session.
Thanks @gabraganca, I will keep trying to figure it out.
@igorborgest Oh, I see. I didn't know that was possible. That makes everything easier.
I'll try that and get back to you.
I just tried the following code and got the same error:
Great, @igorborgest. Actually, I found this bug when running to_parquet, but the to_csv method returns the same error.
Hi @gabraganca, now it really seems like a bug. 🐛
I will start working to reproduce it here. In the meantime, I can think of three alternatives that could help you work around it:
1 - Pass your credentials directly to awswrangler.Session, like:
2 - Avoid the session serialization/deserialization across multiple processes by passing procs_cpu_bound=1 to the to_csv method, like:
3 - 1 and 2 combined.
I will work on that right now, so please let me know of any updates from your side.