# datasets: Fatal error condition occurred in aws-c-io
## Describe the bug

Fatal error when using the library.
## Steps to reproduce the bug

```python
from datasets import load_dataset

dataset = load_dataset('wikiann', 'en')
```
## Expected results

No fatal errors.
## Actual results

```
Fatal error condition occurred in D:\bld\aws-c-io_1633633258269\work\source\event_loop.c:74: aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, el_group, &thread_options) == AWS_OP_SUCCESS
Exiting Application
```
## Environment info

- `datasets` version: 1.15.2.dev0
- Platform: Windows-10-10.0.22504-SP0
- Python version: 3.8.12
- PyArrow version: 6.0.0
## About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 28 (4 by maintainers)
## Commits related to this issue
- Downgrade pyarrow. see https://github.com/huggingface/datasets/issues/3310#issuecomment-1247390774 — committed to thesofakillers/infoshare by thesofakillers a year ago
Downgrading pyarrow to 6.0.1 solves the issue for me:

```
pip install pyarrow==6.0.1
```
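If it helps, here is a minimal sanity check after downgrading (the version assertion is illustrative, not part of the original report):

```python
# Confirm the pinned version is active, then rerun the reproducer from the
# issue; the abort previously fired when the interpreter exited.
import pyarrow
assert pyarrow.__version__ == "6.0.1"  # assumed pin from the comment above

from datasets import load_dataset
dataset = load_dataset('wikiann', 'en')
# If the fix works, the process exits cleanly instead of printing
# "Fatal error condition occurred in ... event_loop.c".
```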
Any updates on this issue? I’m getting the same error.
I also get this issue; it appears after my script has finished running. I get the following error message
I don’t get this issue when running my code in a container. It seems more relevant to PyArrow, but I thought a more complete stack trace might be helpful to someone.
pyarrow 10.0.1 was just released in conda-forge, which is the first release where we’re building against aws-sdk-cpp 1.9.* again after more than a year. Since we cannot test the failure reported here on our infra, I’d be very grateful if someone could verify that the problem does or doesn’t reappear. 🙃
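For anyone willing to verify, a quick test along these lines should do (the commands are a sketch; the reproducer is taken from this issue, and the exit-code check assumes a Unix shell):

```
conda install -c conda-forge "pyarrow=10.0.1"
python -c "from datasets import load_dataset; load_dataset('wikiann', 'en')"
echo $?  # 0 means the process exited cleanly, i.e. no aws-c-io abort
```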
I also encountered the same problem, but only in a multi-GPU training environment on Linux; a single-GPU training environment does not raise the error. I use the accelerate package for multi-GPU training.
For pip users, I confirmed that installing the nightly version of pyarrow also solves this:

```
pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ --prefer-binary --pre pyarrow --upgrade
```

(See https://arrow.apache.org/docs/python/install.html#installing-nightly-packages.) Any version after https://github.com/apache/arrow/pull/14157 should work fine. There is also a discussion at https://issues.apache.org/jira/browse/ARROW-15141 where it is suggested that conda users pin an older version of aws-sdk-cpp:

```
aws-sdk-cpp=1.8.186
```
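A minimal sketch of applying that pin in a fresh environment (the environment name and exact package set are illustrative, not from the discussion):

```
conda create -n datasets-env -c conda-forge python=3.8 "aws-sdk-cpp=1.8.186" pyarrow datasets
conda activate datasets-env
python -c "from datasets import load_dataset; load_dataset('wikiann', 'en')"
```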