fastparquet: fastparquet v 0.6.1 crashes on read
What happened:
Loading a 32 MB parquet file crashes with a core dump. Memory usage grows to about 3 GB before the crash.
What you expected to happen:
With the previous version (I am not sure whether that was 0.5.0 or 0.6.0), loading the same file worked without problems. I have tested 0.5.0 and it works; I will test 0.6.0 as well.
Minimal Complete Verifiable Example:
Sorry, I can’t provide the file because it contains private data.
import pandas as pd
pd.read_parquet(filename)
Anything else we need to know?:
This issue first appeared today (2021-05-12). Our Jupyter environments are ephemeral: they are torn down every day and the latest package versions are reinstalled, so we assume release 0.6.1 introduced the bug. Switching to pyarrow fixed the problem immediately. Previously, pyarrow was not installed, so fastparquet was always used.
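For anyone hitting the same thing: pandas picks the engine automatically (with `engine="auto"` it tries pyarrow first and falls back to fastparquet), so installing or removing pyarrow silently changes which reader runs. Below is a minimal sketch of pinning the engine explicitly. The helper names `pick_parquet_engine` and `read_parquet_pinned` are made up for illustration and are not part of pandas or fastparquet.

```python
import importlib.util


def pick_parquet_engine(preferred=("pyarrow", "fastparquet")):
    """Return the first importable module name from `preferred`, or None."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def read_parquet_pinned(path, engine=None):
    """Read a parquet file with an explicitly chosen engine (hypothetical helper)."""
    import pandas as pd  # imported lazily so the helper above works without pandas

    engine = engine or pick_parquet_engine()
    if engine is None:
        raise ImportError("neither pyarrow nor fastparquet is installed")
    # `engine` is a documented keyword of pandas.read_parquet
    return pd.read_parquet(path, engine=engine)
```

Calling `read_parquet_pinned(filename, engine="fastparquet")` and then `engine="pyarrow"` on the same file would be one way to confirm that the crash is specific to one reader.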
Environment:
- Dask version: ? (fastparquet version: 0.6.1)
- Python version: 3.8.5
- Operating System: Ubuntu 20.04.1 LTS (Focal Fossa) on AWS
- Install method (conda, pip, source): pip
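Since the environment is rebuilt daily, it may help to record the exact versions that were installed at the time of the crash. This is a rough sketch using only the standard library; the package list is an assumption about what is relevant here, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages=("fastparquet", "pyarrow", "pandas", "dask")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package, e.g. for pasting into a bug report.
for name, version in installed_versions().items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```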
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (8 by maintainers)
Note that you helped me to identify a speedup of ~15% for UTF8 string reading, so I’m almost glad this bug was there.
OK thank you - I should be able to work with that
I am already working on this. It turns out that Windows has a different idea of what `long` means, even on 64-bit.
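To illustrate the point about `long`: on 64-bit Linux and macOS a C `long` is 8 bytes, while on 64-bit Windows it is 4 bytes. The snippet below only shows how to check this on a given machine; it is not taken from fastparquet, and the thread does not say which part of the code was affected.

```python
import ctypes

# Sizes of the native C integer types on the current platform.
long_bytes = ctypes.sizeof(ctypes.c_long)          # 4 on 64-bit Windows, 8 on 64-bit Linux/macOS
longlong_bytes = ctypes.sizeof(ctypes.c_longlong)  # 8 on all mainstream platforms
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)     # 8 on any 64-bit build of Python

print(f"long: {long_bytes} bytes")
print(f"long long: {longlong_bytes} bytes")
print(f"pointer: {pointer_bytes} bytes")

# Fixed-width types avoid the ambiguity regardless of platform.
assert ctypes.sizeof(ctypes.c_int64) == 8
```

Code that assumes `long` is 64 bits wide can therefore behave differently on Windows, which is why fixed-width types such as `int64` are usually the safer choice.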