pandas: import pandas error for missing compression libraries
Code Sample
[dev]rbuhr:~% python
Python 3.7.2 (default, Jul 24 2019, 19:27:42)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/__init__.py", line 55, in <module>
    from pandas.core.api import (
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/api.py", line 24, in <module>
    from pandas.core.groupby import Grouper, NamedAgg
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import ( # noqa: F401
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 44, in <module>
    from pandas.core.frame import DataFrame
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/frame.py", line 88, in <module>
    from pandas.core.generic import NDFrame, _shared_docs
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/generic.py", line 71, in <module>
    from pandas.io.formats.format import DataFrameFormatter, format_percentiles
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/formats/format.py", line 47, in <module>
    from pandas.io.common import _expand_user, _stringify_path
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/common.py", line 9, in <module>
    import lzma
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/lzma.py", line 27, in <module>
    from _lzma import *
ModuleNotFoundError: No module named '_lzma'
>>>
Problem description
After installing pandas 0.25.0, I can’t import the library because of missing compression libraries. First it returned the error message ModuleNotFoundError: No module named '_bz2'. I installed the missing library with sudo apt-get install libbz2-dev and tried again, only to get the error message from the code sample above, ModuleNotFoundError: No module named '_lzma'.
This was not an issue with the previous version of pandas and I tested by downgrading to pandas 0.24.0 and was able to import without the error messages. I feel like pandas should not prevent usage just because some optional compression programs are not installed, like the default behavior of the last version.
Expected Output
>>> import pandas
>>>
Output of pd.show_versions()
Unable to run because can’t import pandas.
About this issue
- State: closed
- Created 5 years ago
- Comments: 26 (18 by maintainers)
Commits related to this issue
- Release not explaining solution to #27575. — committed to Salompas/pandas by Salompas 5 years ago
- Improved explanation of solution to #27575. — committed to Salompas/pandas by Salompas 5 years ago
I feel like the closing of this issue was not appropriate. The other two issues linked also have the same problem – that pandas 0.25 assumes you have things installed that may not actually have come with python by default. This should be made explicitly clear up front before installation completes, not as an import error after installation.
I second @raybuhr's comment. Pyenv is a project with 16k stars. It’s very widely used.
I feel this is an incorrect assumption then. I’ve been using Pyenv successfully and never run into an issue with _lzma until this release. It’s not a nice experience that people should read through Stack Overflow and 3 closed (!) Pandas issue threads to figure out how to brew install xz as a solution.
I already ran brew install xz but the ModuleNotFoundError: No module named '_bz2' still shows up.
OS: Big Sur, Python 3.8.6, pandas 1.1.5, Python version manager: pyenv
I suspect we would accept a PR that did the lzma import in a try / except ImportError block.
When the module is not present, we would emit a UserWarning that their Python was not compiled properly and that lzma compression is not available. And if they use lzma compression we would raise at runtime.
Is anyone interested in submitting a PR?
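The proposal above could look roughly like the following minimal sketch. It is not the actual pandas patch: the helper names `import_lzma` and `get_lzma_file` are assumptions made for illustration, and the warning text is borrowed from the UserWarning quoted elsewhere in this thread.

```python
import warnings


def import_lzma():
    """Return the lzma module, or None (after a UserWarning) if it is unavailable."""
    try:
        import lzma

        return lzma
    except ImportError:
        # _lzma is missing when Python was built without the xz/liblzma headers.
        warnings.warn(
            "Could not import the lzma module. Your installed Python is incomplete. "
            "Attempting to use lzma compression will result in a RuntimeError.",
            UserWarning,
        )
        return None


def get_lzma_file(lzma_module):
    """Return lzma.LZMAFile, raising only when lzma compression is actually requested."""
    if lzma_module is None:
        raise RuntimeError(
            "lzma module not available. Python may need to be rebuilt "
            "with the proper dependencies."
        )
    return lzma_module.LZMAFile


# Done once at import time: warns instead of failing when _lzma is missing.
lzma = import_lzma()
```

With this shape, `import pandas` would succeed on an incomplete Python build, and only code paths that call something like `get_lzma_file(lzma)` would fail.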
I see the points made above about this probably being an issue with system level dependencies. I am in fact using pyenv to install and fixing for our team isn’t particularly difficult.
Since Python expects the compression libraries to be installed, as the modules are part of the standard library, this probably doesn’t have to be an issue for the pandas team. That said, I still feel that making the compression libraries prerequisites for using pandas is unnecessary overhead. I think a more sympathetic response would be to try importing the compression modules and return a message that they aren’t installed while still allowing pandas to be imported and used, just without support for compression.
I was getting this warning:
/Users/usr/.pyenv/versions/3.9.5/lib/python3.9/site-packages/pandas/compat/__init__.py:97: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
I was finally able to get rid of it with this command:
CPPFLAGS="-I$(brew --prefix xz)/include" pyenv install [your version]
Just thought I’d drop this here for anyone with my same problem.
OS: Big Sur M1 chip pyenv python 3.9.5
ModuleNotFoundError: No module named ‘_lzma’: Oh, shit! This problem killed my whole day! 0.25.0 has this error, but 0.24.2 is OK, so I rolled back to 0.24.2. The underlying problem, however, is that a file like _lzma.cpython-36m-darwin.so is missing from the lib-dynload directory. Maybe I need to recompile Python.
Pandas 0.25.0 is not useable with tools like kubeless as debian base images for Docker don’t appear to contain the proper libs for _lzma any more. You’d need to build out custom images.
Pandas 0.24.2 works fine.
https://github.com/kubeless/runtimes/issues/44
And just to be clear, this isn’t a pyenv issue. It’s a problem on the user’s machine not having the proper dependencies when Python is compiled.
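To check whether a given interpreter was compiled without these dependencies, one option is to try importing the standard-library compression modules directly, with pandas out of the picture. A minimal sketch; the helper name `missing_compression_modules` is hypothetical:

```python
import importlib


def missing_compression_modules(names=("zlib", "bz2", "lzma")):
    """Return the names of the given modules that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Raised when the C extension (e.g. _bz2, _lzma) was never built.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_compression_modules()
    if missing:
        print("This Python build is missing:", ", ".join(missing))
    else:
        print("All compression modules are importable.")
```

If the list is non-empty, the usual remedy reported in this thread is to install the system library (for example libbz2-dev on Debian or xz via Homebrew) and then rebuild Python, since the extension modules are compiled at build time.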
Yea this is certainly unfortunate but quoting what I think is the most definitive response from the Python mailing list:
https://mail.python.org/pipermail/python-ideas/2018-October/054089.html
So since Python doesn’t document this library as optional, it should be available, and if it is not, it is the distributor’s responsibility to handle that expectation.