pandas: import pandas error for missing compression libraries

Code Sample

[dev]rbuhr:~% python
Python 3.7.2 (default, Jul 24 2019, 19:27:42)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/__init__.py", line 55, in <module>
    from pandas.core.api import (
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/api.py", line 24, in <module>
    from pandas.core.groupby import Grouper, NamedAgg
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import (  # noqa: F401
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 44, in <module>
    from pandas.core.frame import DataFrame
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/frame.py", line 88, in <module>
    from pandas.core.generic import NDFrame, _shared_docs
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/generic.py", line 71, in <module>
    from pandas.io.formats.format import DataFrameFormatter, format_percentiles
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/formats/format.py", line 47, in <module>
    from pandas.io.common import _expand_user, _stringify_path
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/common.py", line 9, in <module>
    import lzma
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/lzma.py", line 27, in <module>
    from _lzma import *
ModuleNotFoundError: No module named '_lzma'
>>>

Problem description

After installing pandas 0.25.0, I can’t import the library because of missing compression libraries. The first attempt failed with ModuleNotFoundError: No module named '_bz2'. I ran sudo apt-get install libbz2-dev and tried again, which produced the error shown in the code sample above: ModuleNotFoundError: No module named '_lzma'.

This was not an issue with the previous version of pandas: after downgrading to pandas 0.24.0, I was able to import it without any error messages. I feel like pandas should not prevent usage just because some optional compression libraries are not installed, which was the default behavior of the last version.
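For anyone trying to confirm the same diagnosis, here is a minimal sketch of a check that does not depend on pandas at all. It asks the interpreter whether it can locate the C extension modules behind the standard-library compression modules. The helper name missing_compression_modules and the default list of names are my own choices for illustration, not anything pandas provides.

```python
import importlib.util


def missing_compression_modules(names=("zlib", "_bz2", "_lzma")):
    """Return the module names that this interpreter cannot locate.

    Hypothetical helper: find_spec() returns None when a top-level
    module is not found, which is what happens when Python was built
    without the matching system headers.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_compression_modules()
    if missing:
        print("This Python build is missing:", ", ".join(missing))
    else:
        print("All checked compression modules are available.")
```

If _bz2 or _lzma shows up as missing, the interpreter itself was built without them, so reinstalling pandas alone would not be expected to help.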

Expected Output

>>> import pandas
>>>

Output of pd.show_versions()

Unable to run because can’t import pandas.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 26 (18 by maintainers)

Most upvoted comments

I feel like the closing of this issue was not appropriate. The other two issues linked also have the same problem – that pandas 0.25 assumes you have things installed that may not actually have come with python by default. This should be made explicitly clear up front before installation completes, not as an import error after installation.

I second @raybuhr’s comment. Pyenv is a project with 16k stars. It’s very widely used.

I guess it’s still implied that lzma is expected to be available as part of a standard Python distribution

I feel this is an incorrect assumption then. I’ve been using Pyenv successfully and never ran into an issue with _lzma until this release. It’s not a nice experience that people have to read through Stack Overflow and 3 closed (!) pandas issue threads to figure out that brew install xz is the solution.

I already ran brew install xz, but ModuleNotFoundError: No module named '_bz2' still shows up.

OS: Big Sur; Python 3.8.6; pandas 1.1.5; Python version manager: pyenv

I suspect we would accept a PR that did the lzma import in a try / except ImportError block.

When the module is not present, we would emit a UserWarning that their Python was not compiled properly and that lzma compression is not available. And if they use lzma compression we would raise at runtime.

Is anyone interested in submitting a PR?
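
As a starting point for such a PR, here is a minimal sketch of the pattern described above: import lzma inside try / except ImportError, emit a UserWarning when it is missing, and raise only when lzma compression is actually requested. The function names import_lzma_or_warn and require_lzma_file are hypothetical and this is not the actual pandas code; where it would live in the codebase is left open.

```python
import warnings


def import_lzma_or_warn():
    """Return the lzma module, or None (after a UserWarning) if it is missing."""
    try:
        import lzma

        return lzma
    except ImportError:
        warnings.warn(
            "Could not import the lzma module. Your installed Python is "
            "incomplete. Attempting to use lzma compression will result "
            "in a RuntimeError.",
            UserWarning,
        )
        return None


def require_lzma_file(lzma_module):
    """Return lzma.LZMAFile, raising at use time if lzma was not importable."""
    if lzma_module is None:
        raise RuntimeError(
            "lzma module not available. A Python re-install with the "
            "proper dependencies might be required to solve this issue."
        )
    return lzma_module.LZMAFile


# Done once at import time: a missing lzma only produces a warning here.
lzma_module = import_lzma_or_warn()
```

With this shape, importing the package always succeeds, and only a code path that calls require_lzma_file(lzma_module) for an .xz file fails on an incomplete Python build.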

I see the points made above about this probably being an issue with system-level dependencies. I am in fact using pyenv to install Python, and fixing it for our team isn’t particularly difficult.

Since the compression modules are part of the standard library, Python expects the underlying libraries to be installed, so this probably doesn’t have to be an issue for the pandas team. That said, I still feel that making the compression libraries prerequisites for using pandas is unnecessary overhead. I think a more sympathetic response would be to try importing the compression modules and report that they aren’t installed, while still allowing pandas to be imported and used, just without support for compression.

I was getting this warning: /Users/usr/.pyenv/versions/3.9.5/lib/python3.9/site-packages/pandas/compat/__init__.py:97: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.

I was finally able to get rid of it with this command: CPPFLAGS="-I$(brew --prefix xz)/include" pyenv install [your version]

Just thought I’d drop this here for anyone with my same problem.

OS: Big Sur (M1 chip); pyenv; Python 3.9.5
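
After rebuilding Python this way, one rough way to confirm that the new interpreter really has working lzma support is a small compress and decompress round trip. This is only a sketch; the function name lzma_round_trip_ok and the sample payload are made up for illustration.

```python
def lzma_round_trip_ok(payload=b"pandas"):
    """Return True if lzma imports and round-trips the payload, else False."""
    try:
        import lzma
    except ImportError:
        # The interpreter was built without the _lzma extension module.
        return False
    return lzma.decompress(lzma.compress(payload)) == payload


if __name__ == "__main__":
    print("lzma usable:", lzma_round_trip_ok())
```

Run it with the freshly installed interpreter; if it reports False, the rebuild still did not pick up the xz headers.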

ModuleNotFoundError: No module named ‘_lzma’: Oh, shit! This problem killed my whole day! 0.25.0 has this error, but 0.24.2 is OK, so I rolled back to 0.24.2. However, the underlying problem is that a file like _lzma.cpython-36m-darwin.so is missing from the lib-dynload directory. Maybe I need to recompile Python.

Pandas 0.25.0 is not usable with tools like kubeless, as the Debian base images for Docker don’t appear to contain the proper libs for _lzma any more. You’d need to build custom images.

Pandas 0.24.2 works fine.

https://github.com/kubeless/runtimes/issues/44

And just to be clear, this isn’t a pyenv issue. It’s a problem on the user’s machine not having the proper dependencies when Python is compiled.

Yeah, this is certainly unfortunate, but quoting what I think is the most definitive response from the Python mailing list:

I agree that modules that are necessarily optional should be documented as such, and as I mentioned on https://bugs.python.org/issue34895, many are so documented. In the absence of such documentation, I would consider it to be not optional except as some distributor decides to omit it. But then it is the responsibility of the distributor to document the omission.

https://mail.python.org/pipermail/python-ideas/2018-October/054089.html

So, since Python doesn’t document this library as optional, it should be available; if it is not, it is the responsibility of the distributor to handle that expectation.