BERTopic: GitHub Actions: ValueError: numpy.ndarray size changed, may indicate binary incompatibility.

The GitHub Actions workflow has suddenly started failing with the following error:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

This is most likely a NumPy binary compatibility (ABI) issue (some more info here). However, the suggested fix, setting oldest-supported-numpy in pyproject.toml, has not resolved it so far.
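When reporting or debugging this kind of mismatch, it can help to list the exact versions installed in the failing environment. The following is a minimal sketch, not part of BERTopic: the helper name installed_versions and the default package list are assumptions chosen for illustration, and importlib.metadata requires Python 3.8 or newer.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "hdbscan", "umap-learn", "numba", "bertopic")):
    """Return a dict mapping each package name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this as a step in the workflow right before the failing import shows which NumPy version the job actually resolved.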

If you have any idea, please follow along with the full discussions here. Any help is greatly appreciated!

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 26 (11 by maintainers)

Most upvoted comments

I have spent the last few days debugging this as much as I could. The problem appears to stem from an ABI mismatch between HDBSCAN and NumPy. Whenever NumPy publishes a major release, there is a chance that it will break HDBSCAN when used together with UMAP.

Python 3.7

BERTopic seems to work in Python 3.7 without any problems; a plain pip install bertopic should be enough.

Python 3.8+

For now, if you are on Python 3.8 or higher, it seems that the following will work:

pip install --upgrade pip setuptools wheel
pip install bertopic --no-cache-dir
pip uninstall hdbscan -y
pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation
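After rebuilding HDBSCAN from source as above, one way to confirm that the binary mismatch is gone is to import the module and look specifically for the error message from this issue. This is a rough sketch under assumptions: check_import and its return labels are hypothetical names, and the check only matches the message text quoted at the top of this thread.

```python
import importlib


def check_import(module_name="hdbscan"):
    """Import a module and classify the outcome as a short status string."""
    try:
        importlib.import_module(module_name)
    except ValueError as exc:
        # The ABI error from this issue surfaces as a ValueError at import time.
        if "numpy.ndarray size changed" in str(exc):
            return "abi-mismatch"
        raise
    except ImportError:
        # Covers ModuleNotFoundError as well.
        return "not-installed"
    return "ok"


# Example usage: print(check_import())
```

A result of "abi-mismatch" suggests the installed HDBSCAN binary was still compiled against a different NumPy than the one currently installed.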

Future Fix

At this point, I am not entirely sure how I want to proceed. It seems that numpy>1.20.3 may introduce issues with large datasets on Python 3.8+, because UMAP and HDBSCAN do not work properly together in that case. So there does not seem to be a solid fix for now, unless HDBSCAN is updated to prevent this from happening in the future.
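To flag environments that fall into the numpy>1.20.3 range mentioned above, a simple version comparison could be used. This is only a sketch: version_tuple and exceeds_threshold are hypothetical helper names, and the comparison is a naive numeric one on the first three components rather than full PEP 440 parsing.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.21.0rc1' into a tuple like (1, 21, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def exceeds_threshold(version, threshold="1.20.3"):
    """Return True if version is strictly newer than threshold."""
    return version_tuple(version) > version_tuple(threshold)


# Example usage (assumes NumPy is installed):
#   import numpy
#   print(exceeds_threshold(numpy.__version__))
```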

Having said that, any and all help is greatly appreciated!

Okay, so I downgraded Python to 3.7 and now it works. I’m still not sure why it doesn't work with 3.8.

Using conda to install bertopic worked for me.

Conda

For those interested: some of the installation issues users are having with BERTopic might be resolved by installing it with conda.

Installing bertopic from the conda-forge channel can be achieved by adding conda-forge to your channels with:

conda config --add channels conda-forge
conda config --set channel_priority strict

Once the conda-forge channel has been enabled, bertopic can be installed with:

conda install bertopic

@Ariannaperla Most likely, you updated to an unsupported NumPy or Numba version. I would advise starting from a fresh environment and trying the above again. If that does not work, using Python 3.7 might solve your issue.

If all else fails, you can also install BERTopic from conda, as instructed here.