BERTopic: TypeError: 'numpy.float64' object cannot be interpreted as an integer

Hey! I had the problem mentioned in this thread, and the update solved it. Now a different one has appeared, and I get this error:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/bertopic/_bertopic.py in _cluster_embeddings(self, umap_embeddings, documents, partial_fit, y)
   3217             try:
-> 3218                 self.hdbscan_model.fit(umap_embeddings, y=y)
   3219             except TypeError:

9 frames
hdbscan/_hdbscan_tree.pyx in hdbscan._hdbscan_tree.condense_tree()

hdbscan/_hdbscan_tree.pyx in hdbscan._hdbscan_tree.condense_tree()

TypeError: 'numpy.float64' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py in _tree_to_labels(X, single_linkage_tree, min_cluster_size, cluster_selection_method, allow_single_cluster, match_reference_implementation, cluster_selection_epsilon, max_cluster_size)
     76     set of labels and probabilities.
     77     """
---> 78     condensed_tree = condense_tree(single_linkage_tree, min_cluster_size)
     79     stability_dict = compute_stability(condensed_tree)
     80     labels, probabilities, stabilities = get_clusters(

hdbscan/_hdbscan_tree.pyx in hdbscan._hdbscan_tree.condense_tree()

hdbscan/_hdbscan_tree.pyx in hdbscan._hdbscan_tree.condense_tree()

TypeError: 'numpy.float64' object cannot be interpreted as an integer

I checked the data that I pass to the model, and nothing in it has this type. What’s more, the error now also appears on a file that I processed successfully yesterday. Any ideas?
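
For readers unfamiliar with this message: Python raises it whenever a numpy.float64 value is passed somewhere that requires an integer. The minimal sketch below reproduces only the message, assuming NumPy is installed. The helper name integer_slot_error is hypothetical, and this is not hdbscan's internal code.

```python
import numpy as np


def integer_slot_error(value):
    """Return the TypeError message raised when `value` is used as an integer, or None."""
    try:
        range(value)  # range() only accepts objects that can act as an integer
    except TypeError as exc:
        return str(exc)
    return None


# A NumPy float is rejected even when it holds a whole number.
print(integer_slot_error(np.float64(5.0)))
# A plain int is accepted, so no message is returned.
print(integer_slot_error(5))
```

The float case fails regardless of the value it holds, which is consistent with the error appearing on input data that previously worked.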

About this issue

State: open
Created a year ago
Reactions: 10
Comments: 19 (4 by maintainers)

Most upvoted comments

@ssaee79 The following is working for me in a fresh Google Colab:

First, you install BERTopic as follows:

!pip install --upgrade git+https://github.com/scikit-learn-contrib/hdbscan.git
!pip install --upgrade BERTopic

Then, you restart the runtime to make sure that imports are refreshed.

Finally, the following code is working for me:

from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
from bertopic import BERTopic

# Prepare embeddings
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = sentence_model.encode(docs, show_progress_bar=True)

# Train our topic model using our pre-trained sentence-transformers embeddings
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs, embeddings)
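
After reinstalling and restarting the runtime, it can help to confirm which package versions the environment actually contains. The sketch below uses only the standard library. The helper name installed_versions and the list of packages checked are assumptions for illustration, and the thread does not say which version numbers are affected.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the packages that appear in the traceback and install steps.
for name, ver in installed_versions(["hdbscan", "bertopic", "numpy"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

If the reported hdbscan version is unchanged after the reinstall, the runtime was probably not restarted or the install went into a different environment.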

MaartenGr on Jul 18, 2023

This is a problem with hdbscan, not BERTopic, and can be worked around with this method: https://github.com/scikit-learn-contrib/hdbscan/issues/600#issuecomment-1638837464

!pip install git+https://github.com/scikit-learn-contrib/hdbscan.git
!pip install BERTopic

jsalsman on Jul 17, 2023

@bray2016 Check whether you get an error at "Building wheels for collected packages: hdbscan" while installing BERTopic. If so, try following my previous reply about downloading C++.

It’s working now, but I’m not entirely sure what fixed it. I already had MV C++ installed. I had been creating new environments, restarting my computer, etc. I assume one of the fixes here worked, so thank you! Sorry I can’t point to one in particular.

bray2016 on Aug 21, 2023

It works for me 😃 Thank you so much!

ssaee79 on Jul 18, 2023

Did you solve it? Even after running !pip install git+https://github.com/scikit-learn-contrib/hdbscan.git and !pip install BERTopic, it is still not working for me.

shasha920 on Jul 18, 2023

Hi @MaartenGr, the issue got resolved. It was mainly caused by an error at the "Building wheels for collected packages: hdbscan" step. I found the solution in your other replies. Thank you!

Hi, how did you resolve it? I’ve been having the same issue and have now created multiple environments, but no luck.

firoznamaji on Aug 19, 2023

@Rishi-Prakash-TS Did you try from a completely new environment? It often helps to start fresh and then do the installation of packages.
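
One way to start from a completely new environment is Python's built-in venv module. This is a minimal sketch. The helper name create_fresh_env and the directory name bertopic-env are hypothetical, and conda or a fresh Colab runtime would serve the same purpose.

```python
import venv
from pathlib import Path


def create_fresh_env(env_dir, with_pip=True):
    """Create a new virtual environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # clear=True empties the target directory first if it already exists.
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(env_path)
    return env_path


# Example (not run automatically): create_fresh_env("bertopic-env")
```

After creating the environment, activate it and install the packages there, so that nothing from an earlier failed build is reused.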

MaartenGr on Aug 7, 2023

There is a new release of hdbscan on PyPI that will hopefully fix this now.
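
To pick up the new release from a script or notebook, pip can be invoked for the interpreter that is currently running. This is a sketch under the assumption that pip is available in that environment. The helper names upgrade_command and upgrade are hypothetical.

```python
import subprocess
import sys


def upgrade_command(package):
    """Build the pip command that upgrades `package` for the running interpreter."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]


def upgrade(package):
    """Run the upgrade and raise CalledProcessError if pip fails."""
    subprocess.run(upgrade_command(package), check=True)


# Show the command without running it.
print(" ".join(upgrade_command("hdbscan")))
# Call upgrade("hdbscan") to actually run it, then restart the runtime or kernel.
```

Using sys.executable ensures the package is upgraded in the same environment that will import it.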

lmcinnes on Jul 19, 2023

Thanks too!

shasha920 on Jul 18, 2023