BERTopic: TypeError: 'numpy.float64' object cannot be interpreted as an integer

Hey! I had the problem mentioned in this thread, and the update fixed it. Now a different one has appeared; I get this error:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/bertopic/_bertopic.py in _cluster_embeddings(self, umap_embeddings, documents, partial_fit, y)
   3217             try:
-> 3218                 self.hdbscan_model.fit(umap_embeddings, y=y)
   3219             except TypeError:

9 frames
hdbscan/_hdbscan_tree.pyx in hdbscan._hdbscan_tree.condense_tree()

hdbscan/_hdbscan_tree.pyx in hdbscan._hdbscan_tree.condense_tree()

TypeError: 'numpy.float64' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py in _tree_to_labels(X, single_linkage_tree, min_cluster_size, cluster_selection_method, allow_single_cluster, match_reference_implementation, cluster_selection_epsilon, max_cluster_size)
     76     set of labels and probabilities.
     77     """
---> 78     condensed_tree = condense_tree(single_linkage_tree, min_cluster_size)
     79     stability_dict = compute_stability(condensed_tree)
     80     labels, probabilities, stabilities = get_clusters(

hdbscan/_hdbscan_tree.pyx in hdbscan._hdbscan_tree.condense_tree()

hdbscan/_hdbscan_tree.pyx in hdbscan._hdbscan_tree.condense_tree()

TypeError: 'numpy.float64' object cannot be interpreted as an integer

I checked the data that I feed into the model and none of it is in this format (numpy.float64). What's more, I reran the file I was working on yesterday, which was processed successfully then, and it now fails with the same error. Any ideas?
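
For what it's worth, the message does not have to come from your own data. Python raises this kind of TypeError whenever a floating-point value reaches a place that only accepts a true integer, and the traceback above shows it being raised inside hdbscan's compiled condense_tree. A minimal sketch of how the message arises (this is only an illustration, not the hdbscan code; the helper name integer_context_error is made up here, and the NumPy fallback exists just so the snippet runs anywhere):

```python
def integer_context_error(value):
    """Return the TypeError message raised when `value` is used where an
    integer is required, or None if no error is raised."""
    try:
        range(value)  # range() only accepts true integers, much like a loop bound
    except TypeError as exc:
        return str(exc)
    return None


try:
    import numpy as np
    sample = np.float64(3.0)
except ImportError:  # fall back to a plain float so the sketch still runs without NumPy
    sample = 3.0

print(integer_context_error(sample))  # message names the offending float type
print(integer_context_error(3))       # a real integer raises nothing, so this prints None
```

If NumPy is installed, the first printed line names numpy.float64, which matches the traceback even though the input documents contain no such value.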

About this issue

  • State: open
  • Created a year ago
  • Reactions: 10
  • Comments: 19 (4 by maintainers)

Most upvoted comments

@ssaee79 The following is working for me in a fresh Google Colab:

First, you install BERTopic as follows:

!pip install --upgrade git+https://github.com/scikit-learn-contrib/hdbscan.git
!pip install --upgrade BERTopic

Then, you restart the runtime to make sure that imports are refreshed.

Finally, the following code is working for me:

from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
from bertopic import BERTopic

# Prepare embeddings
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = sentence_model.encode(docs, show_progress_bar=True)

# Train our topic model using our pre-trained sentence-transformers embeddings
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs, embeddings)
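
After the restart, it can help to confirm which package versions the runtime actually sees, since the thread points at the installed hdbscan build. A small sketch using only the standard library (the helper name installed_versions is hypothetical, and the package list is just an assumption about what is worth checking):

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print what the current interpreter can see after the reinstall and restart.
for name, version in installed_versions(["hdbscan", "bertopic", "numpy"]).items():
    print(f"{name}: {version or 'not installed'}")
```

If the hdbscan version printed here is unchanged from before the upgrade, the runtime is probably still using the old install.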

This is a problem with hdbscan, not BERTopic, and can be worked around with this method: https://github.com/scikit-learn-contrib/hdbscan/issues/600#issuecomment-1638837464

!pip install git+https://github.com/scikit-learn-contrib/hdbscan.git
!pip install BERTopic

@bray2016 Check whether you get an error at the step "Building wheels for collected packages: hdbscan" while installing BERTopic. If so, try following my previous reply about installing the C++ build tools.

It’s working now, but I’m not entirely sure what fixed it. I already had Microsoft Visual C++ installed. I had been creating new environments, restarting my computer, etc. I assume one of the fixes here worked, so thank you! Sorry I can’t point to one in particular.

It works for me 😃 Thank you so much!

Did you solve it? Even after running !pip install git+https://github.com/scikit-learn-contrib/hdbscan.git and !pip install BERTopic, it is still not working for me.

Hi @MaartenGr, the issue got resolved. It was mainly caused by an error at the step "Building wheels for collected packages: hdbscan". I found the solution in your other replies. Thank you!

Hi, how did you resolve it? I’ve been having the same issue and have now created multiple environments, but no luck.

@Rishi-Prakash-TS Did you try from a completely new environment? It often helps to start fresh and then do the installation of packages.
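
If it is not obvious how to start fresh, one option is a brand-new virtual environment created with Python's standard venv module. A rough sketch, assuming a local (non-Colab) setup; the helper name create_fresh_env and the directory name in the usage comment are placeholders:

```python
import sys
import venv
from pathlib import Path


def create_fresh_env(target, with_pip=True):
    """Create a new virtual environment at `target` and return the path
    where its Python interpreter is expected to live."""
    target = Path(target)
    venv.create(target, with_pip=with_pip)
    # The interpreter sits in Scripts/ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        return target / "Scripts" / "python.exe"
    return target / "bin" / "python"


# Example usage (not executed here):
#   python_path = create_fresh_env("bertopic-env")
#   then run: <python_path> -m pip install --upgrade BERTopic
```

Installing through the new environment's own interpreter keeps previously built packages from the old environment out of the picture.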

There is a new release of hdbscan on PyPI that will hopefully fix this now.

Thanks too!