BERTopic: Inconsistent results on different machines
Hey!
Recently I found out that my code gives different results when run locally (M1 MacBook), in local Docker, or in k8s Docker containers. I do use `random_seed` for UMAP; however, even `20newsgroups` behaves differently and does not return exactly the same results. For my own code the difference is quite big: the script generated ~150 topics on localhost and only around 40 in the cloud (even the initial sets of topics were similar but not identical).
I double-checked that the same data goes in and tried setting different `numpy` random seeds, but nothing really changed.
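One way to confirm that the same data goes in on every machine is to compare a fingerprint of the documents rather than eyeballing them. This is only a minimal sketch: the `fingerprint` helper and the `docs` list are hypothetical names, not part of BERTopic or of the original script.

```python
import hashlib


def fingerprint(items):
    """Return an order-sensitive SHA-256 hex digest over a sequence of items."""
    digest = hashlib.sha256()
    for item in items:
        data = str(item).encode("utf-8")
        # Length-prefix each item so ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


# Hypothetical stand-in for the documents passed to the topic model.
docs = ["first document", "second document"]
print("docs fingerprint:", fingerprint(docs))
```

Running this in each environment (local, Docker, k8s) and comparing the printed digests shows whether the inputs really match. The same helper can be applied to the list of topic assignments to see whether two runs produced identical output.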
I tested this with `python:3.7` and `python:3.9`, as well as `bertopic` version 0.10.0 and 0.9.4.
Any ideas or experience on how to make the results the same across different platforms?
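For reference, the usual starting point is to pass a seeded UMAP instance to BERTopic through its `umap_model` argument. The sketch below assumes that setup; note that UMAP's own parameter is called `random_state`. The parameter values are what I understand to be BERTopic's default UMAP settings, and `SEED`, `UMAP_PARAMS` and `build_topic_model` are hypothetical names. As this thread suggests, a fixed seed alone may not give identical results across different machines.

```python
SEED = 42

# Assumed to mirror BERTopic's default UMAP settings, plus a fixed seed.
UMAP_PARAMS = {
    "n_neighbors": 15,
    "n_components": 5,
    "min_dist": 0.0,
    "metric": "cosine",
    "random_state": SEED,
}


def build_topic_model(umap_params=UMAP_PARAMS):
    """Build a BERTopic model whose UMAP step uses a fixed random_state."""
    # Imported here so the heavy dependencies are only needed when called.
    from bertopic import BERTopic
    from umap import UMAP

    umap_model = UMAP(**umap_params)
    return BERTopic(umap_model=umap_model)


# Usage, with `docs` being a list of strings:
#   topic_model = build_topic_model()
#   topics, probs = topic_model.fit_transform(docs)
```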
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 20 (9 by maintainers)
I have run into the same issue, and on the same machine (iMac running Mojave), running on the same dataset at different times:
- Python version: 3.9.12 (v3.9.12:b28265d7e6, Mar 23 2022, 18:17:11)
- numpy version: 1.23.5
- scikit-learn version: 1.2.0
- numba version: 0.56.4
- umap version: 0.5.3

This happens even when using `random_state=np.random.RandomState(42)`, as has been suggested elsewhere. Attached is an example output from the same input data run at two different times. This is a shame, because in my limited testing UMAP outperforms t-SNE, but if we can't get the same results from session to session, it limits its usefulness. Is there a solution? Peter. UMAP.pdf
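When comparing runs between machines or sessions, it can help to record the versions listed above programmatically, so that two environments can be diffed directly. This is a rough sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The `environment_report` helper is a hypothetical name, and the package list is an assumption based on the packages mentioned in this thread.

```python
import platform
import sys
from importlib import metadata  # Python 3.8+

# Distribution names as published on PyPI (assumed relevant to this thread).
PACKAGES = ("numpy", "scikit-learn", "numba", "umap-learn", "bertopic")


def environment_report(packages=PACKAGES):
    """Collect interpreter, platform and package versions into a dict."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Saving this output next to each run makes it easier to tell whether differing results line up with a differing dependency or CPU architecture.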