cuml: [BUG] t-SNE is not deterministic even with random_state
Describe the bug
cuML’s t-SNE output varies from run to run, even when random_state is set or initial embeddings are provided (with #2549 fixed).
Steps/Code to reproduce bug
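```python
from cuml.manifold import TSNE
from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator is deprecated
from matplotlib import pyplot as plt
import numpy as np

X, y = make_blobs(n_samples=10000, centers=8, n_features=4, random_state=0)
X = X.astype(np.float32)

tsne = TSNE(random_state=0)

# First run
Y1 = tsne.fit_transform(X)
plt.scatter(Y1[:, 0], Y1[:, 1])
plt.show()

# Second run: same estimator, same data, same random_state
Y2 = tsne.fit_transform(X)
plt.scatter(Y2[:, 0], Y2[:, 1])
plt.show()

# A deterministic implementation would print True and 0.0
print("identical:", np.array_equal(Y1, Y2))
print("max abs difference:", np.abs(Y1 - Y2).max())
```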

Expected behavior
The output should be identical across runs with the same random_state, as it is in scikit-learn.
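The expected behavior can be phrased as a simple check: call the same seeded computation several times and require bit-identical results. The sketch below uses only the standard library; `seeded_embedding`, `unseeded_embedding` and `is_deterministic` are hypothetical helpers for illustration, not cuML code. The first stands in for a seeded `fit_transform`, the second for an implementation whose output does not depend only on the seed.

```python
import random

def seeded_embedding(seed, n_points=4):
    # Hypothetical stand-in for TSNE(random_state=seed).fit_transform(X):
    # all randomness comes from a private RNG built from `seed`.
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1e-4), rng.gauss(0.0, 1e-4)) for _ in range(n_points)]

def unseeded_embedding(seed, n_points=4):
    # Hypothetical buggy variant: ignores `seed` and draws from the global
    # RNG, which the caller never seeded, so repeated calls differ.
    return [(random.random(), random.random()) for _ in range(n_points)]

def is_deterministic(fn, seed, n_runs=3):
    # Require exactly equal output on every run with the same seed
    # (no tolerance, matching the expectation in this report).
    first = fn(seed)
    return all(fn(seed) == first for _ in range(n_runs - 1))

print(is_deterministic(seeded_embedding, seed=0))    # True
print(is_deterministic(unseeded_embedding, seed=0))  # False: each call draws fresh values
```

Applied to two cuML `fit_transform` outputs for the same input and random_state, a check like this is what fails in this report.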
Environment details:
- Environment location: GCE
- Linux Distro/Architecture: Ubuntu 20.04 amd64
- GPU Model/Driver: V100 and driver 440.33.01
- CUDA: 10.2
- Method of cuDF & cuML install: source, cmake 3.16.3, gcc 8.4.0, fbe6272
About this issue
- State: open
- Created 4 years ago
- Comments: 18 (17 by maintainers)
@trivialfis, the new (experimental) FFT implementation has lower variance and better numerical stability, but it’s not completely deterministic. It would be great if you could make it deterministic as well.
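One common reason a GPU implementation is "not completely deterministic" is that floating-point addition is not associative. Parallel reductions, for example ones built on atomic adds, can combine the same terms in a different order on each run, so the result can change in its low bits, and an iterative optimizer such as t-SNE's gradient descent can then amplify the difference. The plain-Python sketch below only illustrates this general effect and is not cuML's implementation. `sequential_sum` is a hypothetical helper; `math.fsum` stands in for an order-independent (exactly rounded) reduction.

```python
import math
from itertools import permutations

def sequential_sum(xs):
    # Left-to-right accumulation in double precision, like a single
    # accumulator combining partial results in whatever order they arrive.
    total = 0.0
    for x in xs:
        total += x
    return total

terms = [1e16, 1.0, -1e16]

# Try every possible "arrival order" of the same three terms.
naive = {sequential_sum(p) for p in permutations(terms)}
exact = {math.fsum(p) for p in permutations(terms)}

print(sorted(naive))  # [0.0, 1.0]: 1e16 + 1.0 rounds back to 1e16 in some orders
print(exact)          # {1.0}: fsum is correctly rounded, so order does not matter
```

Making a reduction order-independent like this, or fixing its order, is one way to trade some speed for reproducibility.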
Thanks @danielhanchen. Started landing those changes with https://github.com/rapidsai/cuml/pull/3018.