cuml: [BUG] t-SNE is not deterministic even with random_state

Describe the bug

cuML’s t-SNE outputs vary from run to run, even when random_state is set or initial embeddings are provided (and #2549 is fixed).

Steps/Code to reproduce bug

from cuml.manifold import TSNE
from sklearn.datasets import make_blobs
from matplotlib import pyplot as plt
import numpy as np

X, y = make_blobs(n_samples=10000, centers=8, n_features=4, random_state=0)
X = X.astype(np.float32)
tsne = TSNE(random_state=0)

Y1 = tsne.fit_transform(X)
plt.scatter(Y1[:, 0], Y1[:, 1])
plt.show()

Y2 = tsne.fit_transform(X)
plt.scatter(Y2[:, 0], Y2[:, 1])
plt.show()

Expected behavior

The output should be identical between runs, as it is in scikit-learn.
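Comparing scatter plots by eye can miss small differences, so a numeric check may be more convincing. The following is a minimal sketch, not part of the original report: runs_match is a hypothetical helper name, and it assumes the embedding function returns something np.asarray can convert (for example, a NumPy array, which is what the plotting code above implies for NumPy input).

```python
import numpy as np


def runs_match(fit_fn, X, n_runs=2, atol=0.0):
    """Call fit_fn(X) n_runs times and compare every result to the first.

    Returns (ok, worst), where worst is the largest absolute element-wise
    difference seen and ok is True when worst <= atol.
    runs_match is a hypothetical helper, not a cuML or scikit-learn API.
    """
    first = np.asarray(fit_fn(X))
    worst = 0.0
    for _ in range(n_runs - 1):
        other = np.asarray(fit_fn(X))
        # Track the largest deviation from the first embedding.
        worst = max(worst, float(np.max(np.abs(other - first))))
    return worst <= atol, worst
```

With the reproduction script above, this could be called as runs_match(lambda data: TSNE(random_state=0).fit_transform(data), X). A deterministic implementation would be expected to return ok as True with the default atol of 0.0; a looser atol can be passed if only approximate reproducibility is required.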

Environment details:

Environment location: GCE
Linux Distro/Architecture: Ubuntu 20.04 amd64
GPU Model/Driver: V100 and driver 440.33.01
CUDA: 10.2
Method of cuDF & cuML install: source, cmake 3.16.3, gcc 8.4.0, fbe6272

About this issue

State: open
Created 4 years ago
Comments: 18 (17 by maintainers)

Most upvoted comments

@trivialfis, the new (experimental) FFT implementation has lower variance and better numerical stability but it’s not completely deterministic. It would be great if you were able to make this deterministic as well.

cjnolet on May 13, 2021

Thanks @danielhanchen. Started landing those changes with https://github.com/rapidsai/cuml/pull/3018.

zbjornson on Oct 20, 2020