scikit-learn: t-SNE raises MemoryError when reducing dimensionality to the default of 2

Hi everyone, does anyone know why t-SNE runs into memory errors when reducing the dimensionality of the data to the default number of components, which is 2? I start from the MNIST dataset and apply a transformation to it, resulting in a reduced MNIST dataset. Original: 60000x784. After transformation: 60000x200.

When I apply t-SNE to the transformed MNIST data (60000x200), I always get a memory error. I was also wondering why there is no multicore option such as n_jobs=-1.
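For anyone hitting the same wall: the exact solver materialises dense n x n affinity matrices, so the usual way to sidestep the error is to subsample and optionally pre-reduce with PCA before calling TSNE. A minimal sketch, using a random stand-in array of the same shape as the transformed MNIST matrix:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
X = rng.randn(60000, 200)  # stand-in for the transformed MNIST matrix

# A dense n x n affinity matrix for 60000 points needs tens of GB,
# while ~10000 points fits comfortably in ordinary RAM.
idx = rng.choice(X.shape[0], size=10000, replace=False)
X_small = PCA(n_components=50).fit_transform(X[idx])  # cheap pre-reduction

X_embedded = TSNE(n_components=2, random_state=0).fit_transform(X_small)
print(X_embedded.shape)  # (10000, 2)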

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 32 (16 by maintainers)

Most upvoted comments

@glemaitre I found the Barnes-Hut implementation on GitHub and I've been able to run it on my data using Python 2.7 (but not Python 3; there's a bug). It was awkward to get my Docker image to include both Anaconda 2 and 3, but it finally worked. So it would be good if sklearn shipped the Barnes-Hut implementation. The existing one is so limited that it arguably shouldn't be in the library at all; I wish it hadn't been there to waste my time. It should be replaced with BH-TSNE.

As for my data, the file was not large, just a few megabytes, so it is surprising that sklearn's TSNE could eat nearly 300GB of RAM. I really think this should be fixed by switching to BH-TSNE.

The bh-tsne project is https://github.com/lvdmaaten/bhtsne

On Wed, Mar 22, 2017 at 3:16 PM, Joel Nothman notifications@github.com wrote:

apart from pairwise distances, it would be worth scouring the code for other suboptimal-complexity issues, and using a benchmark script to plot memory with respect to the number of samples. (In fact, it would be great to have this for all estimators, alongside their expected space and time complexity.)

On 23 Mar 2017 9:12 am, “Joel Nothman” joel.nothman@gmail.com wrote:

sorry, you're right. We could have theoretical O(N) memory even if we have to look each point up in an O(log N) data structure to find its nearest neighbours. The current implementation is definitely not kind on memory. A PR using kneighbors_graph is welcome. I also suggest that we support passing such a sparse graph in precomputed mode, as we do for DBSCAN (a sketch of that pattern follows after this quoted thread).

On 23 Mar 2017 7:56 am, “Arthur Goldberg” notifications@github.com wrote:

Instead of finding exact nearest neighbours, Barnes-Hut uses a sparse approximation: for a point p, we compute the similarities to its N nearest neighbours, and for every point further away we set the similarity to 0.

So with a sparse representation for n points, we only need to compute N * n similarities (N is constant, so overall O(n)).
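To make the precomputed-sparse-graph suggestion above concrete, here is a sketch of the pattern DBSCAN already supports (made-up data; entries absent from the sparse matrix are simply treated as non-neighbours):

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import radius_neighbors_graph

X = np.random.RandomState(0).randn(1000, 10)

# Sparse distance graph: only pairs within the radius are stored.
G = radius_neighbors_graph(X, radius=3.0, mode='distance')
labels = DBSCAN(eps=3.0, metric='precomputed').fit_predict(G)
print(labels.shape)  # (1000,)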



@lesteve thanks for the reply. Here's the output of a simple example:

In [47]: X_train.shape
Out[47]: (60000, 200)

In [50]: X_train_reduced = TSNE(n_components=2, random_state=0).fit_transform(X_train)
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-50-6848c4dfd81b> in <module>()
----> 1 X_train_reduced = TSNE(n_components=2, random_state=0).fit_transform(X_train)

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in fit_transform(self, X, y)
    882             Embedding of the training data in low-dimensional space.
    883         """
--> 884         embedding = self._fit(X)
    885         self.embedding_ = embedding
    886         return self.embedding_

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _fit(self, X, skip_num_points)
    764                 neighbors_nn = neighbors_nn[:, 1:]
    765             P = _joint_probabilities_nn(distances, neighbors_nn,
--> 766                                         self.perplexity, self.verbose)
    767         else:
    768             P = _joint_probabilities(distances, self.perplexity, self.verbose)

/home/user/miniconda2/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _joint_probabilities_nn(distances, neighbors, desired_perplexity, verbose)
     96     m = "All probabilities should be finite"
     97     assert np.all(np.isfinite(conditional_P)), m
---> 98     P = conditional_P + conditional_P.T
     99     sum_P = np.maximum(np.sum(P), MACHINE_EPSILON)
    100     P = np.maximum(squareform(P) / sum_P, MACHINE_EPSILON)

MemoryError:

The above example was run on a PC with 24 cores and 64GB of RAM, and it still hits a memory error.
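The traceback makes the cause concrete: conditional_P is a dense n x n float64 array, and the line P = conditional_P + conditional_P.T allocates a second full copy. Back-of-the-envelope arithmetic for the shapes above:

n = 60000
gb = n * n * 8 / 1e9   # float64 = 8 bytes per entry
print(gb)              # 28.8 GB for ONE dense n x n matrix
# The transpose sum needs a second copy, and squareform(P) / sum_P
# creates further temporaries, so peak usage is a small multiple of
# 28.8 GB -- well past the 64GB available on this machine.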

although I'm not very familiar with this code, I think it's quite clear that our implementation still has O(N^2) memory requirements, though it may not be hard to reduce that to O(N log N). Without looking at the paper, I don't understand how one can perform exact nearest-neighbour calculations without that memory cost.


@kirk86 (Not related to this issue, but to solve your immediate problem: I reckon you could implement t-SNE in TensorFlow pretty easily, which would let you use multiple cores/GPUs/whatever… might be a fun project.)
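That suggestion is straightforward to prototype. Below is a minimal exact-t-SNE sketch in plain NumPy (the function names are illustrative, and a real implementation would add perplexity calibration, early exaggeration, and momentum). Porting the same arithmetic to TensorFlow or another autodiff framework is what would buy multicore/GPU execution; note this exact formulation is still O(n^2) in memory, so it helps speed, not the MemoryError above.

import numpy as np

def pairwise_sq_dists(Z):
    # squared Euclidean distances between all rows of Z
    s = np.sum(Z ** 2, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * Z @ Z.T, 0.0)

def joint_p(X, sigma=1.0):
    # Gaussian affinities with one fixed bandwidth; real t-SNE tunes
    # a per-point sigma to match a perplexity target
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)                # conditional p_{j|i}
    return np.maximum((P + P.T) / (2.0 * len(P)), 1e-12)  # symmetrised joint P

def tsne(X, n_components=2, n_iter=500, lr=100.0, seed=0):
    rng = np.random.RandomState(seed)
    P = joint_p(X)
    Y = 1e-4 * rng.randn(X.shape[0], n_components)   # small random init
    for _ in range(n_iter):
        num = 1.0 / (1.0 + pairwise_sq_dists(Y))     # Student-t kernel
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # exact gradient: dC/dy_i = 4 * sum_j (P_ij - Q_ij)(y_i - y_j) * num_ij
        W = (P - Q) * num
        Y = Y - lr * 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
    return Y

Y = tsne(np.random.RandomState(0).randn(300, 10))
print(Y.shape)  # (300, 2)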

@jnothman Barnes-Hut t-SNE should run in O(N log N) time and O(N) memory (see Section 1 of the paper https://arxiv.org/abs/1301.3342).

I am a bit surprised to see that barnes_hut computes all-pairs distances, when it could be using kneighbors_graph to sparsely calculate distances where a binary tree is appropriate.
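For scale, here is a rough sketch of what the kneighbors_graph route buys; k = 3 * perplexity follows the heuristic in the Barnes-Hut paper, and the array is random, so the numbers are purely illustrative:

import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.RandomState(0).randn(60000, 200)

k = 90  # ~3 * the default perplexity of 30, per the Barnes-Hut paper
G = kneighbors_graph(X, n_neighbors=k, mode='distance')  # sparse CSR

print(G.nnz)                # 60000 * 90 = 5.4 million stored distances
print(G.data.nbytes / 1e9)  # ~0.04 GB, versus ~28.8 GB for the dense matrix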