scikit-learn: t-SNE fails with array must not contain infs or NaNs (OSX specific)
Darwin-15.0.0-x86_64-i386-64bit
('Python', '2.7.11 |Anaconda custom (x86_64)| (default, Dec 6 2015, 18:57:58) \n[GCC 4.2.1 (Apple Inc. build 5577)]')
('NumPy', '1.11.0')
('SciPy', '0.17.0')
('Scikit-Learn', '0.17.1')
When trying to run a t-SNE
proj = TSNE().fit_transform(X)
ValueError: array must not contain infs or NaNs
However
np.isfinite(X).all() # True
np.isnan(X).any() # False
np.isinf(X).any() # False
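To rule out bad input more thoroughly, counting the non-finite entries is more informative than a single boolean. The following is a minimal sketch; the helper name summarize_nonfinite is hypothetical and not part of NumPy or scikit-learn.

```python
import numpy as np


def summarize_nonfinite(X):
    """Count NaN and inf entries in X (hypothetical helper for illustration)."""
    X = np.asarray(X, dtype=float)
    return {
        "nan": int(np.isnan(X).sum()),   # number of NaN entries
        "inf": int(np.isinf(X).sum()),   # number of +inf / -inf entries
        "all_finite": bool(np.isfinite(X).all()),
    }
```

If all_finite is True for X and the error still appears, the non-finite values are produced inside the optimizer (the gradient passed to linalg.norm in the trace below), not by the input data.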
Full Stack Trace:
ValueError Traceback (most recent call last)
<ipython-input-16-c25f35fd042c> in <module>()
----> 1 plot(X, y)
<ipython-input-1-72bdb7124d13> in plot(X, y)
74
75 def plot(X, y):
---> 76 proj = TSNE().fit_transform(X)
77 scatter(proj, y)
/Users/joelkuiper/anaconda/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in fit_transform(self, X, y)
864 Embedding of the training data in low-dimensional space.
865 """
--> 866 embedding = self._fit(X)
867 self.embedding_ = embedding
868 return self.embedding_
/Users/joelkuiper/anaconda/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _fit(self, X, skip_num_points)
775 X_embedded=X_embedded,
776 neighbors=neighbors_nn,
--> 777 skip_num_points=skip_num_points)
778
779 def _tsne(self, P, degrees_of_freedom, n_samples, random_state,
/Users/joelkuiper/anaconda/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _tsne(self, P, degrees_of_freedom, n_samples, random_state, X_embedded, neighbors, skip_num_points)
830 opt_args['momentum'] = 0.8
831 opt_args['it'] = it + 1
--> 832 params, error, it = _gradient_descent(obj_func, params, **opt_args)
833 if self.verbose:
834 print("[t-SNE] Error after %d iterations with early "
/Users/joelkuiper/anaconda/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _gradient_descent(objective, p0, it, n_iter, objective_error, n_iter_check, n_iter_without_progress, momentum, learning_rate, min_gain, min_grad_norm, min_error_diff, verbose, args, kwargs)
385 for i in range(it, n_iter):
386 new_error, grad = objective(p, *args, **kwargs)
--> 387 grad_norm = linalg.norm(grad)
388
389 inc = update * grad >= 0.0
/Users/joelkuiper/anaconda/lib/python2.7/site-packages/scipy/linalg/misc.pyc in norm(a, ord, axis, keepdims)
127 """
128 # Differs from numpy only in non-finite handling and the use of blas.
--> 129 a = np.asarray_chkfinite(a)
130
131 # Only use optimized norms if axis and keepdims are not specified.
/Users/joelkuiper/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.pyc in asarray_chkfinite(a, dtype, order)
1020 if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
1021 raise ValueError(
-> 1022 "array must not contain infs or NaNs")
1023 return a
1024
ValueError: array must not contain infs or NaNs
About this issue
- State: closed
- Created 8 years ago
- Comments: 108 (59 by maintainers)
For anyone affected by this, this should fix it:
Let me know if that doesn’t work for you.
Update: TSNE(perplexity=30, n_components=2, init='pca', n_iter=1000, method='exact') made it work; method='exact' was the trick.
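The method='exact' workaround can be wrapped so that the default Barnes-Hut path is tried first and the exact method is used only when this specific error shows up. This is a rough sketch, not a fix for the underlying problem: the function name fit_tsne_with_fallback and the make_tsne parameter are hypothetical, and matching on the error message is an assumption based on the traceback in this issue.

```python
def fit_tsne_with_fallback(X, make_tsne=None, **params):
    """Fit t-SNE; retry with method='exact' on the 'infs or NaNs' error.

    make_tsne is a hypothetical hook (defaults to sklearn's TSNE) so the
    estimator class can be swapped out. Returns (embedding, method_used).
    """
    if make_tsne is None:
        from sklearn.manifold import TSNE  # imported lazily
        make_tsne = TSNE
    try:
        method = params.get("method", "barnes_hut")
        return make_tsne(**params).fit_transform(X), method
    except ValueError as err:
        # Re-raise unrelated errors, or if 'exact' was already requested.
        if params.get("method") == "exact" or "infs or NaNs" not in str(err):
            raise
    exact_params = dict(params, method="exact")
    return make_tsne(**exact_params).fit_transform(X), "exact"
```

Note that the exact method scales quadratically with the number of samples, so this fallback is only practical for moderately sized X.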
Sorry, but I still get this on Python 3.5.1, scikit 0.17, scikit-learn 0.18 (commit 9e913c04d748), and Numpy 1.11.1 on Mac OS 10.11.5.
@act65 we are more than keen to get to the bottom of this but we haven’t been able to reproduce and it seems like we are getting mixed reports from users so far unfortunately.
So if you haven’t already (unfortunately we are not mind readers and “not working for me” does not tell us what you tried) could you try to run the snippet mentioned above in https://github.com/scikit-learn/scikit-learn/issues/6665#issuecomment-243782185. Try to run it multiple times just in case because the random seed is not set properly and there may be some randomness left in the snippet.
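Since the failure may depend on the random initialization, one way to make reports comparable is to run the fit over a fixed set of seeds and record which ones fail. A small sketch, assuming the fit is passed in as a callable; the name seeds_that_fail is hypothetical.

```python
def seeds_that_fail(fit, seeds):
    """Call fit(seed) for each seed and collect the ValueError messages.

    `fit` is any callable taking a seed, for example (assuming sklearn):
        lambda s: TSNE(random_state=s).fit_transform(X)
    """
    failures = {}
    for seed in seeds:
        try:
            fit(seed)
        except ValueError as err:
            failures[seed] = str(err)
    return failures
```

Posting the resulting dictionary together with the exact versions makes it easier for others to rerun the same seeds.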
Then what would be really great is if you could try with the 0.18 release candidate, which is straightforward to install (doing it in a separate virtualenv or conda env is highly recommended):
Edited: 0.18 has been released, so you can just use (no need to use --pre):
and re-run the snippet to see whether it is fixed in 0.18, as some users in this thread have already reported.
0.18 is going to be released in a few weeks if not days so you know what you have to do if you want to help us to get to the bottom of this before the release 😉.
Hi, I read the above comments and can reproduce this. I re-ran code from a few weeks ago and now this issue appears. Here’s a minimal example that now reproduces this issue:
And the output of
is
Again, changing the method to exact (TSNE(method='exact')) gets rid of the error.
More generally, I have noticed wildly different results when using sklearn's TSNE (with identical perplexity and other parameters) from the bh implementation published by Laurens van der Maaten and the MATLAB version. I wonder if there may be a connection?
Have you read https://github.com/scikit-learn/scikit-learn/issues/6665#issuecomment-264029983 and https://github.com/scikit-learn/scikit-learn/issues/6665#issuecomment-264087057 ?
The only way I managed to reproduce this problem was to install numpy with both pip and conda in the same conda environment. If you create a conda environment from scratch you should not have this problem.
In case your problem does not seem to match this description, please post the exact commands you ran to create your conda environment, so we can try to reproduce.
Interesting. I think it has nothing to do with tensorflow; my guess is that
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
vs
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
is the culprit!?
I managed to find a way to reproduce, I think, by installing the numpy wheel and then scikit-learn via conda on top of it (got the hint from the conda list output in https://github.com/scikit-learn/scikit-learn/issues/6665#issuecomment-262800762, where two numpy are listed).
Then execute the snippet from https://github.com/scikit-learn/scikit-learn/issues/6665#issuecomment-262800762.
So it seems like this is happening when mixing numpy installed via pip and conda. In my book it is never a good idea to mix pip and conda for a given package, but I guess this can quite easily happen without realizing it (for example, you install a project that depends on numpy via pip, and then scikit-learn via conda).
Why exactly this happens I don't know … and it seems to happen only on OSX, by the way (i.e. not on my Ubuntu box).
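A quick way to check whether an environment has ended up with more than one numpy record is to list the installed distribution metadata. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the function name installed_copies is hypothetical, and it only sees packages that left metadata on sys.path, so treat an empty or single result as a hint, not proof.

```python
from importlib import metadata


def installed_copies(package):
    """Return sorted (version, location) pairs for each record of `package`."""
    wanted = package.lower().replace("_", "-")
    copies = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            continue  # skip broken or incomplete metadata records
        if name and name.lower().replace("_", "-") == wanted:
            copies.add((version, str(dist.locate_file(""))))
    return sorted(copies)
```

For example, installed_copies("numpy") returning two entries with different versions would match the situation described above, where conda list showed two numpy packages.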
Hm, interesting, so it’s not a conda issue after all then … Curious why it works for me now 😕 (all I can think that has changed (except for reinstalling conda) was rebooting 😛)
Similarly, it fails for low point values.
Reopen, please
I installed master, the code snippet runs cleanly now.
Do you mind sharing your data X with me?