scikit-learn: pca.fit_transform returns error: array must not contain infs or NaNs

When I call PCA's fit_transform method, I get the error "array must not contain infs or NaNs".

Actual code:

    sklearn_pca = PCA(n_components=3)
    input_vec = sklearn_pca.fit_transform(normalised_tfidf)

Traceback (most recent call last):

  File "C:\Users\User\workspace\caseStudy\main.py", line 135, in <module>
    input_vec = sklearn_pca.fit_transform(normalised_tfidf)

  File "C:\Users\User\anaconda3\lib\site-packages\sklearn\decomposition\_pca.py", line 369, in fit_transform
    U, S, V = self._fit(X)

  File "C:\Users\User\anaconda3\lib\site-packages\sklearn\decomposition\_pca.py", line 418, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)

  File "C:\Users\User\anaconda3\lib\site-packages\sklearn\decomposition\_pca.py", line 532, in _fit_truncated
    U, S, V = randomized_svd(X, n_components=n_components,

  File "C:\Users\User\anaconda3\lib\site-packages\sklearn\utils\extmath.py", line 354, in randomized_svd
    Uhat, s, V = linalg.svd(B, full_matrices=False)

  File "C:\Users\User\anaconda3\lib\site-packages\scipy\linalg\decomp_svd.py", line 109, in svd
    a1 = _asarray_validated(a, check_finite=check_finite)

  File "C:\Users\User\anaconda3\lib\site-packages\scipy\_lib\_util.py", line 246, in _asarray_validated
    a = toarray(a)

  File "C:\Users\User\anaconda3\lib\site-packages\numpy\lib\function_base.py", line 498, in asarray_chkfinite
    raise ValueError(

ValueError: array must not contain infs or NaNs

Checked if there are infs and NaNs in the input array:

    np.any(np.isnan(normalised_tfidf))
    Out[2]: False

    np.any(np.isinf(normalised_tfidf))
    Out[3]: False
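The two checks above only cover the input array, while the traceback shows the exception being raised on `B`, an intermediate matrix inside `randomized_svd`. A slightly fuller report on the input can help narrow things down. The following is a minimal sketch, not part of the original report: `describe_array` is a hypothetical helper, and the small `demo` array stands in for `normalised_tfidf`, which is assumed to be a dense numeric array.

```python
import numpy as np

def describe_array(X):
    """Return basic facts about X that matter for an SVD-based fit.

    Hypothetical helper for illustration; assumes X is a dense numeric array.
    """
    X = np.asarray(X)
    finite = np.isfinite(X)
    return {
        "dtype": str(X.dtype),
        "shape": X.shape,
        "all_finite": bool(finite.all()),
        "n_nan": int(np.isnan(X).sum()),
        "n_inf": int(np.isinf(X).sum()),
        # Largest magnitude among the finite entries; very large values
        # can overflow in intermediate products even if the input is finite.
        "max_abs": float(np.abs(X[finite]).max()) if finite.any() else float("nan"),
    }

# Stand-in data; in the report this would be normalised_tfidf.
demo = np.array([[0.1, 0.2], [0.3, np.nan], [np.inf, 0.4]])
report = describe_array(demo)
print(report)
```

If the input reports `all_finite` as True, as it did for the reporter, the non-finite values are presumably produced during the computation and not present in the data itself.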

Versions:

  • Python: 3.8
  • Anaconda: 1.9.12
  • Sklearn: 0.23.1

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (2 by maintainers)

Most upvoted comments

Issue resolved. Cannot reproduce anymore.

Hello,

Can you share the solution, please? Since sklearn > 0.22.1, I have had the same issue with many random datasets that never produced such errors before.

Oddly enough, if I add a simple retry, e.g.:

    try:
        x_pca = pca.fit_transform(PCA_data)
    except:
        x_pca = pca.fit_transform(PCA_data)

then with the second run it ALWAYS passes without error…
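A somewhat safer variant of that retry catches only `ValueError` and, on failure, switches to the exact solver via `svd_solver="full"`, which does not go through `randomized_svd`. This is only a sketch: `fit_transform_with_fallback` is a hypothetical helper, the random `X_demo` stands in for `PCA_data`, and nothing in this thread confirms that the full solver avoids the error in the cases reported here.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_transform_with_fallback(X, n_components=3, random_state=0):
    """Try the randomized solver first, then fall back to the full SVD.

    Hypothetical helper for illustration. Note that the except clause
    also catches unrelated ValueErrors raised by the first attempt.
    """
    try:
        pca = PCA(n_components=n_components, svd_solver="randomized",
                  random_state=random_state)
        return pca.fit_transform(X)
    except ValueError:
        # Exact LAPACK-based SVD; slower on large inputs.
        pca = PCA(n_components=n_components, svd_solver="full")
        return pca.fit_transform(X)

# Stand-in data; in the comment above this would be PCA_data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 10))
X_pca = fit_transform_with_fallback(X_demo)
print(X_pca.shape)
```

Unlike a bare `except:`, this does not swallow `KeyboardInterrupt` or programming errors such as `NameError`.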

Since I have seen people discussing this issue on forums, I believe it is worth solving in a general way.

@MichalRIcar I just did and it still doesn’t work 😕

Could you provide the dataset that caused the issue so that we can reproduce it?
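One way to share the exact array is to dump it to a compressed `.npz` file and attach that to the issue, together with the output of `sklearn.show_versions()`. The snippet below is a minimal sketch under assumptions: `save_for_report` and the file name `pca_repro.npz` are made up for illustration, and the random `demo` array stands in for the real data.

```python
import os
import tempfile

import numpy as np

def save_for_report(X, path):
    """Write X to a compressed .npz file under the key "X".

    Hypothetical helper for illustration; assumes X is a dense array.
    """
    np.savez_compressed(path, X=np.asarray(X))
    return path

# Stand-in data; replace with the array that triggers the error.
demo = np.random.default_rng(0).normal(size=(5, 3))
path = os.path.join(tempfile.mkdtemp(), "pca_repro.npz")
save_for_report(demo, path)

# Reload to confirm the file round-trips bit for bit.
with np.load(path) as data:
    restored = data["X"]
print(np.array_equal(demo, restored))
```

Saving the binary array instead of a CSV export preserves the dtype and the exact floating-point values, which matters for a failure that may depend on them.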