scikit-learn: PLS (partial least squares) algorithms occasionally failing.

Attempting to use the CCA class in pls.py, I’m getting some unhelpful errors.

The example given in the documentation is:

from sklearn.pls import PLSCanonical, PLSRegression, CCA
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
cca = CCA(n_components=1)
cca.fit(X, Y)

This works okay, and also works okay with the change:

cca = CCA(n_components=2)

However, if the last value of X is changed:

X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,1.]]

Then the code gives the error:

ValueError: array must not contain infs or NaNs

I apologise if I am just misusing the function, but this error was coming up for me in a less contrived example.

This is using the current master on github.

About this issue

  • State: closed
  • Created 11 years ago
  • Reactions: 2
  • Comments: 23 (12 by maintainers)

Most upvoted comments

This issue is not solved. I am encountering this in the latest version of pykalman. What should I do now?

I am getting a similar error when I try to call FastICA on a sparse matrix.

CODE:


from sklearn.decomposition import FastICA
import numpy as np

data = np.array([
    [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., -1., 1., 1.],
    [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., -1., 1., 1.],
    [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., -1., 1., 1.],
], dtype=np.float32)

ica = FastICA(n_components=5)
data_transf = ica.fit_transform(data)


Output: ValueError: array must not contain infs or NaNs

To the best of my knowledge, ICA should still work on a sparse matrix. Normalization did not solve the problem. All other linear change-of-basis algorithms (e.g. PCA, SVD, TruncatedSVD) work.
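One thing worth noting about the array above is that its three rows are identical, so every column is constant and nothing is left once the data is centred. The sketch below uses only NumPy to show that. Whether this is exactly what trips FastICA's whitening step is an assumption, not something confirmed in this thread.

```python
import numpy as np

row = [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., -1., 1., 1.]
data = np.array([row, row, row], dtype=np.float32)

# Every column holds a single repeated value, so the per-column spread is zero.
col_std = data.std(axis=0)
print(np.all(col_std == 0))

# Centring therefore leaves an all-zero matrix, whose rank is 0.
centred = data - data.mean(axis=0)
rank = np.linalg.matrix_rank(centred)
print(rank)
```

If that reading is right, the input carries no variance for any decomposition to work with, regardless of how many entries are zero.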

But I suppose the suggestion is that we should have a test for all our estimators that they don’t explode with all-zero columns…
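A rough idea of what such a test could look like, written against plain NumPy so it runs on its own. The names `check_finite_with_zero_column`, `naive_standardise` and `guarded_standardise` are made up for illustration, and the two transforms are stand-ins rather than scikit-learn estimators.

```python
import numpy as np

def check_finite_with_zero_column(transform, n_samples=6, n_features=4, seed=0):
    """Return True if transform(X) is finite when X has an all-zero column."""
    rng = np.random.RandomState(seed)
    X = rng.random_sample((n_samples, n_features))
    X[:, 0] = 0.0  # the problematic all-zero column
    with np.errstate(divide="ignore", invalid="ignore"):
        out = np.asarray(transform(X))
    return bool(np.all(np.isfinite(out)))

def naive_standardise(X):
    # Divides by the column std without guarding against zero.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def guarded_standardise(X):
    # Replaces a zero std by 1 so constant columns map to zeros.
    std = X.std(axis=0)
    std[std == 0.0] = 1.0
    return (X - X.mean(axis=0)) / std

print(check_finite_with_zero_column(naive_standardise))
print(check_finite_with_zero_column(guarded_standardise))
```

The unguarded version produces NaN in the zero column and fails the check, while the guarded one passes. A real common test would loop over the estimators instead of these two functions.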

Actually, here is a temporary fix. This should be made into a class, but out of laziness, this is my solution. Hope it helps someone.

import numpy as np
from sklearn.preprocessing import FunctionTransformer

def NonZeroMask(X):
    # True for columns whose mean + std is non-zero
    mx = np.mean(X, axis=0) + np.std(X, axis=0)
    return np.where(mx == 0, False, True)

# Mask is computed once, from the training data only
notZero_mask = NonZeroMask(trainX)

def apply_mask(X):
    global notZero_mask
    return X[:, notZero_mask]

trans = FunctionTransformer(apply_mask)

print([trainX.shape, testX.shape])
trainX = trans.transform(trainX)
testX = trans.transform(testX)
print([trainX.shape, testX.shape])
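For anyone who wants the class version mentioned above, here is a minimal sketch using only NumPy. `ConstantColumnDropper` is a hypothetical name, and it differs slightly from the snippet above: it drops every column that is constant in the training data, not only those where mean + std is zero.

```python
import numpy as np

class ConstantColumnDropper:
    """Hypothetical helper: drops columns that are constant in the training data."""

    def fit(self, X):
        X = np.asarray(X)
        # Keep a column only when its values are not all equal.
        self.mask_ = np.ptp(X, axis=0) > 0
        return self

    def transform(self, X):
        # Apply the mask learned in fit, so train and test keep the same columns.
        return np.asarray(X)[:, self.mask_]

train = np.array([[0., 1., 2.], [0., 3., 2.], [0., 5., 4.]])
test = np.array([[7., 8., 9.]])

dropper = ConstantColumnDropper().fit(train)
print(dropper.transform(train).shape)
print(dropper.transform(test).shape)
```

Storing the mask on the object removes the need for the global variable. scikit-learn's `sklearn.feature_selection.VarianceThreshold`, with its default threshold, also removes zero-variance features and may be the simpler choice inside a pipeline.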

I am having a similar issue (scikit-learn 0.16.1 + NumPy 1.9.3):

import numpy as np
import sklearn.cross_decomposition

pls2 = sklearn.cross_decomposition.PLSRegression()
xx = np.random.random((5,5))
yy = np.zeros((5,5) ) 

yy[0,:] = [0,1,0,0,0]
yy[1,:] = [0,0,0,1,0]
yy[2,:] = [0,0,0,0,1]
#yy[3,:] = [1,0,0,0,0] # Uncommenting this line solves the issue

pls2.fit(xx, yy)

-->

C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:44: RuntimeWarning: invalid value encountered in divide
  x_weights = np.dot(X.T, y_score) / np.dot(y_score.T, y_score)
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:64: RuntimeWarning: invalid value encountered in less
  if np.dot(x_weights_diff.T, x_weights_diff) < tol or Y.shape[1] == 1:
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:67: UserWarning: Maximum number of iterations reached
  warnings.warn('Maximum number of iterations reached')
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:297: RuntimeWarning: invalid value encountered in less
  if np.dot(x_scores.T, x_scores) < np.finfo(np.double).eps:
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:275: RuntimeWarning: invalid value encountered in less
  if np.all(np.dot(Yk.T, Yk) < np.finfo(np.double).eps):
Traceback (most recent call last):
  File "C:\svn\hw4\code\test_plsr3.py", line 14, in <module>
    pls2.fit(xx, yy)
  File "C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py", line 335, in fit
    linalg.pinv(np.dot(self.x_loadings_.T, self.x_weights_)))
  File "C:\Anaconda\lib\site-packages\scipy\linalg\basic.py", line 889, in pinv
    a = _asarray_validated(a, check_finite=check_finite)
  File "C:\Anaconda\lib\site-packages\scipy\_lib\_util.py", line 135, in _asarray_validated
    a = np.asarray_chkfinite(a)
  File "C:\Anaconda\lib\site-packages\numpy\lib\function_base.py", line 613, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

I use Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)].
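A possible reading of the first RuntimeWarning, sketched with NumPy only: the first column of yy is all zeros unless the commented line is enabled, so if the iteration is seeded from that column, the quoted division becomes 0/0. The seeding from the first column is an assumption about pls_.py, and `first_x_weights` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

rng = np.random.RandomState(0)
xx = rng.random_sample((5, 5))

def first_x_weights(yy):
    # Assumption: the iteration is seeded with the first column of centred Y.
    yc = yy - yy.mean(axis=0)
    y_score = yc[:, [0]]
    with np.errstate(divide="ignore", invalid="ignore"):
        # Same expression as the line quoted in the first RuntimeWarning.
        return np.dot(xx.T, y_score) / np.dot(y_score.T, y_score)

yy = np.zeros((5, 5))
yy[0, :] = [0, 1, 0, 0, 0]
yy[1, :] = [0, 0, 0, 1, 0]
yy[2, :] = [0, 0, 0, 0, 1]
failing = first_x_weights(yy)

# Enabling the commented-out row makes the first column non-constant.
yy_fixed = yy.copy()
yy_fixed[3, :] = [1, 0, 0, 0, 0]
working = first_x_weights(yy_fixed)

print(np.isnan(failing).all())
print(np.isfinite(working).all())
```

Under that assumption the NaN weights would propagate through the later steps until `linalg.pinv` rejects them, which matches the traceback. It would also explain why the extra row helps even though the third column of yy stays all zero.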


The code mentioned by Padarn that was triggering the same error works fine:

import sklearn.cross_decomposition
X = [[1., 1., 2.], [3., 1., 4.], [5., 1., 6.], [2., 2., 2.]]
Y = [[1., 1.], [2., 1.], [3., 1.], [1., 1.]]
cca = sklearn.cross_decomposition.CCA()
cca.fit(X, Y)

just gives a warning:

C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:277: UserWarning: Y residual constant at iteration 1
  warnings.warn('Y residual constant at iteration %s' % k)

but otherwise is ok.