scikit-learn: PLS (partial least squares) algorithms occasionally failing.
When attempting to use the CCA class in pls.py, I’m getting some unhelpful errors.
The example given in the documentation is:
from sklearn.pls import PLSCanonical, PLSRegression, CCA
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
cca = CCA(n_components=1)
cca.fit(X, Y)
This works okay, and also works okay with the change:
cca = CCA(n_components=2)
However, if X is changed to the following (only the last value differs):
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,1.]]
Then the code gives the error:
ValueError: array must not contain infs or NaNs
I apologise if I am just misusing the function, but this error was coming up for me in a less contrived example.
This is using the current master on github.
About this issue
- State: closed
- Created 11 years ago
- Reactions: 2
- Comments: 23 (12 by maintainers)
This issue is not solved. I am encountering this in the latest version of pykalman. What should I do now?
I am getting a similar error when I try to call FastICA on a sparse matrix.
CODE:
from sklearn.decomposition import FastICA
import numpy as np
data = np.array([
    [ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., -1., 1., 1.],
    [ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., -1., 1., 1.],
    [ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., -1., 1., 1.]], dtype=np.float32)
ica = FastICA(n_components=5)
data_transf = ica.fit_transform(data)
Output: ValueError: array must not contain infs or NaNs
To the best of my knowledge, ICA should still work on a sparse matrix. Normalization did not solve the problem. All other linear change-of-basis algorithms (e.g. PCA, SVD, TruncatedSVD) work.
But I suppose the suggestion is that we should have a test for all our estimators that they don’t explode with all-zero columns…
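To make the all-zero-column concern concrete, here is a minimal NumPy-only sketch run on the FastICA input quoted above. It does not reproduce FastICA's internals. It only shows one plausible way NaNs can appear: every column of that array is constant, so centering leaves nothing behind and a naive standardization divides 0 by 0. The variable names are illustrative.

```python
import numpy as np

# The FastICA input from the report: three identical rows.
row = [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., -1., 1., 1.]
data = np.array([row, row, row], dtype=np.float32)

# Columns whose values never change have zero standard deviation.
std = data.std(axis=0)
constant_cols = np.flatnonzero(std == 0)

# After centering, a matrix made of constant columns is entirely zero.
centered = data - data.mean(axis=0)
rank = np.linalg.matrix_rank(centered)

# Naive standardization divides 0 by 0 for constant columns, which yields NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    standardized = centered / std

print("constant columns:", constant_cols.size, "of", data.shape[1])
print("rank after centering:", rank)
print("NaNs after standardizing:", int(np.isnan(standardized).sum()))
```

A check like `np.flatnonzero(X.std(axis=0) == 0)` before fitting is a cheap way to see whether an input falls into this case.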
Actually, here is a temporary fix. This should be made into a class, but out of laziness, this is my solution. Hope it helps someone.
`import numpy as np
from sklearn.preprocessing import FunctionTransformer

def NonZeroMask(X):
    # All-zero columns give mean + std == 0 and are masked out
    mx = np.mean(X, axis=0) + np.std(X, axis=0)
    return np.where(mx == 0, False, True)

notZero_mask = NonZeroMask(trainX)

def apply_mask(X):
    global notZero_mask
    return X[:, notZero_mask]

trans = FunctionTransformer(apply_mask)

print([trainX.shape, testX.shape])
trainX = trans.transform(trainX)
testX = trans.transform(testX)
print([trainX.shape, testX.shape])`
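Since the comment above says this should really be a class, here is one possible sketch of that idea using only NumPy. `ConstantColumnDropper` is a hypothetical name, not part of scikit-learn, and the test differs from the snippet above on purpose: it drops every constant column (max equals min) rather than testing `mean + std == 0`. The `mean + std` test can also mask a non-constant column such as `[-2, 0]`, where the mean is -1 and the standard deviation is 1.

```python
import numpy as np


class ConstantColumnDropper:
    """Hypothetical helper: drops columns that are constant in the training data."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # A column is constant exactly when its max equals its min.
        self.keep_mask_ = X.max(axis=0) != X.min(axis=0)
        return self

    def transform(self, X):
        # Reuse the mask learned in fit so train and test keep the same columns.
        return np.asarray(X, dtype=float)[:, self.keep_mask_]


# Toy data (made up for illustration): columns 0 and 2 are constant in trainX.
trainX = np.array([[0., 1., 5.], [0., 2., 5.], [0., 3., 5.]])
testX = np.array([[1., 4., 5.], [0., 5., 6.]])

dropper = ConstantColumnDropper().fit(trainX)
print(trainX.shape, dropper.transform(trainX).shape)
print(testX.shape, dropper.transform(testX).shape)
```

Storing the mask on the object avoids the global variable. If a built-in is preferred, `sklearn.feature_selection.VarianceThreshold` removes zero-variance features with its default threshold and may be a drop-in alternative, depending on the scikit-learn version in use.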
I am having a similar issue (scikit-learn 0.16.1 + NumPy 1.9.3):
I use Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)].
The code mentioned by Padarn that was triggering the same error works fine:
just gives a warning:
but otherwise is ok.