scikit-learn: PCA segfaults (on some machines)
Description
PCA crashes with a segmentation fault even on small datasets. Whether it crashes depends on the array size.
Steps/Code to Reproduce
Download tmp.npy.gz
gunzip tmp.npy.gz
from sklearn.decomposition import PCA
import numpy as np
traindata = np.load('./tmp.npy')
pca = PCA(n_components=5)
x = pca.fit_transform(traindata[0:40,:]) # Crashes
x = pca.fit_transform(traindata[10:40,:]) # Doesn't crash
x = pca.fit_transform(traindata[0:40,0:20]) # Doesn't crash
Expected Results
No segfault
Actual Results
Segfault
Versions
Linux-3.10.0-229.14.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.11.3
SciPy 0.18.1
Scikit-Learn 0.18.1
About this issue
- State: closed
- Created 7 years ago
- Reactions: 1
- Comments: 30 (17 by maintainers)
This approach still solves my issue today: a segmentation fault when calling IncrementalPCA.fit().
This happens to me only when I use the ‘full’ solver on a MacBook.
Looking into the code, the segmentation fault actually comes from the SVD call here.
The solution is to replace the
with
I’m not sure about the detailed differences between the numpy and scipy implementations, but patching it to use numpy’s SVD solves the issue in my case.
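For anyone who wants to check whether numpy's SVD behaves on their machine before patching anything, here is a minimal sketch of a PCA projection built on numpy.linalg.svd alone. The helper name pca_fit_transform_numpy_svd and the synthetic 40x60 array are assumptions made for illustration; this is not scikit-learn's code and not the exact patch described above.

```python
import numpy as np


def pca_fit_transform_numpy_svd(X, n_components):
    """Project X onto its first n_components principal components.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    X = np.asarray(X, dtype=np.float64)
    # Center each feature, as PCA does before decomposing.
    X_centered = X - X.mean(axis=0)
    # Thin SVD via NumPy rather than scipy.linalg.svd.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Scores for the leading components: U[:, :k] * S[:k] equals X_centered @ Vt[:k].T
    return U[:, :n_components] * S[:n_components]


# Synthetic stand-in for the 40-row slice in the report (the shape is an assumption).
rng = np.random.RandomState(0)
X_demo = rng.rand(40, 60)
scores = pca_fit_transform_numpy_svd(X_demo, n_components=5)
print(scores.shape)
```

If this runs cleanly on the data that crashes PCA, that points toward the scipy SVD path rather than the data itself. Component signs may differ from scikit-learn's output, since this sketch does not normalize them.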