scikit-learn: PCA segfaults (on some machines)

Description

PCA crashes with a segmentation fault even on small datasets. Whether it crashes depends on the array size.

Steps/Code to Reproduce

Download tmp.npy.gz, then decompress it:

gunzip tmp.npy.gz

from sklearn.decomposition import PCA
import numpy as np
traindata = np.load('./tmp.npy')
pca = PCA(n_components=5)
x = pca.fit_transform(traindata[0:40,:]) # Crashes
x = pca.fit_transform(traindata[10:40,:]) # Doesn't crash
x = pca.fit_transform(traindata[0:40,0:20])  # Doesn't crash
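
Because a segfault kills the interpreter, only the first crashing call in the script above is ever observed. A minimal sketch for checking each slice separately is to run every call in its own child process and inspect the exit code. The helper names (run_isolated, describe, check_slices) and the TEMPLATE string are hypothetical and not part of the original report; the sketch assumes tmp.npy sits in the current directory and relies on the POSIX convention that a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a child interpreter and return its exit code."""
    result = subprocess.run([sys.executable, "-c", snippet])
    return result.returncode


def describe(returncode):
    """Turn a child exit code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code %d" % returncode


# Hypothetical template mirroring the reproduction above; {slice} is filled in
# with each indexing expression from the report.
TEMPLATE = (
    "import numpy as np\n"
    "from sklearn.decomposition import PCA\n"
    "traindata = np.load('./tmp.npy')\n"
    "PCA(n_components=5).fit_transform(traindata[{slice}])\n"
)


def check_slices(slices=("0:40, :", "10:40, :", "0:40, 0:20")):
    """Run the reproduction once per slice and print how each child ended."""
    for s in slices:
        code = run_isolated(TEMPLATE.format(slice=s))
        print(s, "->", describe(code))
```

Calling check_slices() from the directory that contains tmp.npy prints one line per slice, so a crashing slice can be told apart from the others without losing the parent session.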

Expected Results

No segfault

Actual Results

Segfault

Versions

Linux-3.10.0-229.14.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.11.3
SciPy 0.18.1
Scikit-Learn 0.18.1

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 30 (17 by maintainers)

Most upvoted comments

This happens to me only when I use the ‘full’ solver (svd_solver='full') on a MacBook.

Looking into the code, the segmentation fault actually comes from the SVD call, which uses scipy’s linalg.svd.

The solution is to replace the

U, S, V = linalg.svd(X, full_matrices=False)

with

U, S, V = np.linalg.svd(X, full_matrices=False)

I’m not sure how the numpy and scipy implementations differ in detail, but patching the code to use numpy’s SVD solves the issue in my case.
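
For anyone who would rather not patch the installed library, the same idea can be tried outside scikit-learn. The following is a minimal sketch of PCA computed directly with np.linalg.svd; the function name pca_via_numpy_svd is hypothetical, the random array is only a stand-in for the reporter's data, and this is not scikit-learn's implementation. In particular it skips the sign normalization that scikit-learn applies to the SVD output, so individual columns may differ in sign from PCA.fit_transform.

```python
import numpy as np


def pca_via_numpy_svd(X, n_components):
    """Project X onto its top principal components using numpy's SVD."""
    X = np.asarray(X, dtype=float)
    # PCA operates on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: U is (n_samples, k), S is (k,), Vt is (k, n_features).
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # U * S equals X_centered projected onto the principal axes.
    transformed = U[:, :n_components] * S[:n_components]
    return transformed, components


rng = np.random.RandomState(0)
data = rng.rand(40, 100)  # stand-in for traindata[0:40, :]
x, components = pca_via_numpy_svd(data, n_components=5)
```

If this runs cleanly on a machine where the scipy-based path crashes, that supports the observation that the fault lies in the SVD routine being called and not in the PCA logic around it.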

This approach still solves my issue today: a segmentation fault when calling IncrementalPCA.fit().
