scikit-learn: segmentation fault (core dump)

I am using the TfidfVectorizer class. When I run the third statement of the code below (the fit_transform call):

from sklearn.feature_extraction.text import TfidfVectorizer
tfv = TfidfVectorizer(min_df=20, max_features=None,
    strip_accents='unicode', analyzer='word', token_pattern=r'\b[^\W\d_]+\b',
    ngram_range=(1, 5), use_idf=1, smooth_idf=1, sublinear_tf=1,
    stop_words=STOPWORDS)
text_tf_idf = tfv.fit_transform(s_data)

I get:

Segmentation fault (core dumped)

From earlier, similar questions, I understand that this error occurs in C or C++ “when you are doing something wrong with memory”.

However, I am a bit confused, because in Python you do not allocate memory for variables directly, and I was under the impression that the TfidfVectorizer class is implemented entirely in Python. Additionally, I ran the same code on 64-bit Windows 8/10 (Python 3.4) with no issues.

Now I am running the code on Ubuntu Linux (Python 2.7).

Why is this happening and how can it be solved?
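
A note on the "entirely in Python" assumption: scikit-learn ships Cython-compiled extension modules and depends on NumPy and SciPy, which are largely compiled code, so a segmentation fault can originate there even though the calling code is Python. The sketch below is one way to check whether a given module name resolves to a compiled extension file. It assumes Python 3 (importlib.util does not exist in Python 2.7), and is_compiled_extension is a hypothetical helper name, not part of any library.

```python
import importlib.machinery
import importlib.util


def is_compiled_extension(module_name):
    """Return True if module_name resolves to a compiled extension file (.so/.pyd).

    Hypothetical helper for illustration. Built-in modules compiled into the
    interpreter itself are reported as False, because they have no extension file.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        raise ImportError("module not found: " + module_name)
    return isinstance(spec.loader, importlib.machinery.ExtensionFileLoader)


# The standard-library json package is plain Python source.
print(is_compiled_extension("json"))
```

To apply this to the stack in question, you could pass a dotted name such as scipy.sparse._sparsetools; that name is an assumption, since internal module names vary between versions. Note that find_spec imports the parent package of a dotted name, so the check itself may crash if the import is what triggers the fault.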

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

I am also facing the same problem. If I run this statement in Python 2.7:

from sklearn.feature_extraction.text import CountVectorizer

I get the message "segmentation fault (core dump)".

However, the same import works fine in Python 3.5.
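
Because a segmentation fault comes from native code rather than from a Python exception, one way to narrow it down is to run the failing statement in a child interpreter with faulthandler enabled, so that the crash prints a Python-level traceback and does not take down the calling process. This is a minimal sketch, assuming Python 3.7 or later (for the -X faulthandler option and subprocess.run with capture_output); run_isolated is a hypothetical helper name. On Python 2.7, faulthandler is not in the standard library and would have to be installed separately and enabled with faulthandler.enable().

```python
import signal
import subprocess
import sys


def run_isolated(statement, timeout=120):
    """Run one Python statement in a child interpreter with faulthandler enabled.

    Hypothetical helper for illustration: the child may crash, but the parent
    process survives and can inspect the exit code and stderr.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", statement],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Harmless placeholder; substitute the import or call that crashes.
result = run_isolated("import json")
print("exit code:", result.returncode)

# On POSIX, a negative exit code -N means the child was killed by signal N.
if result.returncode == -signal.SIGSEGV:
    print("child crashed with SIGSEGV; traceback from faulthandler:")
    print(result.stderr)
```

Passing "from sklearn.feature_extraction.text import CountVectorizer" as the statement should show which import is executing when the crash happens, which helps tell apart a problem in scikit-learn itself from one in NumPy or SciPy.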