scikit-learn: segmentation fault (core dump)
I am using the TfidfVectorizer class. When running the last line of the code below:
from sklearn.feature_extraction.text import TfidfVectorizer
tfv = TfidfVectorizer(min_df=20, max_features=None,
                      strip_accents='unicode', analyzer='word', token_pattern=r'\b[^\W\d_]+\b',
                      ngram_range=(1, 5), use_idf=1, smooth_idf=1, sublinear_tf=1,
                      stop_words=STOPWORDS)
text_tf_idf = tfv.fit_transform(s_data)
I get:
Segmentation fault (core dumped)
Based on the research I have done on previous similar questions, you get this error in C++ or C “when you are doing something wrong with memory”.
However, I am a bit confused, because in Python you do not manage memory directly, and I was under the impression that the TfidfVectorizer class is implemented entirely in Python. Additionally, I ran the same code on 64-bit Windows 8/10 (Python 3.4) with no issues.
Now I am running the code on Linux Ubuntu (Python 2.7).
Why is this happening and how can it be solved?
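A segmentation fault comes from native code rather than from Python itself. scikit-learn depends on NumPy and SciPy, both of which contain compiled extensions, so a crash can surface even when the calling code is pure Python. A minimal sketch for narrowing it down, assuming Python 3.3+ where the standard-library faulthandler module is available (on Python 2.7 it would have to be installed separately as a backport): the helper name enable_crash_traceback and the log file name are hypothetical, chosen only for illustration.

```python
import faulthandler
import os
import tempfile


def enable_crash_traceback(log_path=None):
    """Ask the interpreter to write a Python traceback when a fatal signal arrives.

    `log_path` is a hypothetical location; by default a file in the system
    temporary directory is used.
    """
    if log_path is None:
        log_path = os.path.join(tempfile.gettempdir(), "segfault_traceback.log")
    # The file must stay open for as long as the handler is enabled.
    log_file = open(log_path, "w")
    # On SIGSEGV, SIGFPE, SIGABRT, SIGBUS, or SIGILL, dump the traceback
    # of every thread to the log file before the process dies.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


crash_log = enable_crash_traceback()

# Run the failing call after this point, for example:
# text_tf_idf = tfv.fit_transform(s_data)
```

If the process still crashes, the log should show which Python line was executing at the time, which helps tell whether the fault happens during the import or inside fit_transform.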
About this issue
- State: closed
- Created 8 years ago
- Comments: 15 (7 by maintainers)
I am facing the same problem. If I try to run this statement in Python 2.7:
from sklearn.feature_extraction.text import CountVectorizer
I get the message: segmentation fault (core dump)
But this works fine in Python 3.5.
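When the import alone crashes, a try/except cannot catch it, because the whole interpreter dies. One way to find the failing package is to attempt each import in a child process and inspect how the child exited. The following is a rough sketch, assuming it is launched from a working Python 3.5+ interpreter on a POSIX system; probe_import is a hypothetical helper name, and the `python` argument can point at the interpreter under suspicion (for example the Python 2.7 binary).

```python
import subprocess
import sys


def probe_import(module_name, python=sys.executable):
    """Import `module_name` in a child interpreter and describe how the child exited."""
    proc = subprocess.run(
        [python, "-c", "import " + module_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by
        # that signal number (SIGSEGV is 11 on Linux).
        return "killed by signal %d" % -proc.returncode
    return "failed with exit code %d" % proc.returncode


if __name__ == "__main__":
    # Probe the dependency chain from the bottom up.
    for name in ("numpy", "scipy", "scipy.sparse", "sklearn",
                 "sklearn.feature_extraction.text"):
        print(name, "->", probe_import(name))
```

The first module reported as killed by a signal is a reasonable place to start: reinstalling or rebuilding that package for the affected interpreter is a common next step, though this sketch does not by itself establish the cause.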