spaCy: Tokenization not working in v2.1

How to reproduce the behaviour

I found a bug where tokenization is completely broken with version 2.1.0a10 on Python 2.7: every string is split into individual characters. I have reproduced this on three of my machines.

$ conda create -n py27_spacy2 python=2.7
$ source activate py27_spacy2
$ pip install -U spacy-nightly
$ python -m spacy download en_core_web_sm
$ python -c "import spacy; nlp=spacy.load('en_core_web_sm'); doc=nlp(u'hello world'); print ','.join([t.text for t in doc])"
h,e,ll,o,w,o,r,l,d
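
As a temporary workaround (a sketch, not a confirmed fix; it assumes the base English language data is intact and uses the spaCy 2.x Defaults.create_tokenizer API), the model's broken tokenizer can be swapped for one rebuilt from the language defaults:

# Workaround sketch: rebuild the tokenizer from the English language
# defaults instead of using the one deserialized with the model.
import spacy
from spacy.lang.en import English

nlp = spacy.load('en_core_web_sm')
nlp.tokenizer = English.Defaults.create_tokenizer(nlp)

doc = nlp(u'hello world')
print ','.join([t.text for t in doc])  # expected: hello,world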

Your Environment

  • Operating System: Ubuntu
  • Python Version Used: 2.7
  • spaCy Version Used: 2.1.0a10

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 29 (10 by maintainers)

Most upvoted comments

The session below shows that the base English() tokenizer handles the text correctly, while the same text run through the loaded en_core_web_sm model is split into individual characters:

>>> from spacy.lang.en import English
>>> nlp = English()
>>> doc=nlp(u'Well I wonder how this will / shall look after tokenization with the model - ill or not ?')
>>> print ','.join([t.text for t in doc])
Well,I,wonder,how,this,will,/,shall,look,after,tokenization,with,the,model,-,ill,or,not,?
>>> import spacy
>>> nlp=spacy.load('en_core_web_sm')
>>> doc=nlp(u'Well I wonder how this will / shall look after tokenization with the model - ill or not ?')
>>> print ','.join([t.text for t in doc])
W,e,l,l,I,w,o,n,d,e,r,h,o,w,t,h,i,s,w,i,l,l,/,s,h,a,l,l,l,o,o,k,a,f,t,e,r,t,o,k,e,n,i,z,a,t,i,o,n,w,i,t,h,t,h,e,m,o,d,e,l,-,i,ll,o,r,n,o,t,?
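
One way to narrow this down (a diagnostic sketch; the attribute names are from the public spaCy 2.x Tokenizer API) is to compare the loaded model's tokenizer callbacks with those of a freshly constructed pipeline:

# Diagnostic sketch: check whether the model's tokenizer lost its
# prefix/suffix/infix patterns somewhere during deserialization.
import spacy
from spacy.lang.en import English

broken = spacy.load('en_core_web_sm').tokenizer
fresh = English().tokenizer

print broken.prefix_search is not None, fresh.prefix_search is not None
print broken.suffix_search is not None, fresh.suffix_search is not None
print broken.infix_finditer is not None, fresh.infix_finditer is not None

If the model's tokenizer prints False where the fresh one prints True, the serialized tokenizer data is at fault; if both print True, the problem lies deeper, e.g. in the compiled patterns themselves.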