spaCy: Tokenization not working in v2.1 on Python 2.7
How to reproduce the behaviour
I found a bug where tokenization is completely broken with version 2.1.0a10 on Python 2.7: every token is split into individual characters. I have reproduced this on three of my machines.
$ conda create -n py27_spacy2 python=2.7
$ source activate py27_spacy2
$ pip install -U spacy-nightly
$ python -m spacy download en_core_web_sm
$ python -c "import spacy; nlp=spacy.load('en_core_web_sm'); doc=nlp(u'hello world'); print ','.join([t.text for t in doc])"
h,e,ll,o,w,o,r,l,d
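For comparison, a working install keeps the two words intact; the same command on spaCy v2.0 (or on v2.1 after the fix below) should print:

$ python -c "import spacy; nlp=spacy.load('en_core_web_sm'); doc=nlp(u'hello world'); print ','.join([t.text for t in doc])"
hello,world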
Your Environment
- Operating System: Ubuntu
- Python Version Used: 2.7
- spaCy Version Used: 2.1.0a10
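To confirm exactly which spaCy build and Python version an environment actually picked up, spaCy's CLI can print this directly; it reports the installed spaCy version, platform, Python version, and installed models, which is handy when filling in the environment details above:

$ python -m spacy info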
About this issue
- State: closed
- Created 5 years ago
- Comments: 29 (10 by maintainers)
Commits related to this issue
- Add failing test for #3356 — committed to explosion/spaCy by honnibal 5 years ago
- Fix tokenizer on Python2.7 spaCy v2.1 switched to the built-in re module, where v2.0 had been using the third-party regex library. When the tokenizer was deserialized on Python2.7, the `re.compile()`... — committed to explosion/spaCy by honnibal 5 years ago
- Fix tokenizer on Python2.7 (#3460) spaCy v2.1 switched to the built-in re module, where v2.0 had been using the third-party regex library. When the tokenizer was deserialized on Python2.7, the `re.... — committed to explosion/spaCy by honnibal 5 years ago
- fix (#1) * Add failing test for #3356 * Fix test that caused pytest to choke on Python3 * adding kb_id as field to token, el as nlp pipeline component * annotate kb_id through ents in doc ... — committed to kiku-jw/spaCy by kiku-jw 5 years ago
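The first commit above adds a failing test; its exact contents are not reproduced here, but given the commit messages the regression lives in tokenizer deserialization, so a minimal regression check presumably looks like a serialize/deserialize round trip (a sketch, with the assertion assumed from the expected output shown earlier):

import spacy

nlp = spacy.load('en_core_web_sm')
# Round-trip the tokenizer through its byte representation; per the commit
# messages, deserialization is where the re.compile() calls went wrong on
# Python 2.7 after v2.1 switched from the regex library to the built-in re.
data = nlp.tokenizer.to_bytes()
nlp.tokenizer.from_bytes(data)
doc = nlp(u'hello world')
assert [t.text for t in doc] == [u'hello', u'world']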