spaCy: make_doc becomes extremely slow after a macOS upgrade
spaCy: 2.3.0; MacPro: Big Sur 11.2.3
After a macOS update, I had to reinstall the Xcode command line tools for various reasons. I also used pyenv to install Python 3.7.8 so that it co-exists with Python 3.8.
Now, when I run the same program from my spaCy project, the dictionary-loading part has become extremely slow: 10,000 entries take several minutes, where they used to take seconds. There are no error or warning messages.
doc = Doc(self.vocab, words=words, spaces=spaces)
To rule out a tokenization issue, I hard-coded the words and spaces lists before calling the line above:
words = ['test']
spaces=[False]
The problem is the same. I suspect that something in the Mac system environment interacts with spaCy's Cython code and causes the degradation. We have used spaCy for quite some time, and this is the first time we have observed this phenomenon. Strange. My gcc version:
$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin20.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
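To measure the slowdown in isolation, a small timing harness can help. The following is a minimal sketch, not part of the original program: `time_calls` and `doc_builder` are hypothetical helper names, and it assumes that spaCy is importable and that an empty `Vocab()` is an acceptable stand-in for `self.vocab`.

```python
import time


def time_calls(fn, repeat=10000):
    """Call fn() `repeat` times and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return time.perf_counter() - start


def doc_builder():
    """Return a zero-argument callable that builds one Doc.

    Assumption: an empty Vocab() stands in for self.vocab, and the
    hard-coded words/spaces lists from the post are used as input.
    """
    # Imported lazily so time_calls can be used without spaCy installed.
    from spacy.tokens import Doc
    from spacy.vocab import Vocab

    vocab = Vocab()
    return lambda: Doc(vocab, words=["test"], spaces=[False])
```

Running `time_calls(doc_builder(), repeat=10000)` in the fast and the slow environment gives two numbers that can be compared directly, without involving the rest of the project.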
I am thinking of reinstalling macOS and recompiling spaCy to see whether the issue goes away. That would remove all the recent updates and upgrades to macOS and the development tools.
Has anybody encountered similar issues?
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 23 (8 by maintainers)
After some more digging, I can confirm that the troublemaker is the version of ‘importlib-metadata’. In one environment, where importlib-metadata 3.7.3 is installed, the speed is normal; in another environment, where importlib-metadata 3.10.0 is installed, the speed is abnormal. In both cases, I didn’t list importlib-metadata in my requirements.txt; it was installed automatically as a dependency of other packages. The two environments install different dependencies, which is why the versions can differ.
I haven’t specified importlib-metadata in my requirements.txt in the past, and it has worked. The sudden appearance of this speed issue is likely due to changes in pip itself after ‘pip install --upgrade pip’. For example, I didn’t get this warning in the past, but I see it now:
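A version mismatch like this can be spotted by listing the installed distributions in each environment and comparing them. The sketch below is one way to do that, under stated assumptions: `installed_versions` and `diff_versions` are hypothetical helper names, and the script falls back to the standard-library `importlib.metadata` when the `importlib_metadata` backport is not installed.

```python
try:
    # Backport used on Python < 3.8 (the package discussed above).
    import importlib_metadata as md
except ImportError:
    # Standard-library equivalent on Python 3.8+.
    from importlib import metadata as md


def installed_versions():
    """Return {lowercased distribution name: version} for this environment."""
    versions = {}
    for dist in md.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            versions[name.lower()] = version
    return versions


def diff_versions(env_a, env_b):
    """Return {name: (version_in_a, version_in_b)} for packages that differ.

    A package missing from one environment is reported with None.
    """
    names = set(env_a) | set(env_b)
    return {
        name: (env_a.get(name), env_b.get(name))
        for name in sorted(names)
        if env_a.get(name) != env_b.get(name)
    }
```

Saving the result of `installed_versions()` from each environment (for example as JSON) and passing both dictionaries to `diff_versions` shows only the packages whose versions differ, such as importlib-metadata 3.7.3 versus 3.10.0 here.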
warning: /Users/congminmin/nlp/wb-nlp-tool/spaCy-wb/spacy/vocab.pxd:29:10: cpdef variables will not be supported in Cython 3; currently they are no different from cdef variables
The attached requirements.txt is what I am using. If you remove ‘importlib-metadata==3.7.3’ and let 3.10.0 be installed automatically, the issue can be reproduced in my project on both my Mac and Linux. I didn’t test the original spaCy without any modification; my project uses spaCy with minor tweaks, and the problem can be reproduced there as described. So the solution for me is to pin the importlib-metadata version explicitly in requirements.txt.
On Mac, I am testing on Python 3.7.10, and on Linux, on Python 3.6.8.
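Because the pin only helps if it actually takes effect, a startup check can flag an environment where a different version ended up installed. This is a hedged sketch rather than anything spaCy provides: `check_pin` and `PINNED_VERSION` are hypothetical names, and the pinned value simply mirrors the `importlib-metadata==3.7.3` line mentioned above.

```python
import warnings

try:
    import importlib_metadata as md  # backport, Python < 3.8
except ImportError:
    from importlib import metadata as md  # standard library, Python 3.8+

# Assumption: mirrors the 'importlib-metadata==3.7.3' pin in requirements.txt.
PINNED_VERSION = "3.7.3"


def check_pin(dist_name="importlib-metadata", pinned=PINNED_VERSION):
    """Return True if dist_name is installed at exactly `pinned`.

    Emits a warning and returns False if the distribution is missing
    or installed at a different version.
    """
    try:
        installed = md.version(dist_name)
    except md.PackageNotFoundError:
        installed = None
    if installed != pinned:
        warnings.warn(
            "{} is {} but {} was expected".format(dist_name, installed, pinned)
        )
        return False
    return True
```

Calling `check_pin()` once at program start makes an accidental upgrade visible immediately, instead of showing up later as an unexplained slowdown.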
requirements.txt