tokenizers: wrong architecture `tokenizers.cpython-39-darwin.so` (x86_64) when installing on apple silicon (arm64)
Hey there,
I just wanted to share an issue I came across when trying to get the transformers quick tour example working on my machine.
It seems that installing tokenizers via PyPI currently builds or bundles tokenizers.cpython-39-darwin.so for x86_64 instead of arm64 for users with Apple Silicon M1 machines.
System Info: MacBook Air (M1, 2020) running macOS 11.0.1
To reproduce:

- create a virtualenv and activate it:

  ```shell
  virtualenv venv-bad
  source venv-bad/bin/activate
  ```

- install pytorch (the easiest way I’ve found so far on arm64 is to install the nightly build via pip):

  ```shell
  pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
  ```

- install transformers (tokenizers will be installed as a dependency):

  ```shell
  pip install transformers
  ```

- create a file `main.py` with the quick tour example:

  ```python
  from transformers import pipeline

  classifier = pipeline('sentiment-analysis')
  classifier('We are very happy to show you the 🤗 Transformers library.')
  ```

- try running the quick tour example
Results in error:
```
ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found. Did find:
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
```
Full stacktrace:

```
(venv-bad) khuynh@kmba:test ‹main*›$ python main.py
Traceback (most recent call last):
  File "/Users/khuynh/me/test/temp.py", line 5, in <module>
    from transformers import pipeline
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2709, in __getattr__
    return super().__getattr__(name)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/file_utils.py", line 1821, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2703, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/homebrew/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 25, in <module>
    from ..models.auto.configuration_auto import AutoConfig
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/__init__.py", line 19, in <module>
    from . import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/__init__.py", line 23, in <module>
    from .tokenization_layoutlm import LayoutLMTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/tokenization_layoutlm.py", line 19, in <module>
    from ..bert.tokenization_bert import BertTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert.py", line 23, in <module>
    from ...tokenization_utils import PreTrainedTokenizer, _is_control, _is_punctuation, _is_whitespace
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 26, in <module>
    from .tokenization_utils_base import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 69, in <module>
    from tokenizers import AddedToken
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/__init__.py", line 79, in <module>
    from .tokenizers import (
ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found. Did find:
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
```
Looking at the shared library with `file`, we can see it’s a dynamically linked x86_64 library:

```
(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
```
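One extra check worth doing here (not part of the original report): confirm whether the Python interpreter itself is running natively on arm64 or under Rosetta, since an x86_64 interpreter makes pip select x86_64 wheels.

```shell
# Not from the original report: check the interpreter's own architecture.
# 'arm64' means native; 'x86_64' means Python (and hence pip) is running
# under Rosetta and will pull x86_64 wheels.
python3 -c 'import platform; print(platform.machine())'
```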
Solution:
The solution I found requires installing the Rust toolchain on your machine and building the tokenizers module from source, so I think it’s best viewed as a temporary workaround. I already have the Rust nightly toolchain installed on my machine, so that’s what I used; otherwise, instructions for installing it are here.
- clone tokenizers:

  ```shell
  git clone git@github.com:huggingface/tokenizers.git
  cd tokenizers/bindings/python
  ```

- install tokenizers:

  ```shell
  python setup.py install
  ```

- now go back and successfully re-run the transformers quick tour
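If the from-source build still comes out as x86_64, a quick pre-build sanity check (assuming a rustup-managed toolchain, which the thread below discusses) is to look at the host triple rustc reports:

```shell
# Pre-build sanity check (assumes a rustup-managed Rust toolchain):
# the host triple should be aarch64-apple-darwin, not x86_64-apple-darwin.
if command -v rustc >/dev/null; then
  rustc -vV | grep '^host:'
else
  echo 'rustc not on PATH'
fi
```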
We can also now see that the shared library has the proper architecture using `file`:

```
(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library arm64
```
I’m not super well versed in setuptools, so I’m not sure of the best way to fix this. Maybe release a separate pre-built tokenizers.cpython-39-darwin.so for arm64 users? I’d be happy to help if needed.
About this issue
- State: closed
- Created 3 years ago
- Reactions: 6
- Comments: 19 (3 by maintainers)
Hi there!
I’ve manually built binaries for tokenizers on ARM M1 and released them for tokenizers 0.11.6. We’ll try our best to keep building those by hand while waiting for https://github.com/actions/runner/issues/805.
Expect some delay between normal releases and M1 releases for now 😃
Have a great day!
Hi @hkennyv and thank you for reporting this.
We don’t build wheels for Apple Silicon at the moment because there is no environment for this on our Github CI. (cf https://github.com/actions/virtual-environments/issues/2187). The only way to have it working is, as you mentioned, to build it yourself. We’ll add support for this as soon as it is available!
That means you may need to change the default host triple that rustup uses.
It may also be defaulting to the wrong toolchain, so you might need to set the default toolchain explicitly.
I think I also had to delete the rust-toolchain file, since when it was present the build would switch to the x86_64 toolchain. You can then check that the right toolchain is selected.
Edit: I was also able to fix the rust-toolchain issue.
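The exact commands from this comment did not survive in the archive; as a hedged reconstruction, the standard rustup invocations for those steps would be along these lines:

```shell
# Hypothetical reconstruction -- the original snippets were not preserved.
# These are the standard rustup commands for each step described above.
rustup set default-host aarch64-apple-darwin   # change the default host triple
rustup default stable-aarch64-apple-darwin     # set the default toolchain
rustup show                                    # check which toolchain is active
```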
I ran into this as well. It turned out that I was using the brew-installed Rust rather than the rustup one. Try `which rustc` to make sure it is coming from the ~/.cargo directory.

I have followed the instructions to build from source, and I still see the library compiled for x86_64.
I cloned the repo, made sure the Python environment is configured for a shared library, and ran `python setup.py install`; tokenizers was installed in the virtual environment. Checking the built library shows it is still compiled for x86_64, and I do not understand why it is not compiling for the correct target.
I am on an M1 MacBook Pro. Python version is 3.10.7. Cargo version is 1.63.0.
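For anyone hitting the same wall, a couple of diagnostics (suggestions, not from the original thread) can distinguish a Rosetta shell or a Homebrew Rust from a native rustup toolchain:

```shell
# Suggested diagnostics (not from the original thread):
arch                                          # a Rosetta shell reports i386/x86_64
command -v rustc || echo 'rustc not found'    # expect ~/.cargo/bin/rustc, not a Homebrew path
```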