tokenizers: wrong architecture `tokenizers.cpython-39-darwin.so` (x86_64) when installing on apple silicon (arm64)

Hey there,

I just wanted to share an issue I came across while trying to get the transformers quick tour example working on my machine.

It seems that, currently, installing tokenizers via PyPI bundles an x86_64 build of tokenizers.cpython-39-darwin.so instead of an arm64 build for users on Apple Silicon (M1) machines.

System info: MacBook Air (M1, 2020) running macOS 11.0.1

To reproduce:

  1. create a virtualenv and activate it:

     virtualenv venv-bad
     source venv-bad/bin/activate

  2. install pytorch (the easiest way I’ve found so far on arm64 is to install the nightly via pip):

     pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

  3. install transformers (tokenizers will be installed as a dependency):

     pip install transformers

  4. create a file with the quick tour example:

main.py

from transformers import pipeline
classifier = pipeline('sentiment-analysis')

classifier('We are very happy to show you the 🤗 Transformers library.')
  5. try running the quick tour example: python main.py

Results in error:

ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found.  Did find:
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
Full stack trace:
(venv-bad) khuynh@kmba:test ‹main*›$ python main.py
Traceback (most recent call last):
  File "/Users/khuynh/me/test/temp.py", line 5, in <module>
    from transformers import pipeline
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2709, in __getattr__
    return super().__getattr__(name)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/file_utils.py", line 1821, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2703, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/homebrew/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 25, in <module>
    from ..models.auto.configuration_auto import AutoConfig
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/__init__.py", line 19, in <module>
    from . import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/__init__.py", line 23, in <module>
    from .tokenization_layoutlm import LayoutLMTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/tokenization_layoutlm.py", line 19, in <module>
    from ..bert.tokenization_bert import BertTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert.py", line 23, in <module>
    from ...tokenization_utils import PreTrainedTokenizer, _is_control, _is_punctuation, _is_whitespace
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 26, in <module>
    from .tokenization_utils_base import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 69, in <module>
    from tokenizers import AddedToken
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/__init__.py", line 79, in <module>
    from .tokenizers import (
ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found.  Did find:
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture

Inspecting the shared library with file, we can see it’s a dynamically linked x86_64 library:

(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
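A related check worth doing first is whether the interpreter itself is running natively, since an x86_64 Python (for example, one running under Rosetta 2) would correctly be given x86_64 wheels. A minimal standard-library sketch:

```python
import platform
import sysconfig

# 'arm64' means a native Apple Silicon interpreter; 'x86_64' means an
# Intel build or one running under Rosetta 2, which pulls x86_64 wheels.
print(platform.machine())

# The platform tag pip matches wheels against, e.g. 'macosx-11-arm64'.
print(sysconfig.get_platform())
```

If both lines report arm64 but the bundled .so is x86_64, the problem is in the published wheel, not the local environment.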

Solution:

The solution I found requires installing the Rust toolchain on your machine and building the tokenizers module from source, so I think of it as a temporary workaround. I already have the Rust nightly toolchain installed on my machine, so that’s what I used. Otherwise, instructions for installing are here.

  1. clone tokenizers: git clone git@github.com:huggingface/tokenizers.git
  2. cd tokenizers/bindings/python
  3. install tokenizers: python setup.py install
  4. now go back and successfully re-run the transformers quick tour

We can also now see that the shared library has the proper architecture, again using file:

(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library arm64
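If the file tool is not at hand, the same check can be done from Python by reading the Mach-O header directly. A minimal sketch (the CPU type constants come from Apple's mach/machine.h; the path in the usage comment is illustrative):

```python
import struct

# CPU types from <mach/machine.h>: CPU_ARCH_ABI64 (0x01000000) OR'd with
# CPU_TYPE_X86 (7) or CPU_TYPE_ARM (12).
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}
MH_MAGIC_64 = 0xFEEDFACF  # magic of a thin little-endian 64-bit Mach-O file

def macho_arch(path):
    """Report the architecture of a thin 64-bit Mach-O file."""
    with open(path, "rb") as f:
        magic, cputype = struct.unpack("<II", f.read(8))
    if magic != MH_MAGIC_64:
        return "not a thin little-endian 64-bit Mach-O file"
    return CPU_NAMES.get(cputype, hex(cputype))

# Usage (path is illustrative):
# print(macho_arch("venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so"))
```

Note this only handles thin binaries; universal ("fat") binaries start with a different magic and fall into the fallback branch.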

I’m not super well versed in setuptools, so I’m not sure of the best way to fix this. Maybe release a separate pre-built tokenizers.cpython-39-darwin.so for arm64 users? I’d be happy to help if needed.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 6
  • Comments: 19 (3 by maintainers)

Most upvoted comments

Hi there !

I’ve manually built binaries for tokenizers on the M1 and released them for tokenizers 0.11.6.

We’ll try our best to keep building those by hand while waiting for https://github.com/actions/runner/issues/805.

Expect some delay between normal releases and m1 releases for now 😃

Have a great day !

Hi @hkennyv and thank you for reporting this.

We don’t build wheels for Apple Silicon at the moment because there is no environment for this on our GitHub CI. (cf https://github.com/actions/virtual-environments/issues/2187). The only way to have it working is, as you mentioned, to build it yourself. We’ll add support for this as soon as it is available!

I get this when I run rustup toolchain list in tokenizers/bindings/python:

stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin (override)

That means you need to change the default host, which I did with this command:

rustup set default-host aarch64-apple-darwin

It may also be defaulting to the wrong toolchain. You might also try setting the default toolchain with

rustup default stable-aarch64-apple-darwin

I think I also had to delete the rust-toolchain file, since when it was present the build would switch to the x86_64 toolchain. You can check that the right one is selected with

rustup toolchain list

Edit: I was able to fix the rust-toolchain issue by doing

rustup set default-host aarch64-apple-darwin

I am on an M1 MacBook Pro. Python version is 3.10.7. Cargo version is 1.63.0.

I ran into this as well. It turned out that I was using the brew installed rust rather than the rustup one. Try which rustc to make sure it is coming from the ~/.cargo directory.
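The two checks above (which rustc is on PATH, and which host triple it defaults to) can be scripted; here is a small standard-library sketch, which is only an illustration and not part of the tokenizers build:

```python
import shutil
import subprocess

# If rustc resolves outside ~/.cargo/bin (e.g. a Homebrew install), builds
# may target the wrong architecture, as described above.
rustc = shutil.which("rustc")
print("rustc on PATH:", rustc)

if rustc:
    out = subprocess.run(
        [rustc, "--version", "--verbose"], capture_output=True, text=True
    ).stdout
    # The 'host:' line holds the default target triple; on a correctly
    # configured M1 it should read aarch64-apple-darwin.
    for line in out.splitlines():
        if line.startswith("host:"):
            print(line)
```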

I have followed the instructions to build from source, and I still see the library compiled for x86_64.

I cloned the repo, made sure the Python environment was configured for a shared library, and ran python setup.py install.

tokenizers was installed in the virtual environment.

I ran the following command to check the compiled library:

file .venv/lib/python3.10/site-packages/tokenizers-0.13.0.dev0-py3.10-macosx-12.2-arm64.egg/tokenizers/tokenizers.cpython-310-darwin.so

Output:

.venv/lib/python3.10/site-packages/tokenizers-0.13.0.dev0-py3.10-macosx-12.2-arm64.egg/tokenizers/tokenizers.cpython-310-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64

I do not understand why it is not compiling for the correct target.

I am on an M1 MacBook Pro. Python version is 3.10.7. Cargo version is 1.63.0.