tokenizers: ERROR: Failed building wheel for tokenizers

System Info

I can’t seem to get past the error “ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects” when installing transformers with pip. An ML friend of mine hit the same error on their own instance and tried to help me troubleshoot, but we couldn’t get past it either, so I suspect it’s possibly a recent issue.

I am following the transformers README install instructions step by step, with a venv and PyTorch ready to go. Pip is also fully up to date. The error output suggests installing a Rust compiler as one possible fix, but we both felt this doesn’t seem like the right next step: a Rust compiler usually isn’t required when installing the transformers package, and the README makes no mention of needing one.

Thanks in advance! -Blake

Full output below:

command: pip install transformers

Collecting transformers
  Using cached transformers-4.21.1-py3-none-any.whl (4.7 MB)
Requirement already satisfied: tqdm>=4.27 in ./venv/lib/python3.9/site-packages (from transformers) (4.64.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.1.0 in ./venv/lib/python3.9/site-packages (from transformers) (0.9.0)
Requirement already satisfied: pyyaml>=5.1 in ./venv/lib/python3.9/site-packages (from transformers) (6.0)
Requirement already satisfied: regex!=2019.12.17 in ./venv/lib/python3.9/site-packages (from transformers) (2022.8.17)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Using cached tokenizers-0.12.1.tar.gz (220 kB)
  Installing build dependencies … done
  Getting requirements to build wheel … done
  Preparing metadata (pyproject.toml) … done
Requirement already satisfied: numpy>=1.17 in ./venv/lib/python3.9/site-packages (from transformers) (1.23.2)
Requirement already satisfied: packaging>=20.0 in ./venv/lib/python3.9/site-packages (from transformers) (21.3)
Requirement already satisfied: filelock in ./venv/lib/python3.9/site-packages (from transformers) (3.8.0)
Requirement already satisfied: requests in ./venv/lib/python3.9/site-packages (from transformers) (2.26.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./venv/lib/python3.9/site-packages (from huggingface-hub<1.0,>=0.1.0->transformers) (4.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in ./venv/lib/python3.9/site-packages (from packaging>=20.0->transformers) (3.0.9)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./venv/lib/python3.9/site-packages (from requests->transformers) (1.26.7)
Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.9/site-packages (from requests->transformers) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.9/site-packages (from requests->transformers) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in ./venv/lib/python3.9/site-packages (from requests->transformers) (2.0.7)
Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (pyproject.toml) … error
  error: subprocess-exited-with-error

× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [51 lines of output]
    running bdist_wheel
    running build
    running build_py
    creating build
    creating build/lib.macosx-12-arm64-cpython-39
    creating build/lib.macosx-12-arm64-cpython-39/tokenizers
    copying py_src/tokenizers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers
    creating build/lib.macosx-12-arm64-cpython-39/tokenizers/models
    copying py_src/tokenizers/models/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/models
    creating build/lib.macosx-12-arm64-cpython-39/tokenizers/decoders
    copying py_src/tokenizers/decoders/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/decoders
    creating build/lib.macosx-12-arm64-cpython-39/tokenizers/normalizers
    copying py_src/tokenizers/normalizers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/normalizers
    creating build/lib.macosx-12-arm64-cpython-39/tokenizers/pre_tokenizers
    copying py_src/tokenizers/pre_tokenizers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/pre_tokenizers
    creating build/lib.macosx-12-arm64-cpython-39/tokenizers/processors
    copying py_src/tokenizers/processors/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/processors
    creating build/lib.macosx-12-arm64-cpython-39/tokenizers/trainers
    copying py_src/tokenizers/trainers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/trainers
    creating build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
    copying py_src/tokenizers/implementations/byte_level_bpe.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
    copying py_src/tokenizers/implementations/sentencepiece_unigram.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
    copying py_src/tokenizers/implementations/sentencepiece_bpe.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
    copying py_src/tokenizers/implementations/base_tokenizer.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
    copying py_src/tokenizers/implementations/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
    copying py_src/tokenizers/implementations/char_level_bpe.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
    copying py_src/tokenizers/implementations/bert_wordpiece.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
    creating build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
    copying py_src/tokenizers/tools/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
    copying py_src/tokenizers/tools/visualizer.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
    copying py_src/tokenizers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers
    copying py_src/tokenizers/models/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/models
    copying py_src/tokenizers/decoders/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/decoders
    copying py_src/tokenizers/normalizers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/normalizers
    copying py_src/tokenizers/pre_tokenizers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/pre_tokenizers
    copying py_src/tokenizers/processors/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/processors
    copying py_src/tokenizers/trainers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/trainers
    copying py_src/tokenizers/tools/visualizer-styles.css -> build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
    running build_ext
    running build_rust
    error: can’t find Rust compiler

  If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
  
  To update pip, run:
  
      pip install --upgrade pip
  
  and then retry package installation.
  
  If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

command: pip install transformers

(The full pip output is identical to the log shown under System Info above.)

Expected behavior

I would expect the transformers library to install without throwing an error when all prerequisites for installation are met.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 6
  • Comments: 65 (1 by maintainers)

Most upvoted comments

I am on M1 and managed to work around this in the following way: I installed a Rust compiler using brew, and then initialized it:

brew install rustup
rustup-init

Then I restarted the console and checked that it is installed: rustc --version. It turned out you also have to set up the PATH: export PATH="$HOME/.cargo/bin:$PATH"

It works now.
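Putting the steps above together, a minimal sketch for Apple Silicon macOS (assumes Homebrew is installed; the non-interactive -y flag and persisting the PATH line in your shell profile are optional conveniences):

```shell
# Install the Rust toolchain manager via Homebrew and run its initializer
brew install rustup
rustup-init -y

# Make cargo/rustc visible to the current shell; append to ~/.zshrc to persist
export PATH="$HOME/.cargo/bin:$PATH"

# Verify the compiler is found, then retry the install
rustc --version
pip install transformers
```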

I did:

brew install rustup
rustup-init

Then I restarted the console and checked that it is installed: rustc --version. It turned out you also have to set up the PATH: export PATH="$HOME/.cargo/bin:$PATH"

and

python3 -m pip install transformers
python -m pip install transformers

Are you guys on M1? If that’s the case, it’s expected, unfortunately (https://github.com/huggingface/tokenizers/issues/932).

If not, what platform are you on ? (OS, hardware, python version ?)

Basically for M1 you need to install from source (for now, fixes coming soon https://github.com/huggingface/tokenizers/pull/1055).

Also, the error message says you’re missing a Rust compiler; it might be enough to just install one (https://www.rust-lang.org/tools/install) and the install may go through. (It’s easier if we prebuild those, but still.)
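To answer the platform questions above, a quick stdlib-only check prints everything asked for; the last value is the wheel platform tag that pip matches against published wheels:

```shell
# Print OS and hardware, then Python version and the wheel platform tag
uname -sm
python3 -c 'import sys, sysconfig; print(sys.version.split()[0], sysconfig.get_platform())'
```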

I met this problem and failed to resolve it with any of the approaches mentioned above. However, when I downgraded my Python version from 3.11 to 3.10, everything worked. I hope this helps.
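One way to do that downgrade without touching the system interpreter; this is a sketch that assumes pyenv is installed, and the exact 3.10 patch release is arbitrary:

```shell
# Build a 3.10 interpreter and use it for a fresh virtualenv
pyenv install 3.10.13
pyenv local 3.10.13
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install transformers   # should now pick up a prebuilt tokenizers wheel
```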

We already build wheels for Apple Silicon! Just not for Python 3.8, which isn’t supposed to exist on M1 (only 3.9, 3.10, and 3.11 now).

For what it’s worth: same setup as above, but with Rust 1.67.1 and -A invalid_reference_casting (via RUSTFLAGS), it does compile (haven’t yet gotten to testing whether it actually works, though…).
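For anyone wanting to try the same workaround, a sketch of demoting that lint before forcing a source build. The version specifier here is an assumption taken from the log above, and note that allowing invalid_reference_casting only silences the diagnostic; it does not fix the underlying unsound cast:

```shell
# Demote the invalid_reference_casting lint from error to allowed,
# then force pip to build tokenizers from source with that flag set
export RUSTFLAGS="-A invalid_reference_casting"
pip install --no-binary tokenizers "tokenizers>=0.11.1,!=0.11.3,<0.13"
```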

Still an error on Google Colab:

!git clone https://github.com/huggingface/transformers && cd transformers && git checkout a3085020ed0d81d4903c50967687192e3101e770
!pip install ./transformers/

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Processing ./transformers
  Preparing metadata (setup.py) … done
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from transformers==2.3.0) (1.22.4)
Collecting tokenizers==0.0.11 (from transformers==2.3.0)
  Downloading tokenizers-0.0.11.tar.gz (30 kB)
  Installing build dependencies … done
  Getting requirements to build wheel … done
  Preparing metadata (pyproject.toml) … done
Collecting boto3 (from transformers==2.3.0)
  Downloading boto3-1.26.144-py3-none-any.whl (135 kB) — 135.6/135.6 kB 15.5 MB/s
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==2.3.0) (3.12.0)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==2.3.0) (2.27.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from transformers==2.3.0) (4.65.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==2.3.0) (2022.10.31)
Collecting sentencepiece (from transformers==2.3.0)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB) — 1.3/1.3 MB 51.6 MB/s
Collecting sacremoses (from transformers==2.3.0)
  Downloading sacremoses-0.0.53.tar.gz (880 kB) — 880.6/880.6 kB 53.7 MB/s
  Preparing metadata (setup.py) … done
Collecting botocore<1.30.0,>=1.29.144 (from boto3->transformers==2.3.0)
  Downloading botocore-1.29.144-py3-none-any.whl (10.8 MB) — 10.8/10.8 MB 113.7 MB/s
Collecting jmespath<2.0.0,>=0.7.1 (from boto3->transformers==2.3.0)
  Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting s3transfer<0.7.0,>=0.6.0 (from boto3->transformers==2.3.0)
  Downloading s3transfer-0.6.1-py3-none-any.whl (79 kB) — 79.8/79.8 kB 10.1 MB/s
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==2.3.0) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==2.3.0) (2022.12.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==2.3.0) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==2.3.0) (3.4)
Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from sacremoses->transformers==2.3.0) (1.16.0)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from sacremoses->transformers==2.3.0) (8.1.3)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from sacremoses->transformers==2.3.0) (1.2.0)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.10/dist-packages (from botocore<1.30.0,>=1.29.144->boto3->transformers==2.3.0) (2.8.2)
Building wheels for collected packages: transformers, tokenizers, sacremoses
  Building wheel for transformers (setup.py) … done
  Created wheel for transformers: filename=transformers-2.3.0-py3-none-any.whl size=458550 sha256=236e7cf5654e4cff65da41ee3a83e39d34fbea6396b8051e9243120a5cae5dde
  Stored in directory: /tmp/pip-ephem-wheel-cache-wlywjaz5/wheels/7c/35/80/e946b22a081210c6642e607ed65b2a5b9a4d9259695ee2caf5
  error: subprocess-exited-with-error

× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for tokenizers (pyproject.toml) … error
ERROR: Failed building wheel for tokenizers
  Building wheel for sacremoses (setup.py) … done
  Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895241 sha256=099fd152876aa843c9f04a284c7f7c9260d266b181e672796d1619a0f7e2be76
  Stored in directory: /root/.cache/pip/wheels/00/24/97/a2ea5324f36bc626e1ea0267f33db6aa80d157ee977e9e42fb
Successfully built transformers sacremoses
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

I am on M1 and managed to work around this in the following way: I installed a Rust compiler using brew, and then initialized it:

brew install rustup
rustup-init

Then I restarted the console and checked that it is installed: rustc --version. It turned out you also have to set up the PATH: export PATH="$HOME/.cargo/bin:$PATH"

I used this approach and it worked for me on M2. Thank you so much!

Spent 27 hours trying to get DeepSpeed working on a tool, only to run into this error and be blocked. Tokenizers is already installed, but installing anything else seems to make pip try to reinstall it, and it fails to compile due to a Rust issue.

error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
         --> tokenizers-lib/src/models/bpe/trainer.rs:526:47
          |
      522 |                     let w = &words[*i] as *const _ as *mut _;
          |                             -------------------------------- casting happend here
      ...
      526 |                         let word: &mut Word = &mut (*w);
          |                                               ^^^^^^^^^
          |
          = note: `#[deny(invalid_reference_casting)]` on by default

      warning: `tokenizers` (lib) generated 3 warnings
      error: could not compile `tokenizers` (lib) due to previous error; 3 warnings emitted

Admittedly, this was on WSL2, which is notorious for failures of a catastrophic degree.

Especially if you are using a recent version of Python, it is highly possible that it won’t be compatible with old versions of transformers.

@Teofebano do you need such an old version of transformers? The reason you’re having this issue is that transformers requires a version of tokenizers for which there is no macOS wheel (the same problem I had, if you scroll up), so it builds from source…

Alternatively, install Rust so it can be built (no, I didn’t want to do that either).
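One way to check up front whether a prebuilt wheel exists for your platform, so pip fails fast instead of silently falling back to a source build; the version specifier here is taken from the log earlier in the thread:

```shell
# Refuse source distributions; this errors out if no compatible wheel is published
python3 -m pip download "tokenizers>=0.11.1,!=0.11.3,<0.13" --only-binary :all: -d /tmp/wheels
```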

@Teofebano can’t you just install the wheels we released online? The following worked for me on an M1; not sure why M2 would be different:

conda create -n py3.11 python=3.11
conda activate py3.11
pip install tokenizers

Same problem on Windows 11 using Python 3.11.

For Windows:

  • Install Visual Studio (latest version, 2022)
  • Install the Python development workload
  • Install the Desktop development with C++ workload
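If the build still fails after the Visual Studio pieces are in place, a Rust toolchain is likely still needed. A sketch for Windows (assumes winget is available; alternatively, download and run rustup-init.exe from https://rustup.rs). The MSVC toolchain is chosen to match the Visual Studio build tools installed above:

```shell
# Install rustup via winget, select the MSVC-hosted stable toolchain,
# verify the compiler, then retry the install in a new terminal
winget install Rustlang.Rustup
rustup default stable-msvc
rustc --version
pip install transformers
```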