transformers: Windows: No matching distribution found for lightning_base

šŸ› Bug

I followed the seq2seq README and wanted to try the sshleifer/distilbart-cnn-12-6 model for abstractive text summarization. I ran into the error above; it seems like lightning_base was part of this project before it was moved or removed.

Information

Model I am using: sshleifer/distilbart-cnn-12-6

Language I am using the model on: English

The problem arises when using:

  • the official example scripts: (give details below)

The tasks I am working on is:

  • my own task or dataset: (give details below) cnn_dm (CNN/DailyMail)

To reproduce

Steps to reproduce the behavior:

  1. Follow the instructions in the README, prepare your environment, and pull the latest master
  2. Start summarization by running:
./finetune.sh \
    --data_dir $CNN_DIR \
    --train_batch_size=1 \
    --eval_batch_size=1 \
    --output_dir=xsum_results \
    --num_train_epochs 1 \
    --model_name_or_path facebook/bart-large
  3. Receive the error

Expected behavior

I would expect the model to start inference

Environment info

  • transformers version: 2.11.0

  • Platform: Windows-10-10.0.18362-SP0

  • Python version: 3.7.7

  • PyTorch version (GPU?): 1.5.1 (True)

  • Tensorflow version (GPU?): 2.1.0 (True)

  • Using GPU in script?: Yes

  • Using distributed or parallel set-up in script?: no

@sshleifer, you asked in the README to be tagged on issues like this.


Most upvoted comments

Before you start running these commands: it will probably end up not working anyway due to MeCab issues (see the bottom of this comment).

  • FAISS is currently not supported on Windows (though there is an open project for Windows support): remove it from the requirements
  • instead of wget you can use Invoke-WebRequest in PowerShell (after having cd'd into examples/seq2seq):
Invoke-WebRequest https://s3.amazonaws.com/datasets.huggingface.co/summarization/xsum.tar.gz -OutFile xsum.tar.gz
  • To add the environment variable:
$env:Path += ";" + (Join-Path -Path (Get-Item .).FullName -ChildPath "xsum")

Rather than using the bash file (finetune.sh), I suggest that you open it and copy-paste the Python command that is in there, including the options that are already present, and add your own options after it (things like data dir, model name).
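For example, in PowerShell the combined command could look roughly like this (a sketch only: the first group of options mirrors what finetune.sh contained around this version and may differ in yours, and the data dir should point to wherever you extracted the dataset):

python finetune.py `
    --learning_rate=3e-5 `
    --gpus 1 `
    --do_train `
    --do_predict `
    --n_val 1000 `
    --val_check_interval 0.1 `
    --data_dir $env:CNN_DIR `
    --train_batch_size=1 `
    --eval_batch_size=1 `
    --output_dir=xsum_results `
    --num_train_epochs 1 `
    --model_name_or_path facebook/bart-large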

Before running the command, add ../ to PYTHONPATH:

$env:PYTHONPATH += ";../"
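A quick sanity check (my own sketch, not part of the original instructions) is to ask Python, from the same shell and from within examples/seq2seq, whether it can now locate lightning_base:

import importlib.util

# find_spec returns None when the module cannot be found via sys.path /
# PYTHONPATH; if this prints NOT FOUND, the environment variable did not
# reach the Python process.
spec = importlib.util.find_spec("lightning_base")
print("lightning_base found at:", spec.origin if spec else "NOT FOUND")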

After all that, you will probably still run into a problem involving MeCab. It is used for Japanese tokenisation and is not easy to disable (it is also a dependency of sacrebleu). MeCab has a new v1.0 release that works on Windows, but upgrading to it requires breaking changes both in transformers and in sacrebleu. This is unfortunate: because these libraries are imported at module level, such a small dependency means that many functionalities and examples cannot be used on Windows at all, even if you never use Japanese. I'd rather see these libraries imported only when they are actually needed, to maximise cross-platform usage.
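To illustrate what I mean by importing only when needed (a generic sketch, not the actual transformers code; tokenize_japanese is a made-up example function):

def tokenize_japanese(text):
    # Deferred import: the optional dependency is only loaded when this code
    # path is actually exercised, so modules that merely import this file
    # keep working on platforms where MeCab cannot be installed.
    import MeCab
    return MeCab.Tagger().parse(text)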

I had the same issue as you described, @MichaelJanz, also on Windows 10 with Python 3.7.7.

To clarify my setup, I followed the instructions under "Important Note" on the transformers/examples page and got an error that faiss could not be installed (faiss only supports Linux and macOS currently, I think). I removed faiss from the list of requirements at examples/requirements.txt and ran the example finetune.sh command. Like Michael, I got an error that lightning_base was not found. Since "export" doesn't work on the Windows command line, I inserted two lines above the lightning_base import in finetune.py:

import sys
sys.path.insert(0, r'C:\Users\chris\transformers\examples')

This solved the issue that lightning_base wasn’t found, but I encountered a new error:

File "finetune.py", line 17, in <module>
    from lightning_base import BaseTransformer, add_generic_args, generic_train
...
  File "C:\Users\chris\transformers\env\lib\site-packages\tokenizers\__init__.py", line 17, in <module>
    from .tokenizers import Tokenizer, Encoding, AddedToken
ModuleNotFoundError: No module named 'tokenizers.tokenizers'

Looking at the installed tokenizers package, I didn't see an additional folder labeled "tokenizers". The tokenizers version I have within my virtual environment is tokenizers==0.8.0rc4. @sshleifer, could you let me know what version of tokenizers you have in your environment? Let me know if you have any other suggestions about what might be happening (I worry that the problem lies with using Windows).
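One quick way to see which tokenizers build is actually installed, without importing it (a generic sketch of mine, since importing tokenizers itself currently fails with the error above):

import os
import pkg_resources

dist = pkg_resources.get_distribution("tokenizers")
# The version comes from the package metadata, so this works even though
# "import tokenizers" raises ModuleNotFoundError here.
print("version:", dist.version)

# List what the package directory actually contains; the compiled extension
# that __init__.py tries to load should show up here (on Windows it is
# usually a .pyd file named something like tokenizers.cp37-win_amd64.pyd).
print(os.listdir(os.path.join(dist.location, "tokenizers")))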

Edit: for context, I tried running the finetuning script within a Linux environment and had no problems, with the same tokenizers==0.8.0rc4 version. I’m guessing that this whole issue is a Windows problem.

Try:

export PYTHONPATH="../":"${PYTHONPATH}"

More info in examples/seq2seq/README.md.