transformers: Pickle error

While fine-tuning BERT with the new script I am facing the issue as follows:

Traceback (most recent call last):
  File "run_mlm.py", line 310, in <module>
    main()
  File "run_mlm.py", line 259, in main
    load_from_cache_file=not data_args.overwrite_cache,
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/dataset_dict.py", line 300, in map
    for k, dataset in self.items()
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/dataset_dict.py", line 300, in <dictcomp>
    for k, dataset in self.items()
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 1256, in map
    update_data=update_data,
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 156, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/fingerprint.py", line 158, in wrapper
    self._fingerprint, transform, kwargs_for_fingerprint
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/fingerprint.py", line 105, in update_fingerprint
    hasher.update(transform_args[key])
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/fingerprint.py", line 57, in update
    self.m.update(self.hash(value).encode("utf-8"))
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/fingerprint.py", line 53, in hash
    return cls.hash_default(value)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/fingerprint.py", line 46, in hash_default
    return cls.hash_bytes(dumps(value))
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/utils/py_utils.py", line 367, in dumps
    dump(obj, file)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/utils/py_utils.py", line 339, in dump
    Pickler(file, recurse=True).dump(obj)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 446, in dump
    StockPickler.dump(self, obj)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 1438, in save_function
    obj.__dict__, fkwdefaults), obj=obj)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 1170, in save_cell
    pickler.save_reduce(_create_cell, (f,), obj=obj)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 605, in save_reduce
    save(cls)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 1365, in save_type
    obj.__bases__, _dict), obj=obj)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 933, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 933, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 507, in save
    self.save_global(obj, rv)
  File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 927, in save_global
    (obj, module_name, name))
_pickle.PicklingError: Can't pickle typing.Union[str, NoneType]: it's not the same object as typing.Union


I am trying to run the same script with the already mentioned wikitext dataset. However, I am not able to run it successfully due to the above mentioned error.

@sgugger Could you please help me resolve this error?

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 17 (3 by maintainers)

Most upvoted comments

Further reduced, the bug appears in all python versions <= 3.6.12 but disappears in python 3.7.0.

For future reference, here is how I create an env reproducing the bug, and the command that shows it (self-contained to the repo):

pyenv install 3.6.7
pyenv virtualenv 3.6.7 picklebug
pyenv activate picklebug
pip install --upgrade pip
pip install transformers[torch]
pip install datasets
cd git/transformers # Adapt to your local path to the cloned repo
pip install -e .
python examples/language-modeling/run_mlm.py \
--model_name_or_path roberta-base \
--train_file ./tests/fixtures/sample_text.txt \
--validation_file ./tests/fixtures/sample_text.txt \
--do_train \
--do_eval \
--output_dir /tmp/test=clm \
--line_by_line

Thanks, this was really helpful !!!

The bug disappears for me with python 3.7.9 so if you can upgrade your python version, you should be good to go.