datasets: File exists error when used with TPU
Hi,
I’m getting a “File exists” error when I use text dataset for pre-training a RoBERTa model using transformers
(3.0.2) and nlp
(0.4.0) on a VM with TPU (v3-8).
I modified line 131 in the original run_language_modeling.py
as follows:
# line 131: return LineByLineTextDataset(tokenizer=tokenizer, file_path=file_path, block_size=args.block_size)
dataset = load_dataset("text", data_files=file_path, split="train")
dataset = dataset.map(lambda ex: tokenizer(ex["text"], add_special_tokens=True,
truncation=True, max_length=args.block_size), batched=True)
dataset.set_format(type='torch', columns=['input_ids'])
return dataset
When I run this with xla_spawn.py
, I get the following error (it produces one message per core in TPU, which I believe is fine).
It seems the current version doesn’t take into account distributed training processes as in this example?
08/25/2020 13:59:41 - WARNING - nlp.builder - Using custom data configuration default
08/25/2020 13:59:43 - INFO - nlp.builder - Generating dataset text (/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d)
08/25/2020 13:59:43 - INFO - nlp.builder - Generating dataset text (/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d)
08/25/2020 13:59:43 - INFO - nlp.builder - Generating dataset text (/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d)
08/25/2020 13:59:43 - INFO - nlp.builder - Generating dataset text (/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d)
08/25/2020 13:59:43 - INFO - nlp.builder - Generating dataset text (/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d)
08/25/2020 13:59:43 - INFO - nlp.builder - Generating dataset text (/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d)
08/25/2020 13:59:43 - INFO - nlp.builder - Generating dataset text (/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d)
08/25/2020 13:59:43 - INFO - nlp.builder - Generating dataset text (/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d)
Downloading and preparing dataset text/default-b0932b2bdbb63283 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/
447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d...
Downloading and preparing dataset text/default-b0932b2bdbb63283 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/
447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d...
Downloading and preparing dataset text/default-b0932b2bdbb63283 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/
447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d...
Downloading and preparing dataset text/default-b0932b2bdbb63283 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/
447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d...
Downloading and preparing dataset text/default-b0932b2bdbb63283 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/
447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d...
Downloading and preparing dataset text/default-b0932b2bdbb63283 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/
447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d...
Exception in device=TPU:6: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
Exception in device=TPU:4: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
Exception in device=TPU:1: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
Downloading and preparing dataset text/default-b0932b2bdbb63283 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/
447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d...
Exception in device=TPU:7: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
Exception in device=TPU:3: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
Downloading and preparing dataset text/default-b0932b2bdbb63283 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/
447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d...
Exception in device=TPU:2: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
Exception in device=TPU:0: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
Traceback (most recent call last):
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 231, in _start_fn
fn(gindex, *args)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 231, in _start_fn
fn(gindex, *args)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 231, in _start_fn
fn(gindex, *args)
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 300, in _mp_fn
main()
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 300, in _mp_fn
main()
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 300, in _mp_fn
main()
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 240, in main
train_dataset = get_dataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 240, in main
train_dataset = get_dataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 240, in main
train_dataset = get_dataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 134, in get_dataset
dataset = load_dataset("text", data_files=file_path, split="train")
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/load.py", line 546, in load_dataset
download_config=download_config, download_mode=download_mode, ignore_verifications=ignore_verifications,
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 134, in get_dataset
dataset = load_dataset("text", data_files=file_path, split="train")
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 134, in get_dataset
dataset = load_dataset("text", data_files=file_path, split="train")
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 450, in download_and_prepare
with incomplete_dir(self._cache_dir) as tmp_data_dir:
Traceback (most recent call last):
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/load.py", line 546, in load_dataset
download_config=download_config, download_mode=download_mode, ignore_verifications=ignore_verifications,
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/load.py", line 546, in load_dataset
download_config=download_config, download_mode=download_mode, ignore_verifications=ignore_verifications,
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 231, in _start_fn
fn(gindex, *args)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 450, in download_and_prepare
with incomplete_dir(self._cache_dir) as tmp_data_dir:
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 422, in incomplete_dir
os.makedirs(tmp_dir)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 450, in download_and_prepare
with incomplete_dir(self._cache_dir) as tmp_data_dir:
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 300, in _mp_fn
main()
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 240, in main
train_dataset = get_dataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 422, in incomplete_dir
os.makedirs(tmp_dir)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 231, in _start_fn
fn(gindex, *args)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 422, in incomplete_dir
os.makedirs(tmp_dir)
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 134, in get_dataset
dataset = load_dataset("text", data_files=file_path, split="train")
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/load.py", line 546, in load_dataset
download_config=download_config, download_mode=download_mode, ignore_verifications=ignore_verifications,
FileExistsError: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 300, in _mp_fn
main()
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 450, in download_and_prepare
with incomplete_dir(self._cache_dir) as tmp_data_dir:
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 240, in main
train_dataset = get_dataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
FileExistsError: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 134, in get_dataset
dataset = load_dataset("text", data_files=file_path, split="train")
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 422, in incomplete_dir
os.makedirs(tmp_dir)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/load.py", line 546, in load_dataset
download_config=download_config, download_mode=download_mode, ignore_verifications=ignore_verifications,
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 450, in download_and_prepare
with incomplete_dir(self._cache_dir) as tmp_data_dir:
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
FileExistsError: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 422, in incomplete_dir
os.makedirs(tmp_dir)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
Traceback (most recent call last):
FileExistsError: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
Traceback (most recent call last):
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 231, in _start_fn
fn(gindex, *args)
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 300, in _mp_fn
main()
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 240, in main
train_dataset = get_dataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 134, in get_dataset
dataset = load_dataset("text", data_files=file_path, split="train")
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 231, in _start_fn
fn(gindex, *args)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/load.py", line 546, in load_dataset
download_config=download_config, download_mode=download_mode, ignore_verifications=ignore_verifications,
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 450, in download_and_prepare
with incomplete_dir(self._cache_dir) as tmp_data_dir:
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 300, in _mp_fn
main()
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 240, in main
train_dataset = get_dataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
File "/home/*****/huggingface_roberta/run_language_modeling.py", line 134, in get_dataset
dataset = load_dataset("text", data_files=file_path, split="train")
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/load.py", line 546, in load_dataset
download_config=download_config, download_mode=download_mode, ignore_verifications=ignore_verifications,
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 450, in download_and_prepare
with incomplete_dir(self._cache_dir) as tmp_data_dir:
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/site-packages/nlp/builder.py", line 422, in incomplete_dir
os.makedirs(tmp_dir)
File "/anaconda3/envs/torch-xla-1.6/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/home/*****/.cache/huggingface/datasets/text/default-b0932b2bdbb63283/0.0.0/447f2bcfa2a721a37bc8fdf23800eade1523cf07f7eada6fe661fe4d070d380d.incomplete'
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 1
- Comments: 21 (9 by maintainers)
I can see that the tokenizer in
run_language_modeling.py
is not instantiated the same way as in your separated script. Indeed we can see L196:Could you try to make it so they are instantiated the exact same way please ?
Could you try to run
dataset = load_dataset("text", data_files=file_path, split="train")
once before calling the script ?It looks like several processes try to create the dataset in arrow format at the same time. If the dataset is already created it should be fine
Could you also check that the
args.block_size
used in the lambda function is the same as well ?Yea it could definitely explain why you have two different cache files. Let me know if using the same tokenizers on both sides fixes the issue