datasets: coqa NonMatchingChecksumError

>>> from datasets import load_dataset
>>> dataset = load_dataset("coqa")
Downloading: 3.82kB [00:00, 1.26MB/s]                                                                                                                                                                                                                       
Downloading: 1.79kB [00:00, 733kB/s]                                                                                                                                                                                                                        
Using custom data configuration default
Downloading and preparing dataset coqa/default (download: 55.40 MiB, generated: 18.35 MiB, post-processed: Unknown size, total: 73.75 MiB) to /Users/zhaofengw/.cache/huggingface/datasets/coqa/default/1.0.0/553ce70bfdcd15ff4b5f4abc4fc2f37137139cde1f58f4f60384a53a327716f0...
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 222/222 [00:00<00:00, 1.38MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 222/222 [00:00<00:00, 1.32MB/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.91it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1117.44it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/zhaofengw/miniconda3/lib/python3.9/site-packages/datasets/load.py", line 1632, in load_dataset
    builder_instance.download_and_prepare(
  File "/Users/zhaofengw/miniconda3/lib/python3.9/site-packages/datasets/builder.py", line 607, in download_and_prepare
    self._download_and_prepare(
  File "/Users/zhaofengw/miniconda3/lib/python3.9/site-packages/datasets/builder.py", line 679, in _download_and_prepare
    verify_checksums(
  File "/Users/zhaofengw/miniconda3/lib/python3.9/site-packages/datasets/utils/info_utils.py", line 40, in verify_checksums
    raise NonMatchingChecksumError(error_msg + str(bad_urls))
datasets.utils.info_utils.NonMatchingChecksumError: Checksums didn't match for dataset source files:
['https://nlp.stanford.edu/data/coqa/coqa-train-v1.0.json', 'https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json']

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

Maybe the file had issues during the download ? Could you try to delete your cache and try again ? By default the downloads cache is at ~/.cache/huggingface/datasets/downloads

Also can you check if you have a proxy that could prevent the download to succeed ? Are you able to download those files via your browser ?

I also can’t reproduce the error on Windows/Linux (tested both the master and the 1.15.1 version).