lm-evaluation-harness: pubmedqa task data fails to download
Using lm-eval==0.2.0:
python ./tasks/eval_harness/download.py --task_list pubmedqa
Downloading and preparing dataset pubmed_qa/pqa_labeled (download: 656.02 MiB, generated: 1.99 MiB, post-processed: Unknown size, total: 658.01 MiB) to /gpfswork/rech/six/commun/datasets/pubmed_qa/pqa_labeled/1.0.0/2e65addecca4197502cd10ab8ef1919a47c28672f62d7abac7cc9afdcf24fb2d...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 4495.50it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 671.73it/s]
Traceback (most recent call last):
File "./tasks/eval_harness/download.py", line 20, in <module>
main()
File "./tasks/eval_harness/download.py", line 17, in main
tasks.get_task_dict(task_list)
File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/__init__.py", line 325, in get_task_dict
task_name_dict = {
File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/__init__.py", line 326, in <dictcomp>
task_name: get_task(task_name)()
File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/common.py", line 11, in __init__
super().__init__()
File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/base.py", line 350, in __init__
self.download()
File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/common.py", line 14, in download
self.data = datasets.load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)
File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/load.py", line 1632, in load_dataset
builder_instance.download_and_prepare(
File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/builder.py", line 607, in download_and_prepare
self._download_and_prepare(
File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/builder.py", line 679, in _download_and_prepare
verify_checksums(
File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/utils/info_utils.py", line 40, in verify_checksums
raise NonMatchingChecksumError(error_msg + str(bad_urls))
datasets.utils.info_utils.NonMatchingChecksumError: Checksums didn't match for dataset source files:
['https://drive.google.com/uc?export=download&id=1RsGLINVce-0GsDkCLDuLZmoLuzfmoCuQ', 'https://drive.google.com/uc?export=download&id=15v1x6aQDlZymaHGP7cZJZZYFfeJt2NdS']
Probably those files got updated and now require new checksums?
I checked and the files are still there.
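In the meantime, one possible workaround (not verified here) seems to be loading the dataset directly with checksum verification disabled, e.g.:

```python
# Untested sketch: load the PubMedQA config directly, skipping the checksum check
# that raises NonMatchingChecksumError (ignore_verifications is the relevant flag
# in this datasets version).
import datasets

data = datasets.load_dataset(
    path="pubmed_qa",
    name="pqa_labeled",
    ignore_verifications=True,
)
```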
Thanks.
So one can download these manually via gdown or a browser, but not via datasets - apparently due to reaching some traffic limit. We will have to move the data to a different storage that doesn't have such caps.
In any case the issue is not with lm-eval. I will close it once we get this fixed on the hub.
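For anyone who wants the manual route in the meantime, a rough sketch (assuming gdown is installed; the file IDs are the two from the error message above):

```python
# Rough sketch, untested: pull the two Google Drive files directly with gdown,
# using the file IDs listed in the NonMatchingChecksumError above.
import gdown

file_ids = [
    "1RsGLINVce-0GsDkCLDuLZmoLuzfmoCuQ",
    "15v1x6aQDlZymaHGP7cZJZZYFfeJt2NdS",
]
for file_id in file_ids:
    gdown.download(f"https://drive.google.com/uc?export=download&id={file_id}", quiet=False)
```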
Glad to hear it worked out! Thanks for reporting everything 😃