snakemake: Google Storage download retry predicate needs an additional exception
The current code checks for snakemake.exceptions.CheckSumMismatchException when deciding whether to retry a download: https://github.com/snakemake/snakemake/blob/223bcc52d058e9704e69dac65c101ea1b18f3361/snakemake/remote/GS.py#L42-L48
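For context, this follows google.api_core's predicate pattern: a predicate function decides, per exception, whether to retry. The sketch below uses stand-in exception classes (not the real snakemake/Google classes, so it runs without either library) to show why a DataCorruption raised directly would never match:

```python
# Minimal sketch of the retry-predicate pattern (stand-in classes only).

class CheckSumMismatchException(Exception):
    """Stand-in for snakemake.exceptions.CheckSumMismatchException."""

class DataCorruption(Exception):
    """Stand-in for google.resumable_media.common.DataCorruption."""

def retry_predicate(exc: BaseException) -> bool:
    """Mirror of a predicate that only retries on CheckSumMismatchException."""
    return isinstance(exc, CheckSumMismatchException)

# DataCorruption does not subclass the snakemake exception, so the
# predicate rejects it and the download is not retried.
print(retry_predicate(CheckSumMismatchException()))  # True
print(retry_predicate(DataCorruption()))             # False
```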
However, I am seeing a suspiciously large number of failures like this in my pipeline:
```
Traceback (most recent call last):
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/__init__.py", line 687, in snakemake
    success = workflow.execute(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/workflow.py", line 1005, in execute
    success = scheduler.schedule()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 489, in schedule
    self.run(runjobs)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 500, in run
    executor.run_jobs(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 131, in run_jobs
    self.run(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 447, in run
    future = self.run_single_job(job)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 491, in run_single_job
    self.cached_or_run, job, run_wrapper, *self.job_args_and_prepare(job)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 452, in job_args_and_prepare
    job.prepare()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/jobs.py", line 710, in prepare
    self.download_remote_input()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/jobs.py", line 682, in download_remote_input
    f.download_from_remote()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/io.py", line 584, in download_from_remote
    self.remote_object.download()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
    return retry_target(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/remote/GS.py", line 226, in download
    return download_blob(self.blob, self.local_file())
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
    return retry_target(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/remote/GS.py", line 69, in download_blob
    blob.download_to_file(parser)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 1041, in download_to_file
    self._do_download(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 900, in _do_download
    response = download.consume(transport, timeout=timeout)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/resumable_media/requests/download.py", line 171, in consume
    self._write_to_stream(result)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/resumable_media/requests/download.py", line 120, in _write_to_stream
    raise common.DataCorruption(response, msg)
google.resumable_media.common.DataCorruption: Checksum mismatch while downloading:

  https://storage.googleapis.com/download/storage/v1/b/rs-ukb/o/raw%2Fgt-imputation%2Fukb_imp_chr9_v3.bgen?generation=1602861266282729&alt=media

The X-Goog-Hash header indicated an MD5 checksum of:

  J3RmHIDzGmBKkklx/ImWtg==

but the actual MD5 checksum of the downloaded contents was:

  XHqx7gai/Eij53rF8bMrKg==
```
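As an aside, the two checksums in that message are base64-encoded MD5 digests, the format Google Cloud Storage returns in the X-Goog-Hash response header. A small example of computing one:

```python
# Compute an MD5 checksum in the same base64 form as the values shown
# in the X-Goog-Hash header and the DataCorruption error message.
import base64
import hashlib

def gcs_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest of the given bytes."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

# Well-known value for an empty object:
print(gcs_md5(b""))  # 1B2M2Y8AsgTpgAmY7PhCfg==
```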
It looks like google.resumable_media.common.DataCorruption is not being wrapped as snakemake.exceptions.CheckSumMismatchException, or some other flaw in the design keeps these requests from being retried.
Note: google.api_core.retry.if_transient_error does not appear to cover this error either.
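One hypothetical fix (a sketch only, again with stand-in classes so it runs standalone) would be to add DataCorruption to the set of retryable exception types. The if_exception_type helper below re-implements the behaviour of google.api_core.retry.if_exception_type for illustration:

```python
# Hypothetical fix sketch: treat DataCorruption as retryable too.

class CheckSumMismatchException(Exception):
    """Stand-in for snakemake.exceptions.CheckSumMismatchException."""

class DataCorruption(Exception):
    """Stand-in for google.resumable_media.common.DataCorruption."""

def if_exception_type(*exc_types):
    """Build a predicate that returns True for any of the given types
    (mimics google.api_core.retry.if_exception_type)."""
    def predicate(exc: BaseException) -> bool:
        return isinstance(exc, exc_types)
    return predicate

# Extended predicate: both exception types now trigger a retry.
retry_predicate = if_exception_type(CheckSumMismatchException, DataCorruption)

print(retry_predicate(DataCorruption()))  # True
print(retry_predicate(ValueError()))      # False
```

Alternatively, download_blob could catch DataCorruption and re-raise it as the CheckSumMismatchException the existing predicate already looks for.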
About this issue
- Original URL
- State: open
- Created 4 years ago
- Comments: 19 (15 by maintainers)
I’ll follow up on this
Hm, here are a couple large files accessible without requester pays:
The file sizes steadily decrease as you change the N in "chrN" to anything between 1 and 22.
I can’t share the exact files I was using unfortunately but these are very similar.