snakemake: MissingOutputException with directory and GS remote provider
I am trying to run a workflow rule that creates a directory in Google Cloud Storage (GS), but Snakemake consistently fails to recognize that the directory exists. The error message recommends using the directory() flag, which I am already doing.
This appears to be related to https://github.com/snakemake/snakemake/issues/396.
Snakemake version
5.22.1
Describe the bug
Output directories are not recognized (they are reported as missing, or as outputs of the wrong type) when using the GS remote provider.
Logs
The most salient part:
Uploading to remote: rs-ukb/logs/bgen_to_zarr.XY.txt
Finished upload.
ImproperOutputException in line 57 of /workdir/Snakefile:
Outputs of incorrect type (directories when expecting files or vice versa). Output directories must be flagged with directory(). for rule bgen_to_zarr:
rs-ukb/prep-data/gt-imputation/ukb_chrXY.zarr
File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 544, in handle_job_success
File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 225, in handle_job_success
Full log: error_log.txt
Minimal example
Here is the offending rule. I apologize that this isn't fully reproducible, but some of the details are difficult to share:
def bgen_samples_path(wc):
    n_samples = bgen_contigs.loc[wc.bgen_contig]['n_consent_samples']
    return [f"raw-data/gt-imputation/ukb59384_imp_chr{wc.bgen_contig}_v3_s{n_samples}.sample"]

rule bgen_to_zarr:
    input:
        bgen_path="raw-data/gt-imputation/ukb_imp_chr{bgen_contig}_v3.bgen",
        variants_path="raw-data/gt-imputation/ukb_mfi_chr{bgen_contig}_v3.txt",
        samples_path=bgen_samples_path
    output:
        directory("prep-data/gt-imputation/ukb_chr{bgen_contig}.zarr")
    params:
        contig_index=lambda wc: bgen_contigs.loc[str(wc.bgen_contig)]['index']
    conda:
        "envs/gwas.yaml"
    log:
        "logs/bgen_to_zarr.{bgen_contig}.txt"
    shell:
        # This will write to the local {output} path
        "python scripts/convert.py bgen_to_zarr "
        "--input-path-bgen={input.bgen_path} "
        "--input-path-variants={input.variants_path} "
        "--input-path-samples={input.samples_path} "
        "--output-path={output} "
        "--contig-name={wildcards.bgen_contig} "
        "--contig-index={params.contig_index} "
        "--remote=False 2> {log} "
Invocation:
snakemake --use-conda --cores 1 \
--default-remote-provider GS --default-remote-prefix $GS_BUCKET \
$GS_BUCKET/prep-data/gt-imputation/ukb_chrXY.zarr
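With --default-remote-provider GS and --default-remote-prefix, the rule's output pattern is effectively prefixed with the bucket path, which is why the command-line target above includes $GS_BUCKET. The sketch below is a simplified illustration of that matching, not Snakemake's actual implementation. It assumes GS_BUCKET=rs-ukb (as the upload line in the log suggests), and match_target is a hypothetical helper that turns each {wildcard} into a named group matching .+, which is Snakemake's default wildcard constraint.

```python
import re


def match_target(target, prefix, output_pattern):
    """Return wildcard values if target matches prefix + "/" + output_pattern, else None.

    Hypothetical helper: literal text is regex-escaped and each {name}
    becomes a named group matching ".+" (Snakemake's default constraint).
    """
    full_pattern = f"{prefix.rstrip('/')}/{output_pattern}"
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", full_pattern):
        regex += re.escape(full_pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(full_pattern[pos:])
    match = re.fullmatch(regex, target)
    return match.groupdict() if match else None


# Assumption: GS_BUCKET=rs-ukb
print(match_target(
    "rs-ukb/prep-data/gt-imputation/ukb_chrXY.zarr",
    "rs-ukb",
    "prep-data/gt-imputation/ukb_chr{bgen_contig}.zarr",
))
# {'bgen_contig': 'XY'}
```

Without the prefix, the same target would not match the rule's output, so Snakemake would not know which rule produces it.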
I also get the same error when running on a Kubernetes cluster, i.e. using:
snakemake --use-conda --kubernetes \
--default-remote-provider GS --default-remote-prefix $GS_BUCKET \
$GS_BUCKET/prep-data/gt-imputation/ukb_chrXY.zarr
Additional context
I can work around this by having the rule produce an individual checkpoint/sentinel file of some kind instead, but it's unclear to me whether directory outputs are supported at all with Google Cloud Storage. Is this documented anywhere? Or am I trying to use a feature that doesn't exist?
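Some background that may be relevant here: Google Cloud Storage has a flat namespace, so a "directory" is only a shared name prefix, and there is normally no object whose name is the directory path itself. One plausible explanation for the error (an assumption, not confirmed in this report) is that the output check looks for an object with exactly the directory's name. The sketch below contrasts an exact-name check with a prefix check over a hypothetical object listing for an uploaded Zarr store; exists_exact and exists_as_prefix are illustrative names, not Snakemake or google-cloud-storage APIs.

```python
def exists_exact(blob_names, name):
    """Exact-name check: true only if an object with this exact name exists."""
    return name in blob_names


def exists_as_prefix(blob_names, name):
    """Directory-style check: true if any object name starts with name + "/"."""
    prefix = name.rstrip("/") + "/"
    return any(b.startswith(prefix) for b in blob_names)


# Hypothetical object names inside the bucket for an uploaded Zarr store.
blobs = {
    "prep-data/gt-imputation/ukb_chrXY.zarr/.zgroup",
    "prep-data/gt-imputation/ukb_chrXY.zarr/call_genotype/.zarray",
    "prep-data/gt-imputation/ukb_chrXY.zarr/call_genotype/0.0.0",
}
target = "prep-data/gt-imputation/ukb_chrXY.zarr"

print(exists_exact(blobs, target))      # False
print(exists_as_prefix(blobs, target))  # True
```

With the google-cloud-storage client, a prefix check along these lines could be done by listing at most one blob with Client.list_blobs(bucket_name, prefix=target + "/", max_results=1) and testing whether anything comes back.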
About this issue
- State: open
- Created 4 years ago
- Reactions: 3
- Comments: 38 (35 by maintainers)
Makes sense 👍 Thanks!
If someone else doesn’t get to it first, I will try at my next available opportunity.