snakemake: MissingOutputException with directory and GS remote provider

I am trying to run a workflow rule that creates a directory in GS (Google Cloud Storage), but Snakemake repeatedly fails to recognize that the directory exists. The error message recommends using the directory() flag, which I am already doing.

This appears to be related to https://github.com/snakemake/snakemake/issues/396.

Snakemake version

5.22.1

Describe the bug

Output directories are flagged as missing when using GS remote provider.

Logs

The most salient part:

Uploading to remote: rs-ukb/logs/bgen_to_zarr.XY.txt
Finished upload.
ImproperOutputException in line 57 of /workdir/Snakefile:
Outputs of incorrect type (directories when expecting files or vice versa). Output directories must be flagged with directory(). for rule bgen_to_zarr:
rs-ukb/prep-data/gt-imputation/ukb_chrXY.zarr
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 544, in handle_job_success
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 225, in handle_job_success

Full log: error_log.txt
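To make the error message concrete, here is a minimal sketch of the kind of check it describes: comparing each output's declared type (whether it was wrapped in directory()) against what actually exists on disk. This is an illustration under assumptions, not Snakemake's implementation. The function name find_improper_outputs and its dict argument are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def find_improper_outputs(outputs: Dict[str, bool]) -> List[str]:
    """Return output paths whose on-disk type disagrees with the declared type.

    Hypothetical helper: `outputs` maps each output path to True if it was
    declared with directory() and False if it was declared as a plain file.
    Paths that do not exist are skipped; a missing output is a separate error.
    """
    improper = []
    for path, expect_dir in outputs.items():
        p = Path(path)
        if not p.exists():
            continue  # missing outputs are reported elsewhere, not here
        if p.is_dir() != expect_dir:
            # e.g. a directory found where a file was declared, or vice versa
            improper.append(path)
    return improper
```

Run locally against the rule's outputs, a check like this would return an empty list, which suggests the mismatch arises only once the remote provider is involved.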

Minimal example

Here is the offending rule. I apologize that this isn't fully reproducible, but some of the details are difficult to share:

def bgen_samples_path(wc):
    n_samples = bgen_contigs.loc[wc.bgen_contig]['n_consent_samples']
    return [f"raw-data/gt-imputation/ukb59384_imp_chr{wc.bgen_contig}_v3_s{n_samples}.sample"]

rule bgen_to_zarr:
    input:
        bgen_path="raw-data/gt-imputation/ukb_imp_chr{bgen_contig}_v3.bgen",
        variants_path="raw-data/gt-imputation/ukb_mfi_chr{bgen_contig}_v3.txt",
        samples_path=bgen_samples_path
    output:
        directory("prep-data/gt-imputation/ukb_chr{bgen_contig}.zarr")
    params:
        contig_index=lambda wc: bgen_contigs.loc[str(wc.bgen_contig)]['index']
    conda:
        "envs/gwas.yaml"
    log:
        "logs/bgen_to_zarr.{bgen_contig}.txt"
    shell:
        # This will write to the local {output} path
        "python scripts/convert.py bgen_to_zarr "
        "--input-path-bgen={input.bgen_path} "
        "--input-path-variants={input.variants_path} "
        "--input-path-samples={input.samples_path} "
        "--output-path={output} "
        "--contig-name={wildcards.bgen_contig} "
        "--contig-index={params.contig_index} "
        "--remote=False 2> {log} "

Invocation:

snakemake --use-conda --cores 1 \
--default-remote-provider GS --default-remote-prefix $GS_BUCKET \
$GS_BUCKET/prep-data/gt-imputation/ukb_chrXY.zarr

I also get the same error when running on a Kubernetes cluster, using:

snakemake --use-conda --kubernetes \
--default-remote-provider GS --default-remote-prefix $GS_BUCKET \
$GS_BUCKET/prep-data/gt-imputation/ukb_chrXY.zarr

Additional context

I can work around this by using a separate checkpoint/sentinel file of some kind, but it's unclear to me whether directory outputs are supported for Google Storage at all. Is that covered in the docs somewhere? Or am I trying to use a feature that doesn't exist?
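Some background on why directories are awkward here: GCS has a flat namespace, so a "directory" exists only as a name prefix shared by objects, and there is no directory entry to test for. A minimal sketch, assuming the bucket's object names have already been listed (in practice, for example, via the google-cloud-storage client's list_blobs with a prefix argument), of how a path could be classified as a file, a directory, or missing. This is not the GS provider's code; classify_remote_path and the object names in the usage below are hypothetical.

```python
from typing import Iterable


def classify_remote_path(object_names: Iterable[str], path: str) -> str:
    """Classify `path` against a flat listing of object names (hypothetical).

    Returns "file" if an object with exactly that name exists,
    "directory" if any object name starts with `path + "/"`,
    and "missing" otherwise.
    """
    path = path.rstrip("/")
    prefix = path + "/"  # the trailing slash keeps "chrX" from matching "chrXY.zarr/..."
    names = list(object_names)
    if path in names:
        return "file"
    if any(name.startswith(prefix) for name in names):
        return "directory"
    return "missing"
```

With objects such as prep-data/gt-imputation/ukb_chrXY.zarr/.zgroup in the listing, the zarr path is classified as "directory" even though no object has that exact name; a check that looks only for an exact object name would treat it as missing.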

About this issue

  • State: open
  • Created 4 years ago

Most upvoted comments

Makes sense 👍 Thanks!

If someone else doesn’t get to it first, I will try at my next available opportunity.