snakemake: default remote prefix duplicated for checkpoint output on cloud

Snakemake version 5.22.1

Describe the bug: Using the --default-remote-prefix parameter causes MissingInputException errors for the outputs of checkpoint rules.

Steps to reproduce: Follow the Google Life Sciences Executor tutorial, but convert the bwa_map rule into a checkpoint, like this:

diff --git Snakefile Snakefile
index 68974d1..d519d4f 100644
--- Snakefile
+++ Snakefile
@@ -4,7 +4,7 @@ rule all:
     input:
         "plots/quals.svg"

-rule bwa_map:
+checkpoint bwa_map:
     input:
         fastq="samples/{sample}.fastq",
         idx=multiext("genome.fa", ".amb", ".ann", ".bwt", ".pac", ".sa")
@@ -19,7 +19,7 @@ rule bwa_map:

 rule samtools_sort:
     input:
-        "mapped_reads/{sample}.bam"
+        lambda wildcards: checkpoints.bwa_map.get(sample=wildcards.sample).output[0]
     output:
         "sorted_reads/{sample}.bam"
     conda:

Then you should see the following when running the pipeline:

Building DAG of jobs...
MissingInputException in line 20 of Snakefile:
Missing input files for rule samtools_sort:
snakemake-testing-data/snakemake-testing-data/mapped_reads/A.bam

Notice that snakemake-testing-data is prepended twice.

Bug hypothesis: I think rules.apply_default_remote() is being applied to the output more than once. It might help to check whether incomplete is true on this line before calling rules.apply_default_remote().

This bug might extend beyond the Life Sciences executor to other cloud environments, but I haven't tested that yet.
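The doubled path in the log above is what you get when a non-idempotent prefixing step runs twice on the same output path. Here is a minimal sketch contrasting a naive prefix join with a variant that skips paths that already carry the prefix. Both `apply_prefix_naive` and `apply_prefix_once` are hypothetical helpers for illustration, not Snakemake's actual code.

```python
import posixpath


def apply_prefix_naive(path: str, prefix: str) -> str:
    # Always joins the prefix, so calling it a second time prefixes again.
    return posixpath.join(prefix, path)


def apply_prefix_once(path: str, prefix: str) -> str:
    # Hypothetical idempotent variant: paths already under the prefix are left alone.
    if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
        return path
    return posixpath.join(prefix, path)


prefix = "snakemake-testing-data"
out = "mapped_reads/A.bam"

twice_naive = apply_prefix_naive(apply_prefix_naive(out, prefix), prefix)
twice_once = apply_prefix_once(apply_prefix_once(out, prefix), prefix)
print(twice_naive)  # snakemake-testing-data/snakemake-testing-data/mapped_reads/A.bam
print(twice_once)   # snakemake-testing-data/mapped_reads/A.bam
```

A startswith guard like this has a blind spot. If a workflow legitimately writes into a local directory with the same name as the bucket, that path would be mistaken for one that has already been prefixed. Applying the prefix exactly once, at a single well-defined point, avoids that ambiguity.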

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 24 (19 by maintainers)


Most upvoted comments

OK, I may have found a fix of sorts: I deleted lines 722 and 723 of rules.py, and everything started working again.

I don't fully understand the underlying reason, but I'm guessing that something about checkpoints causes apply_default_remote() in rules.py to run more often than usual for a rule. So I tried to find where that happens, and lines 722 and 723 seemed like good candidates.

I’m planning on spending some more time on it tomorrow. Once I feel like I understand things better, I can try to explain here and maybe write up a PR!
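One way the "run more often than usual" guess could produce the doubled path is modeled below. This is a toy sketch that assumes, without confirmation from this thread, that the prefix is applied by overwriting a rule's stored output list, and that the checkpoint causes that code to run again when the DAG is re-evaluated. `ToyRule`, `prefix_in_place`, `prefix_from_raw` and `evaluate` are all hypothetical names.

```python
from typing import List

PREFIX = "snakemake-testing-data"


class ToyRule:
    """Hypothetical stand-in for a rule whose outputs receive a remote prefix."""

    def __init__(self, outputs: List[str]):
        self._raw_outputs = list(outputs)  # patterns as written in the Snakefile
        self.outputs = list(outputs)       # what downstream code reads

    def prefix_in_place(self) -> None:
        # Builds on the already-modified list, so every call adds another prefix.
        self.outputs = [f"{PREFIX}/{o}" for o in self.outputs]

    def prefix_from_raw(self) -> None:
        # Recomputes from the original patterns, so repeated calls are harmless.
        self.outputs = [f"{PREFIX}/{o}" for o in self._raw_outputs]


def evaluate(rule: ToyRule, passes: int, method: str) -> List[str]:
    # Each pass stands in for one (re)evaluation of the rule, e.g. once when
    # the DAG is first built and once more after the checkpoint completes.
    for _ in range(passes):
        getattr(rule, method)()
    return rule.outputs


print(evaluate(ToyRule(["mapped_reads/A.bam"]), 2, "prefix_in_place"))
# ['snakemake-testing-data/snakemake-testing-data/mapped_reads/A.bam']
print(evaluate(ToyRule(["mapped_reads/A.bam"]), 2, "prefix_from_raw"))
# ['snakemake-testing-data/mapped_reads/A.bam']
```

Under this model, removing the code that applies the prefix hides the symptom but leaves the outputs unprefixed. Deriving the prefixed paths from the untouched patterns would keep the prefix and make re-evaluation safe.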

@aryarm Okay, I see, thanks for the update. Unfortunately I'm not very familiar with the Snakemake code, but I've taken a quick look, and it seems the logic for applying the default remote prefix has moved here: https://github.com/snakemake/snakemake/blob/01d6102795c96ce695d6d7201f7e4655a1d5cac8/snakemake/path_modifier.py#L14. I'm not sure how much I can help with this, but I'll take a deeper look sometime this week, since I'm also trying to figure out what is causing #1260.

@aryarm, I tested 5.22.1 with your fix, as well as 6.0.5 (the latest version). With your fix, I get the following error:

snakemake/executors/__init__.py", line 1827, in handle_remote
    if isinstance(target, _IOFile) and target.remote_object.provider.is_default:
AttributeError: 'NoneType' object has no attribute 'provider'
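The traceback means that at least one `_IOFile` reaching `handle_remote` has `remote_object` set to `None`. That would plausibly happen if the deleted lines were what wrapped outputs as default-remote files. The sketch below shows a None-safe version of the failing check. It uses hypothetical stand-in classes (`FakeIOFile`, `RemoteObject`, `Provider`) rather than Snakemake's real `_IOFile`. Guarding like this only avoids the crash; it does not restore the missing remote wrapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Provider:
    is_default: bool


@dataclass
class RemoteObject:
    provider: Provider


@dataclass
class FakeIOFile:
    # Hypothetical stand-in for snakemake.io._IOFile.
    path: str
    remote_object: Optional[RemoteObject] = None


def is_default_remote(target) -> bool:
    # The original check dereferences target.remote_object.provider directly,
    # which raises AttributeError when remote_object is None.
    if not isinstance(target, FakeIOFile):
        return False
    return target.remote_object is not None and target.remote_object.provider.is_default


local = FakeIOFile("mapped_reads/A.bam")
remote = FakeIOFile("bucket/mapped_reads/A.bam", RemoteObject(Provider(is_default=True)))
print(is_default_remote(local))   # False
print(is_default_remote(remote))  # True
```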

I agree! @johanneskoester was on fire this morning 😃 🔥

Thanks @aryarm I really appreciate that! I’m hoping we will get our testing running again soon, and I’ll take a look at the commit to see if it can help the current test.