snakemake: Python 3.12 breaks f-strings inside Snakemake

Snakemake version

Tested with snakemake 7.34.2 (older versions are affected, too; tested down to 7.18.2)

Affected Python version: 3.12.0 (Last known good: Python 3.11.6)

Describe the bug

When snakemake is installed with Python 3.12.0, f-strings get unexpected extra spaces around their replacement fields and at both ends.

For example:

output: f'{PREFIX}.txt'

will produce…

Logs

…this output in the log (note the unexpected extra spaces):

Building DAG of jobs...
File path ' SID23454678 .txt ' starts with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
File path ' SID23454678 .txt ' ends with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.

This happens when run on Python 3.12.0.

Minimal example

UnitTest.smk:

#!/usr/bin/env python3
PREFIX = 'SID23454678'
rule unit1:
 input:
 output:
  f'{PREFIX}.txt'
 run:     
   "sleep 1"

Running with:

snakemake --dry-run -s UnitTest.smk --cores 1

In this environment… :

mamba create -n snake-test -c conda-forge -c bioconda snakemake-minimal=7.32.4 python=3.11

…everything works as expected:

[Thu Oct 12 15:55:02 2023]
rule unit1:
    output: SID23454678.txt
    jobid: 0
    reason: Missing output files: SID23454678.txt
    resources: tmpdir=/tmp

But in that environment…:

mamba create -n snake-test -c conda-forge -c bioconda snakemake-minimal=7.32.4 python=3.12

…the bug is triggered:

Building DAG of jobs...
File path ' SID23454678 .txt ' starts with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
File path ' SID23454678 .txt ' ends with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.

Additional context

Apparently, f-string parsing has been completely overhauled as part of Python 3.12. I suspect the new f-string tokenization somehow clashes with the way snakemake parses Snakefiles.

About this issue

  • State: closed
  • Created 9 months ago
  • Reactions: 8
  • Comments: 30 (26 by maintainers)

Most upvoted comments

The bioconda package has been updated to require python<3.12 until we have fixed this.

I do think so. Before 3.12, tokenize.generate_tokens yields the whole f-string as a single token, like f"{PREFIX}.txt". Since 3.12, tokenize.generate_tokens splits the f-string into separate tokens: ‘f"’, ‘{’, ‘PREFIX’, ‘}’, ‘.txt’, ‘"’.

This is described in PEP 701.
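
The difference is easy to check directly. Here is a minimal sketch that uses only the standard-library tokenize module (the helper name token_strings is made up for illustration). It lists the token strings for a single f-string, and the result depends on the interpreter version:

```python
import io
import sys
import tokenize

# Token types that carry no source text of interest here.
LAYOUT = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)


def token_strings(source):
    """Return the strings of all non-layout tokens in `source`."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in LAYOUT
    ]


tokens = token_strings('f"{PREFIX}.txt"\n')
print(sys.version_info[:2], tokens)
```

On Python 3.11 and earlier the list has a single entry, the whole f-string. On 3.12 and later it has six entries, one per piece.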

Running python 3.12.0 and snakemake 7.32.4, I am seeing the extra spaces in f-strings.

rule:
    input: "{option}.in"
    output: "{option}.out"
    params: temp = lambda wildcards, output: f"{output}.temp"
    run: 
       print(params.temp)
       print(f"before_{wildcards.option}_after")

Printed results are:

 test.out .temp 
 before_ test _after 

Thanks for the confirmation 😃. Is this already tracked somewhere, or should I open a new issue to track this?

No, I only just noticed it from your comment. A more robust fix for f-strings should be made.

The original problem should be fixed by the pull request that the closing message of this issue cites: https://github.com/snakemake/snakemake/issues/2480#event-11028552730

So, starting with snakemake 8.0.0, the underlying incompatibility with python 3.12 is fixed, and no workaround should be necessary.

Same for snakemake version 7.32.4 (from build number 1 onwards). Here the bioconda recipe has been adjusted to never use python 3.12: https://github.com/bioconda/bioconda-recipes/pull/43716/files

For snakemake versions before that, we would have to patch the repodata. Not sure if anybody has done this.
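
To summarize which combinations are at risk, here is a small sketch. It assumes that every Snakemake release before 8.0.0 is affected on Python 3.12 or later, as reported above for the tested 7.x releases; the function name is_affected is hypothetical and not part of Snakemake:

```python
import sys


def is_affected(python_version, snakemake_version):
    """Guess whether this combination shows the f-string spacing bug.

    Assumption: all Snakemake releases before 8.0.0 are affected when
    run on Python 3.12 or later, as reported in this thread.
    """
    snakemake_major = int(snakemake_version.split(".")[0])
    return tuple(python_version[:2]) >= (3, 12) and snakemake_major < 8


# Check the running interpreter against a 7.x release.
print(is_affected(sys.version_info, "7.32.4"))
print(is_affected((3, 11, 6), "7.32.4"))  # False
print(is_affected((3, 12, 0), "8.0.0"))   # False
```

This only matters for installations that bypass the adjusted bioconda recipe, for example a pip install into a Python 3.12 environment.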

So if I’m not mistaken, line 1251 should be changed to recognize these new tokens from Python 3.12, and not insert spaces around FSTRING_START, FSTRING_MIDDLE and FSTRING_END tokens?
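
One possible way to do this, sketched under assumptions and not necessarily what the actual Snakemake fix does: instead of suppressing spaces token by token (which would also squash expressions such as {a if b else c}), copy each f-string verbatim from the source using the token positions. The helpers source_slice and merged_token_strings are made up for this illustration:

```python
import io
import tokenize

# These token types exist only on Python 3.12+; None on older versions.
FSTRING_START = getattr(tokenize, "FSTRING_START", None)
FSTRING_END = getattr(tokenize, "FSTRING_END", None)
LAYOUT = (
    tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
    tokenize.INDENT, tokenize.DEDENT,
)


def source_slice(lines, start, end):
    """Return the source text between two (row, col) token positions."""
    (srow, scol), (erow, ecol) = start, end
    if srow == erow:
        return lines[srow - 1][scol:ecol]
    first = lines[srow - 1][scol:]
    middle = lines[srow:erow - 1]
    last = lines[erow - 1][:ecol]
    return "".join([first, *middle, last])


def merged_token_strings(source):
    """Token strings of `source`, with each f-string kept as one piece."""
    lines = source.splitlines(keepends=True)
    merged = []
    depth = 0           # nesting level of f-strings currently open
    outer_start = None  # position where the outermost f-string began
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if FSTRING_START is not None and tok.type == FSTRING_START:
            if depth == 0:
                outer_start = tok.start
            depth += 1
        elif FSTRING_END is not None and tok.type == FSTRING_END:
            depth -= 1
            if depth == 0:
                # Copy the f-string verbatim instead of re-joining its parts.
                merged.append(source_slice(lines, outer_start, tok.end))
        elif depth == 0 and tok.type not in LAYOUT:
            merged.append(tok.string)
    return merged


print(" ".join(merged_token_strings('out = f"{PREFIX}.txt"\n')))
```

With this, joining the pieces with spaces leaves the f-string untouched on both old and new Python versions. The sketch ignores indentation and line structure, which a real parser would have to preserve.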

A further test on a simplified snakefile:

#!/usr/bin/env python3
PREFIX = 'SID23454678'
if 1:
  f'{PREFIX}.txt'

run through this script:

from snakemake import parser
import tokenize


class Snakefile:
    """Stripped-down token source: yields the raw tokens of a file."""

    def __init__(self, snakefile, rulecount=0):
        self.path = snakefile
        self.file = open(self.path)
        self.tokens = tokenize.generate_tokens(self.file.readline)
        self.rulecount = rulecount
        self.lines = 0

    def __next__(self):
        return next(self.tokens)

    def __iter__(self):
        return self

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.file.close()


with Snakefile("snakefile", rulecount=0) as snakefile:
    automaton = parser.Python(snakefile)
    linemap = dict()
    compilation = list()
    for t, orig_token in automaton.consume():
        # Show each emitted string next to the token it came from.
        print(t, orig_token)
        orig_line = parser.lineno(orig_token)
        # Map every compiled line back to its original line number.
        linemap.update(
            dict(
                (i, orig_line)
                for i in range(
                    snakefile.lines + 1, snakefile.lines + t.count("\n") + 1
                )
            )
        )
        snakefile.lines += t.count("\n")
        compilation.append(t)
    compilation_ = "".join(parser.format_tokens(compilation))
    if linemap:
        last = max(linemap)
        linemap[last + 1] = linemap[last]
    print(compilation_, linemap, snakefile.rulecount)

and the key problem shows up here:

f' TokenInfo(type=61 (FSTRING_START), string="f'", start=(4, 2), end=(4, 4), line="  f'{PREFIX}.txt'\n")
{ TokenInfo(type=55 (OP), string='{', start=(4, 4), end=(4, 5), line="  f'{PREFIX}.txt'\n")
PREFIX TokenInfo(type=1 (NAME), string='PREFIX', start=(4, 5), end=(4, 11), line="  f'{PREFIX}.txt'\n")
} TokenInfo(type=55 (OP), string='}', start=(4, 11), end=(4, 12), line="  f'{PREFIX}.txt'\n")
.txt TokenInfo(type=62 (FSTRING_MIDDLE), string='.txt', start=(4, 12), end=(4, 16), line="  f'{PREFIX}.txt'\n")
' TokenInfo(type=63 (FSTRING_END), string="'", start=(4, 16), end=(4, 17), line="  f'{PREFIX}.txt'\n")

I tested with the option “--print-compilation”, and this is the output:

#!/usr/bin/env python3
PREFIX = 'SID23454678'
@workflow.rule(name='unit1', lineno=3, snakefile='/home/hwrn/Templates/test.smk')

@workflow.input(

)

@workflow.output(
        f' { PREFIX } .txt '

)
@workflow.run
def __rule_unit1(input, output, params, wildcards, threads, resources, log, version, rule, conda_env, container_img, singularity_args, use_singularity, env_modules, bench_record, jobid, is_shell, bench_iteration, cleanup_scripts, shadow_dir, edit_notebook, conda_base_path, basedir, runtime_sourcecache_path, __is_snakemake_rule_func=True):
                "sleep 1"

It is not caused by the f-string itself but by snakemake’s compilation step.
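
The effect can be reproduced without Snakemake. Here is a minimal sketch, assuming the compilation step simply puts one space between token strings. That is a simplification for illustration, not Snakemake's actual code, and naive_join is a made-up helper:

```python
import io
import sys
import tokenize

LAYOUT = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)


def naive_join(source):
    """Re-assemble source by putting one space between all tokens."""
    readline = io.StringIO(source).readline
    return " ".join(
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in LAYOUT
    )


PREFIX = "SID23454678"
rebuilt = naive_join("f'{PREFIX}.txt'\n")
# On Python 3.12+ the f-string arrives as several tokens, so the spaces
# land inside the string literal; on older versions it is one token.
print(sys.version_info[:2])
print(repr(rebuilt))
print(repr(eval(rebuilt)))
```

On Python 3.12 the rebuilt source matches the f' { PREFIX } .txt ' seen in the compilation output, and evaluating it gives the path with the extra spaces from the warning. On 3.11 the f-string comes back unchanged.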

No it doesn’t.

Haha, you were faster than my screenshot upload. 😃