snakemake: Python 3.12 breaks f-strings inside Snakemake

Snakemake version

Tested with snakemake 7.34.2 (older versions are affected too; tested down to 7.18.2)

Affected Python version: 3.12.0 (Last known good: Python 3.11.6)

Describe the bug

When Snakemake runs under Python 3.12.0, f-strings in a Snakefile gain unexpected extra spaces.

For example:

output: f'{PREFIX}.txt'

will produce…

Logs

…this output in the log (note the unexpected extra spaces):

Building DAG of jobs...
File path ' SID23454678 .txt ' starts with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
File path ' SID23454678 .txt ' ends with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.

When run on python 3.12.0

Minimal example

UnitTest.smk:

#!/usr/bin/env python3
PREFIX = 'SID23454678'
rule unit1:
 input:
 output:
  f'{PREFIX}.txt'
 run:     
   "sleep 1"

Running with:

snakemake --dry-run -s UnitTest.smk --cores 1

In this environment… :

mamba create -n snake-test -c conda-forge -c bioconda snakemake-minimal=7.32.4 python=3.11

…everything works as expected:

[Thu Oct 12 15:55:02 2023]
rule unit1:
    output: SID23454678.txt
    jobid: 0
    reason: Missing output files: SID23454678.txt
    resources: tmpdir=/tmp

But in that environment…:

mamba create -n snake-test -c conda-forge -c bioconda snakemake-minimal=7.32.4 python=3.12

…the bug is triggered:

Building DAG of jobs...
File path ' SID23454678 .txt ' starts with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
File path ' SID23454678 .txt ' ends with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.

Additional context

Apparently, f-string parsing was completely overhauled in Python 3.12. I suspect the new f-string tokenization clashes with the way Snakemake parses Snakefiles.

About this issue

State: closed
Created 9 months ago
Reactions: 8
Comments: 30 (26 by maintainers)

Commits related to this issue

Pin Python version - Use Python<=3.12 for now, see snakemake/snakemake#2480 for discussion — committed to vinisalazar/bioconda-recipes by vinisalazar 8 months ago
Update Metaphor dependencies (#44279) * Pin Python version - Use Python<=3.12 for now, see snakemake/snakemake#2480 for discussion * Bump build number — committed to bioconda/bioconda-recipes by vinisalazar 8 months ago
fix: handle different f-string tokens in py3.12 (#2485) ### Description fix #2480 ### QC In this fix, f-string such as `f"{any}"` will be formated as `f"{ any }"`. However, f-string suc... — committed to snakemake/snakemake by Hocnonsense 7 months ago
Restrict Python version until Snakemake bug fix released See https://github.com/snakemake/snakemake/issues/2480 — committed to twrightsman/quantify-RNA-pipeline by twrightsman 7 months ago

Most upvoted comments

The bioconda package has been updated to require python<3.12 until we have fixed this.

johanneskoester on Oct 18, 2023

I do think so. Before 3.12, tokenize.generate_tokens yielded the whole f-string as a single token, like f"{PREFIX}.txt". Since 3.12, tokenize.generate_tokens yields the f-string as separate tokens: 'f"', '{', 'PREFIX', '}', '.txt', '"'.

This is described in PEP 701.
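
A minimal sketch of how to observe this difference with the standard library alone. The helper name token_summary is made up for this illustration and is not part of Snakemake; the printed token types depend on the interpreter running the script.

```python
import io
import sys
import tokenize


def token_summary(source):
    """Return (token type name, token string) pairs for `source`.

    Hypothetical helper for illustration only; layout tokens are skipped.
    """
    readline = io.StringIO(source).readline
    skipped = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in skipped
    ]


summary = token_summary("f'{PREFIX}.txt'\n")
print("Python", sys.version_info[:2])
for name, string in summary:
    # One STRING token before 3.12; FSTRING_START ... FSTRING_END from 3.12 on.
    print(name, repr(string))
```

Any code that rejoins tokens with a space between them keeps the f-string intact on Python 3.11 (one token) but pads every fragment on 3.12.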

Hocnonsense on Oct 13, 2023

Running python 3.12.0 and snakemake 7.32.4, I am seeing the extra spaces in f-strings.

rule:
    input: "{option}.in"
    output: "{option}.out"
    params: temp = lambda wildcards, output: f"{output}.temp"
    run: 
       print(params.temp)
       print(f"before_{wildcards.option}_after")

Printed results are:

 test.out .temp 
 before_ test _after

JosephCottam on Dec 6, 2023

Thanks for the confirmation 😃. Is this already tracked somewhere, or should I open a new issue to track this?

No, I only just noticed it from your comment. A more robust fix for f-strings should be made.

Hocnonsense on Jan 25, 2024

The original problem should be fixed by the pull request cited in the closing message of this issue: https://github.com/snakemake/snakemake/issues/2480#event-11028552730

So, starting with snakemake 8.0.0, the underlying incompatibility with python 3.12 is fixed, and no workaround should be necessary.

Same for snakemake version 7.32.4 (from build number 1 onwards). Here the bioconda recipe has been adjusted to never use python 3.12: https://github.com/bioconda/bioconda-recipes/pull/43716/files

For snakemake versions before that, we would have to patch the repodata. Not sure if anybody has done this.
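
The version boundaries above can be condensed into a small check. This is only a sketch based on the statements in this thread (Python 3.12 or newer triggers the bug, Snakemake 8.0.0 or newer contains the fix); the function names are invented for the example, and it assumes plain numeric versions such as "7.32.4". It also ignores packaging: the bioconda build of 7.32.4 from build number 1 onwards cannot be installed alongside Python 3.12 at all.

```python
def parse_version(text):
    """Turn '7.32.4' into (7, 32, 4). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def likely_affected(python_version, snakemake_version):
    """Hypothetical check: Python >= 3.12 combined with Snakemake < 8.0.0."""
    return (
        parse_version(python_version) >= (3, 12)
        and parse_version(snakemake_version) < (8, 0, 0)
    )


for py, smk in [("3.11.6", "7.32.4"), ("3.12.0", "7.32.4"), ("3.12.0", "8.0.0")]:
    print(py, smk, likely_affected(py, smk))
```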

dlaehnemann on Jan 9, 2024

So if I’m not mistaken, line 1251 should be changed to recognize these new tokens from Python 3.12, and not insert spaces around FSTRING_START, FSTRING_MIDDLE and FSTRING_END tokens?
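
One way to picture that idea is sketched below. This is not Snakemake's actual patch: join_tokens and source_slice are hypothetical names, and instead of gluing the fragments back together the sketch copies the f-string verbatim from the source using token positions, so that expressions such as {a if b else c} keep their inner spaces.

```python
import io
import tokenize

# These token types exist only on Python 3.12+; None never matches a token type.
FSTRING_START = getattr(tokenize, "FSTRING_START", None)
FSTRING_END = getattr(tokenize, "FSTRING_END", None)
SKIPPED = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
           tokenize.INDENT, tokenize.DEDENT)


def source_slice(lines, start, end):
    """Return the source text between two (row, col) token positions."""
    (srow, scol), (erow, ecol) = start, end
    if srow == erow:
        return lines[srow - 1][scol:ecol]
    middle = lines[srow:erow - 1]
    return "".join([lines[srow - 1][scol:], *middle, lines[erow - 1][:ecol]])


def join_tokens(source):
    """Join tokens with single spaces, but keep each f-string verbatim."""
    lines = source.splitlines(keepends=True)
    pieces = []
    depth = 0           # nesting depth of currently open f-strings
    fstring_start = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == FSTRING_START:
            if depth == 0:
                fstring_start = tok.start
            depth += 1
        elif tok.type == FSTRING_END:
            depth -= 1
            if depth == 0:
                pieces.append(source_slice(lines, fstring_start, tok.end))
        elif depth == 0 and tok.type not in SKIPPED:
            pieces.append(tok.string)
    return " ".join(pieces)


print(join_tokens("output = f'{PREFIX}.txt'\n"))
```

On Python 3.11 the f-string arrives as one STRING token and passes through the last branch unchanged, so the same function should behave identically on both versions.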

DrYak on Oct 15, 2023

So the key problem occurs here: https://github.com/snakemake/snakemake/blob/8332d2d28af0900724f7a56aa0e394a99f48d9e0/snakemake/parser.py#L1248-L1254

Hocnonsense on Oct 13, 2023

Further test on a simplified Snakefile:

#!/usr/bin/env python3
PREFIX = 'SID23454678'
if 1:
  f'{PREFIX}.txt'

from snakemake import parser
import tokenize


# Stand-in for Snakemake's own Snakefile wrapper: it only tokenizes a local file.
class Snakefile:
    def __init__(self, snakefile, rulecount=0):
        self.path = snakefile
        self.file = open(self.path)
        self.tokens = tokenize.generate_tokens(self.file.readline)
        self.rulecount = rulecount
        self.lines = 0

    def __next__(self):
        return next(self.tokens)

    def __iter__(self):
        return self

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.file.close()


with Snakefile("snakefile", rulecount=0) as snakefile:
    automaton = parser.Python(snakefile)
    linemap = dict()
    compilation = list()
    for t, orig_token in automaton.consume():
        # Show each emitted string next to the token it came from.
        print(t, orig_token)
        l = parser.lineno(orig_token)
        linemap.update(
            dict(
                (i, l)
                for i in range(
                    snakefile.lines + 1, snakefile.lines + t.count("\n") + 1
                )
            )
        )
        snakefile.lines += t.count("\n")
        compilation.append(t)
    # format_tokens is where the pieces are joined back into source text.
    compilation_ = "".join(parser.format_tokens(compilation))
    if linemap:
        last = max(linemap)
        linemap[last + 1] = linemap[last]
    print(compilation_, linemap, snakefile.rulecount)

and the key problem shows up here:

f' TokenInfo(type=61 (FSTRING_START), string="f'", start=(4, 2), end=(4, 4), line="  f'{PREFIX}.txt'\n")
{ TokenInfo(type=55 (OP), string='{', start=(4, 4), end=(4, 5), line="  f'{PREFIX}.txt'\n")
PREFIX TokenInfo(type=1 (NAME), string='PREFIX', start=(4, 5), end=(4, 11), line="  f'{PREFIX}.txt'\n")
} TokenInfo(type=55 (OP), string='}', start=(4, 11), end=(4, 12), line="  f'{PREFIX}.txt'\n")
.txt TokenInfo(type=62 (FSTRING_MIDDLE), string='.txt', start=(4, 12), end=(4, 16), line="  f'{PREFIX}.txt'\n")
' TokenInfo(type=63 (FSTRING_END), string="'", start=(4, 16), end=(4, 17), line="  f'{PREFIX}.txt'\n")

Hocnonsense on Oct 13, 2023

I tested with the option “--print-compilation” and this is the output:

#!/usr/bin/env python3
PREFIX = 'SID23454678'
@workflow.rule(name='unit1', lineno=3, snakefile='/home/hwrn/Templates/test.smk')

@workflow.input(

)

@workflow.output(
        f' { PREFIX } .txt '

)
@workflow.run
def __rule_unit1(input, output, params, wildcards, threads, resources, log, version, rule, conda_env, container_img, singularity_args, use_singularity, env_modules, bench_record, jobid, is_shell, bench_iteration, cleanup_scripts, shadow_dir, edit_notebook, conda_base_path, basedir, runtime_sourcecache_path, __is_snakemake_rule_func=True):
                "sleep 1"

It is not caused by the f-string itself but by Snakemake's compilation.
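
To make that concrete, here is a tiny plain-Python sketch (no Snakemake involved) that evaluates the f-string as written in the Snakefile and as it appears in the compiled output above. The variable names original and compiled are just labels for this example.

```python
PREFIX = "SID23454678"

# The f-string as written in the Snakefile.
original = f'{PREFIX}.txt'

# The f-string as it appears in the --print-compilation output.
# Spaces inside the braces are harmless; spaces outside them are literal text.
compiled = f' { PREFIX } .txt '

print(repr(original))
print(repr(compiled))
```

The second value matches the file path quoted in the warning, including the space before ".txt", which is why stripping the ends alone would not recover the intended name.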

Hocnonsense on Oct 13, 2023

No it doesn’t.

Haha, you were faster than my screenshot upload. 😃

DrYak on Oct 13, 2023