How to do a partial expand in Snakemake?

Posted 2019-02-19 02:47

Question:

I'm trying to first generate 4 files, for the LETTERS x NUMS combinations, then summarize over the NUMS to obtain one file per element in LETTERS:

LETTERS = ["A", "B"]
NUMS = ["1", "2"]


rule all:
    input:
        expand("combined_{letter}.txt", letter=LETTERS)

rule generate_text:
    output:
        "text_{letter}_{num}.txt"
    shell:
        """
        echo "test" > {output}
        """

rule combine_text:
    input:
        expand("text_{letter}_{num}.txt", num=NUMS)
    output:
        "combined_{letter}.txt"
    shell:
        """
        cat {input} > {output}
        """

Executing this snakefile results in the following error:

WildcardError in line 19 of /tmp/Snakefile:
No values given for wildcard 'letter'.
  File "/tmp/Snakefile", line 19, in <module>

It seems that a partial expand is not possible. Is this a limitation of expand? If so, how can I work around it?

Answer 1:

It turns out this is not a limitation of expand, but a gap in my familiarity with how string formatting works in Python. I need to use double braces for the wildcard that should not be expanded:

LETTERS = ["A", "B"]
NUMS = ["1", "2"]


rule all:
    input:
        expand("combined_{letter}.txt", letter=LETTERS)

rule generate_text:
    output:
        "text_{letter}_{num}.txt"
    shell:
        """
        echo "test" > {output}
        """

rule combine_text:
    input:
        expand("text_{{letter}}_{num}.txt", num=NUMS)
    output:
        "combined_{letter}.txt"
    shell:
        """
        cat {input} > {output}
        """

Executing this snakefile now generates the following expected files:

text_A_2.txt
text_A_1.txt
text_B_2.txt
text_B_1.txt
combined_A.txt
combined_B.txt


Answer 2:

Indeed, braces need to be escaped (doubled) when you want expand to leave them alone. expand relies on str.format, so the formatting rules of str.format apply to expand as well.
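
The str.format behaviour described here can be checked in plain Python, without Snakemake. This is only a minimal sketch: it assumes expand follows the same brace rules, as stated above, and the variable names (escaped, partial, missing) are made up for illustration.

```python
NUMS = ["1", "2"]

# Doubled braces are literal braces to str.format, so "{letter}" survives
# formatting and is left for Snakemake to resolve as a wildcard later.
escaped = "text_{{letter}}_{num}.txt"
partial = [escaped.format(num=n) for n in NUMS]
print(partial)

# Single braces mark a replacement field, so formatting without a value
# for "letter" fails; in plain Python this is a KeyError.
try:
    "text_{letter}_{num}.txt".format(num="1")
    missing = None
except KeyError as err:
    missing = err.args[0]
print(missing)
```

The first print shows patterns that still contain {letter}. The second shows the name of the field that had no value, which mirrors the "No values given for wildcard 'letter'" message from the question.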