how to pass a function under snakemake run directi

2019-07-29 09:45发布

问题:

I am building a workflow in snakemake and would like to recycle one of the rules to two different input sources. The input sources could be either source1 or source1+source2 and depending on the input the output directory would also vary. Since this was quite complicated to do in the same rule and I didn't want to create the copy of the full rule I would like to create two rules with different input/output, but running same command.

Is it possible to make this work? I get the DAG resolved correctly but the job don't go through on the cluster (ERROR : bamcov_cmd not defined).. An example below (both rules use the same command at the end):

this is command

def bamcov_cmd():
return( (deepTools_path+"bamCoverage " +
            "-b {input.bam} " +
            "-o {output} " +
            "--binSize {params.bw_binsize} " +
            "-p {threads} " +
            "--normalizeTo1x {params.genome_size} " +
            "{params.read_extension} " +
            "&> {log}") )

this is the rule

rule bamCoverage:
input:
    bam = file1+"/{sample}.bam",
    bai = file1+"/{sample}.bam.bai"
output:
    "bamCoverage/{sample}.filter.bw"
params:
    bw_binsize = bw_binsize,
    genome_size = int(genome_size),
    read_extension = "--extendReads"
log:
    "bamCoverage/logs/bamCoverage.{sample}.log"
benchmark:
    "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
threads: 16
run:
    bamcov_cmd()

this is the optional rule2

rule bamCoverage2:
input:
    bam = file2+"/{sample}.filter.bam",
    bai = file2+"/{sample}.filter.bam.bai"
output:
    "bamCoverage/{sample}.filter.bw"
params:
    bw_binsize = bw_binsize,
    genome_size = int(genome_size),
    read_extension = "--extendReads"
log:
    "bamCoverage/logs/bamCoverage.{sample}.log"
benchmark:
    "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
threads: 16
run:
    bamcov_cmd()

回答1:

What you asked is possible in python. It depends if you have JUST python code in the file, or python and Snakemake. I will answer that first, and then I have a follow up response because I want you to set it up differently so you don't have to do it this way.

Just Python:

 from fileContainingMyBamCovCmdFunction import bamcov_cmd

 rule bamCoverage:
 ...
     run:
         bamcov_cmd()

Visually, see how I do it in this file, to reference access to buildHeader and buildSample. These files are being called by a Snakefile. It should work the same for you. https://github.com/LCR-BCCRC/workflow_exploration/blob/master/Snakemake/modules/py_buildFile/buildFile.py

EDIT 2017-07-23 - Updating code segment below to reflect user comment

Snakemake and Python:

 include: "fileContainingMyBamCovCmdFunction.suffix"

 rule bamCoverage:
 ...
     run:
         shell(bamcov_cmd())

EDIT END

If the function is truly specific to the bamCoverage call, if you prefer you can put it back in the rule. This implies it's not being called elsewhere, which may be true. Be careful when annotating files using '.' notation, I use '_' as I find it's easier to prevent creating cyclical dependencies this way. Also, if you do end up leaving the two rules separately, you will likely end up with ambiguity errors. http://snakemake.readthedocs.io/en/latest/snakefiles/rules.html?highlight=ruleorder#handling-ambiguous-rules When possible, it's best practice to have rules generating unique outputs.

As for alternatives, consider setting up the code like this?

from subprocess import call


rule all:
    input:
          "path/to/file/mySample.bw"
          #OR
          #"path/to/file/mySample_filtered.bw"


bamCoverage:
input:
    bam = file1+"/{sample}.bam",
    bai = file1+"/{sample}.bam.bai"
output:
    "bamCoverage/{sample}.bw"
params:
    bw_binsize = bw_binsize,
    genome_size = int(genome_size),
    read_extension = "--extendReads"
log:
    "bamCoverage/logs/bamCoverage.{sample}.log"
benchmark:
    "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
threads: 16
run:
    callString= deepTools_path + "bamCoverage " \
            + "-b " + wilcards.input.bam \
            + "-o " + wilcards.output \
            + "--binSize " str(params.bw_binsize) \
            + "-p " + str({threads}) \
            + "--normalizeTo1x " + str(params.genome_size) \
            + " " + str(params.read_extension) \
            + "&> " + str(log)
    call(callString, shell=True)


rule filterBam:
input:
    "{pathFB}/{sample}.bam"
output:
    "{pathFB}/{sample}_filtered.bam"
run:
    callString="samtools view -bh -F 512 " + wildcards.input \
    + ' > ' + wildcards.output

    call(callString, shell=True)

Thoughts?



标签: snakemake