Using Conda environments in SnakeMake on an SGE cluster

Posted 2019-08-14 12:34

Question:

Related: SnakeMake rule with Python script, conda and cluster

I have been trying to set up my SnakeMake pipelines to run on SGE clusters (qsub). Simple commands, and tools that are installed directly on the compute nodes, work without problems. However, the jobs fail when I try to have SnakeMake install tools through Conda on the SGE nodes.

My testing Snakefile is:

rule bwa_sge_c_test:
    conda:
        "bwa.yaml"
    shell:
        "bwa > snaketest.txt"

The "bwa.yaml" file is:

channels:
    - bioconda
dependencies:
    - bwa=0.7.17

I run SnakeMake with the command:

snakemake -d "/home/<username>" --use-conda --cluster "qsub -cwd -q testing-nod08" --jobs 1

As a result, I get this error on the SGE compute node:

/usr/bin/python3: No module named snakemake.__main__; 'snakemake' is a package and cannot be directly executed
touch: cannot touch '/home/krampl/.snakemake/tmp.7le8izvw/0.jobfailed': No such file or directory

I have tried to append "snakemake=5.2.2" to "bwa.yaml" (as a colleague suggested), but the error remains.

My questions are: what causes this error, and how can I fix it so that I can run Conda environments from SnakeMake on SGE clusters?

Answer 1:

You probably need to send your environment variables to qsub.

snakemake -d "/home/<username>" --use-conda --cluster "qsub -V -cwd -q testing-nod08" --jobs 1
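To check whether a missing environment really is the cause, one option is a small diagnostic script. This is only a sketch: the function name report_env is made up for illustration, and it assumes the failure comes from the compute node picking up a python3 that cannot run "python3 -m snakemake", as the error message suggests.

```shell
#!/bin/sh
# Hypothetical helper: report which python3 is first on PATH and whether
# that interpreter can run "python3 -m snakemake" (the call that failed).
report_env() {
    py=$(command -v python3 || true)
    if [ -z "$py" ]; then
        echo "python3: not found on PATH"
        return 0
    fi
    echo "python3: $py"
    if "$py" -m snakemake --version >/dev/null 2>&1; then
        echo "snakemake: runnable with this interpreter"
    else
        echo "snakemake: NOT runnable with this interpreter"
    fi
}

report_env
```

Running it once on the submit host and once as a qsub job (with and without -V) and comparing the output should show whether the two see the same interpreter.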

The -V option exports all of your current environment variables, including PATH, to the qsub job. To submit jobs over SGE, I usually write a job script that wraps each job:

sge.sh

#$ -cwd
#$ -V
#$ -e ./logs/
#$ -o ./logs/

{exec_job}

You can of course add other options such as -q, and then call snakemake as follows:

snakemake --cluster "qsub" --jobscript sge.sh ....

(Do not forget to create the logs folder before calling snakemake.)
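The steps above could be bundled into one launcher script, roughly as sketched here. The function name write_jobscript is hypothetical, and the --use-conda and --jobs 1 flags are carried over from the question as an assumption; adjust them to your own workflow.

```shell
#!/bin/sh
# Sketch: prepare the jobscript and the log directory, then call snakemake.
# File names sge.sh and logs/ follow the answer; write_jobscript is a
# made-up helper name.

write_jobscript() {
    # The quoted 'EOF' keeps {exec_job} literal so that Snakemake can
    # substitute the real job command later.
    cat > "$1" <<'EOF'
#$ -cwd
#$ -V
#$ -e ./logs/
#$ -o ./logs/

{exec_job}
EOF
}

write_jobscript sge.sh
mkdir -p logs   # SGE writes stdout/stderr here, so it must exist first

# Only launch when both tools are available, i.e. on the submit host.
if command -v snakemake >/dev/null 2>&1 && command -v qsub >/dev/null 2>&1; then
    snakemake --use-conda --cluster "qsub" --jobscript sge.sh --jobs 1
else
    echo "snakemake or qsub not found; prepared sge.sh and logs/ only"
fi
```

Keeping the mkdir in the same script as the snakemake call makes it harder to forget the logs folder on a fresh checkout.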