thread.py error in Snakemake

Posted 2019-03-01 03:24

Question:

I am trying to run a simple one-rule Snakefile as follows:

resources_dir='resources'

rule downloadReference:
    output:
        fa = resources_dir+'/human_g1k_v37.fasta',
        fai = resources_dir+'/human_g1k_v37.fasta.fai',
    shell:
        ('mkdir -p '+resources_dir+'; cd '+resources_dir+'; ' +
        'wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz; gunzip human_g1k_v37.fasta.gz; ' +
        'wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai;')

But I get the following error:

    Error in job downloadReference while creating output files 
    resources/human_g1k_v37.fasta, resources/human_g1k_v37.fasta.fai.
    RuleException:
    CalledProcessError in line 10 of 
    /lustre4/home/masih/projects/NGS_pipeline/snake_test:
    Command 'mkdir -p resources; cd resources; wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz; gunzip human_g1k_v37.fasta.gz; wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai;' returned non-zero exit status 2.
      File "/lustre4/home/masih/projects/NGS_pipeline/snake_test", line 10, in __rule_downloadReference
      File "/home/masih/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
    Removing output files of failed job downloadReference since they might be corrupted:
    resources/human_g1k_v37.fasta
    Will exit after finishing currently running jobs.
    Exiting because a job execution failed. Look above for error message

I am not using the threads option in Snakemake, so I cannot figure out how this is related to thread.py. Has anybody encountered this error?

Answer 1:

When a shell command fails, it exits with a non-zero status. This is what "returned non-zero exit status 2" indicates.
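To see where that wording comes from, here is a minimal Python sketch (not Snakemake's own code) that runs a shell command with the standard-library subprocess module. With check=True, a non-zero exit status raises CalledProcessError, the same exception class named in the log above. `describe_failure` is a hypothetical helper name, and `exit 2` simply stands in for any command that fails with status 2.

```python
import subprocess
from typing import Optional


def describe_failure(cmd: str) -> Optional[str]:
    """Run `cmd` through /bin/sh; return the error text if it fails, else None."""
    try:
        # check=True turns any non-zero exit status into CalledProcessError.
        subprocess.run(cmd, shell=True, check=True)
    except subprocess.CalledProcessError as err:
        return str(err)
    return None


print(describe_failure("exit 2"))
# Command 'exit 2' returned non-zero exit status 2.
print(describe_failure("exit 0"))
# None
```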

One of your shell commands fails, and the failure is propagated to Snakemake. I suppose that Snakemake uses threads and that the failure surfaces in code from the thread.py file [1].
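The thread.py frame can be reproduced without Snakemake. The sketch below assumes, as the concurrent/futures/thread.py line in the traceback suggests, that the job body runs inside a concurrent.futures ThreadPoolExecutor; `run_job` and `failing_frames` are hypothetical names. The exception raised by the failing command in the worker thread is re-raised by future.result(), and its traceback still contains the worker frame from thread.py, even though threading has nothing to do with the failure.

```python
import subprocess
import traceback
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd: str) -> None:
    """Hypothetical stand-in for a job body: run a shell command, raise on failure."""
    subprocess.run(cmd, shell=True, check=True)


def failing_frames(cmd: str) -> list:
    """Run `cmd` in a worker thread; return the file names in its traceback."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_job, cmd)
        try:
            # result() re-raises, in the calling thread, the worker's exception.
            future.result()
        except subprocess.CalledProcessError as err:
            return [frame.filename for frame in traceback.extract_tb(err.__traceback__)]
    return []


files = failing_frames("exit 2")
# The concurrent/futures/thread.py frame appears although the real cause
# is only the command's exit status.
print(any(name.endswith("thread.py") for name in files))  # True
```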

To better understand what is happening, we can catch the first failing command using the || operator followed by a function that prints an error message and exits:

# Define functions to be used in shell portions
shell.prefix("""
# http://linuxcommand.org/wss0150.php
PROGNAME=$(basename $0)

function error_exit
{{
#   ----------------------------------------------------------------
#   Function for exit due to fatal program error
#       Accepts 1 argument:
#           string containing descriptive error message
#   ----------------------------------------------------------------
    echo "${{PROGNAME}}: ${{1:-"Unknown Error"}}" 1>&2
    exit 1
}}
""")

resources_dir='resources'

rule downloadReference:
    output:
        fa = resources_dir+'/human_g1k_v37.fasta',
        fai = resources_dir+'/human_g1k_v37.fasta.fai',
    params:
        resources_dir = resources_dir
    shell:
        """
        mkdir -p {params.resources_dir}
        cd {params.resources_dir}
        wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz || error_exit "fasta download failed"
        gunzip human_g1k_v37.fasta.gz || error_exit "fasta gunzip failed"
        wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai || error_exit "fai download failed"
        """
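The same idea, naming the step that failed rather than reporting only the exit status of the whole command string, can be sketched in plain Python. `StepFailed` and `run_steps` are hypothetical names, and `true` and `exit 2` stand in for the real wget and gunzip calls; this is an illustration of the technique, not a replacement for the rule above.

```python
import subprocess
from typing import List, Tuple


class StepFailed(RuntimeError):
    """Hypothetical error type that names the step which failed."""


def run_steps(steps: List[Tuple[str, str]]) -> None:
    """Run (label, command) pairs in order and stop at the first failure.

    Mirrors `cmd || error_exit "label failed"`: the raised error names
    the step, so you know which command produced the non-zero status.
    """
    for label, cmd in steps:
        status = subprocess.run(cmd, shell=True).returncode
        if status != 0:
            raise StepFailed(f"{label} failed (exit status {status})")


try:
    run_steps([
        ("fasta download", "true"),
        ("fasta gunzip", "exit 2"),   # stands in for gunzip exiting with a warning
        ("fai download", "echo never reached"),
    ])
except StepFailed as err:
    print(err)  # fasta gunzip failed (exit status 2)
```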

When I run this, I get the following message after the output of the first download:

gzip: human_g1k_v37.fasta.gz: decompression OK, trailing garbage ignored
bash: fasta gunzip failed

It turns out that gzip uses a non-zero exit code in case of warnings:

Exit status is normally 0; if an error occurs, exit status is 1. If a warning occurs, exit status is 2.

(from the DIAGNOSTICS section of man gzip)
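If a gzip warning is acceptable for your data, one option is to treat exit status 2 as success and only other non-zero statuses as real errors. A minimal Python sketch under that assumption: the status codes come from the man page text quoted above, while `gzip_status_is_acceptable` and `gunzip_tolerating_warnings` are hypothetical helper names.

```python
import subprocess

# Exit statuses as described in the DIAGNOSTICS section of `man gzip`.
GZIP_OK, GZIP_ERROR, GZIP_WARNING = 0, 1, 2


def gzip_status_is_acceptable(returncode: int) -> bool:
    """Accept success (0) and warning-only (2) exits; reject everything else."""
    return returncode in (GZIP_OK, GZIP_WARNING)


def gunzip_tolerating_warnings(path: str) -> int:
    """Decompress `path` with the gzip CLI, raising only for real errors."""
    proc = subprocess.run(["gzip", "-d", path])
    if not gzip_status_is_acceptable(proc.returncode):
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return proc.returncode
```

Note that this accepts every warning, not just the trailing-garbage one, so it trades strictness for convenience.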

If I remove the error-capturing || error_exit "fasta gunzip failed", the workflow completes. So I don't understand why you got this error in the first place.

I'm surprised that the gzip authors decided to use a non-zero status for a simple warning. They added a -q option to turn off this specific warning, which is due to the presence of trailing zeroes, but strangely, the exit status is still non-zero when this option is used.


[1] According to Johannes Köster, the author of Snakemake:

Sorry for the misleading thread.py thing, this is just the place where snakemake detects the problem. The real issue is that your command exits with exit code 2, which indicates an error not related to Snakemake