Snakemake: Generic input function for different fi

2019-08-22 23:22发布

问题:

I have two locations where my huge data can be stored: /data and /work.

/data is the folder where (intermediate) results are moved to after quality control. It is mounted read-only for the standard user. /work is the folder where new results are written to. Obviously, it is writable.

I do not want to copy or link data from /data to /work.

So I run my snakemake from within the /work folder and want my input function first to check, if the required file exists in /data (and return the absolute /data path) and if not return the relative path in the /work directory.

def in_func(wildcards):
    file_path = apply_wildcards('{id}/{visit}/{id}_{visit}-file_name_1.txt', wildcards)
    full_storage_path = os.path.join('/data', file_path)
    if os.path.isfile(full_storage_path):
        file_path = full_storage_path
    return {'myfile': file_path}

rule do_something:
    input:
        unpack(in_func),
        params = '{id}/{visit}/{id}_{visit}_params.txt',

This works fine but I would have to define separate input functions for every rule because the file names differ. Is is possible to write a generic input function that takes as input the file name e.g {id}/{visit}/{id}_{visit}-file_name_1.txt and the wildcards?

I also tried something like

def in_func(file_path):
    full_storage_path = os.path.join('/data', file_path)
    if os.path.isfile(full_storage_path):
        file_path = full_storage_path
    file_path

rule do_something:
    input:
        myfile = in_func('{id}/{visit}/{id}_{visit}-file_name_1.txt')
        params = '{id}/{visit}/{id}_{visit}_params.txt',

But then I do not have access to the wildcards in in_func(), do I?

Thanks, Jan

回答1:

You could use something like this:

def handle_storage(pattern):
    def handle_wildcards(wildcards):
        f = pattern.format(**wildcards)
        f_data = os.path.join("/data", f)
        if os.path.exists(f_data):
            return f_data
        return f

    return handle_wildcards


rule do_something:
    input:
        myfile = handle_storage('{id}/{visit}/{id}_{visit}-file_name_1.txt')
        params = '{id}/{visit}/{id}_{visit}_params.txt',

In other words, the function handle_storage returns a pointer to the handle_wildcards function that is tailored for the particular pattern. The latter is then automatically applied by Snakemake once the wildcard values are known. Inside that function, we first format the pattern and then check if it exists in /data.



标签: snakemake