Snakemake: Generic input function for different fi

I have two locations where my huge data can be stored: /data and /work.

/data is the folder where (intermediate) results are moved to after quality control. It is mounted read-only for the standard user. /work is the folder where new results are written to. Obviously, it is writable.

I do not want to copy or link data from /data to /work.

So I run Snakemake from within the /work folder and want my input function to first check whether the required file exists in /data (and, if so, return the absolute /data path) and otherwise return the relative path in the /work directory.

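import os

def in_func(wildcards):
    # Fill the wildcards into the file name pattern (path relative to /work)
    file_path = apply_wildcards('{id}/{visit}/{id}_{visit}-file_name_1.txt', wildcards)
    # Prefer the read-only copy in /data if it already exists there
    full_storage_path = os.path.join('/data', file_path)
    if os.path.isfile(full_storage_path):
        file_path = full_storage_path
    return {'myfile': file_path}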

rule do_something:
    input:
        unpack(in_func),
        params = '{id}/{visit}/{id}_{visit}_params.txt',

This works fine, but I would have to define a separate input function for every rule because the file names differ. Is it possible to write a generic input function that takes the file name pattern, e.g. {id}/{visit}/{id}_{visit}-file_name_1.txt, and the wildcards as input?

I also tried something like

def in_func(file_path):
    full_storage_path = os.path.join('/data', file_path)
    if os.path.isfile(full_storage_path):
        file_path = full_storage_path
    return file_path

rule do_something:
    input:
        myfile = in_func('{id}/{visit}/{id}_{visit}-file_name_1.txt'),
        params = '{id}/{visit}/{id}_{visit}_params.txt',

But then I do not have access to the wildcards in in_func(), do I?

Thanks, Jan

Tags: snakemake

1 Answer

#2 · 2019-08-23 00:08

You could use something like this:

def handle_storage(pattern):
    def handle_wildcards(wildcards):
        f = pattern.format(**wildcards)
        f_data = os.path.join("/data", f)
        if os.path.exists(f_data):
            return f_data
        return f

    return handle_wildcards


rule do_something:
    input:
        myfile = handle_storage('{id}/{visit}/{id}_{visit}-file_name_1.txt'),
        params = '{id}/{visit}/{id}_{visit}_params.txt',

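In other words, handle_storage does not return a path; it returns the inner function handle_wildcards, a closure that remembers the particular pattern it was created with. Snakemake treats that returned function like any other input function and calls it automatically once the wildcard values are known. Inside it, we first fill the wildcards into the pattern and then check whether the resulting path exists under /data; if it does, the /data path is returned, otherwise the relative path is returned.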
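To see the closure behave this way outside of Snakemake, here is a minimal standalone sketch. It makes a few assumptions that are not part of the answer above: a plain dict stands in for Snakemake's wildcards object, a temporary directory stands in for /data, and storage_root is a hypothetical extra parameter added only so the demo does not need a real /data mount. The sample ids p01 and p02 are made up.

```python
import os
import tempfile


def handle_storage(pattern, storage_root="/data"):
    """Return an input function bound to `pattern`.

    `storage_root` is a hypothetical parameter for this demo;
    the original answer hard-codes "/data".
    """
    def handle_wildcards(wildcards):
        # Fill the wildcard values into the pattern (relative path).
        f = pattern.format(**wildcards)
        # Prefer the copy under the storage root if it exists there.
        f_data = os.path.join(storage_root, f)
        if os.path.exists(f_data):
            return f_data
        return f

    return handle_wildcards


pattern = "{id}/{visit}/{id}_{visit}-file_name_1.txt"

with tempfile.TemporaryDirectory() as fake_data:
    # Build the input function once; it only needs wildcards later.
    input_func = handle_storage(pattern, storage_root=fake_data)

    # Create one "archived" file under the stand-in storage root.
    archived = os.path.join(fake_data, pattern.format(id="p01", visit="v1"))
    os.makedirs(os.path.dirname(archived))
    open(archived, "w").close()

    # Made-up wildcard values; a dict stands in for Snakemake's wildcards.
    in_storage = input_func({"id": "p01", "visit": "v1"})
    in_work = input_func({"id": "p02", "visit": "v1"})
    found_archived = in_storage == archived

print(found_archived)  # the archived file resolves to the storage path
print(in_work)         # the missing file falls back to the relative path
```

The point of the factory is that the pattern is fixed when the rule is parsed, while the wildcards arrive later, so one helper can serve every rule regardless of its file name.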
