I have two locations where my huge data can be stored: /data and /work.
/data is the folder where (intermediate) results are moved to after quality control. It is mounted read-only for the standard user.
/work is the folder where new results are written to. Obviously, it is writable.
I do not want to copy or link data from /data to /work.
So I run Snakemake from within the /work folder and want my input function to first check whether the required file exists in /data (and return the absolute /data path) and, if not, return the relative path in the /work directory.
def in_func(wildcards):
    file_path = apply_wildcards('{id}/{visit}/{id}_{visit}-file_name_1.txt', wildcards)
    full_storage_path = os.path.join('/data', file_path)
    if os.path.isfile(full_storage_path):
        file_path = full_storage_path
    return {'myfile': file_path}

rule do_something:
    input:
        unpack(in_func),
        params = '{id}/{visit}/{id}_{visit}_params.txt',
This works fine, but I would have to define a separate input function for every rule because the file names differ. Is it possible to write a generic input function that takes both the file name pattern, e.g. {id}/{visit}/{id}_{visit}-file_name_1.txt, and the wildcards as input?
I also tried something like
def in_func(file_path):
    full_storage_path = os.path.join('/data', file_path)
    if os.path.isfile(full_storage_path):
        file_path = full_storage_path
    return file_path

rule do_something:
    input:
        myfile = in_func('{id}/{visit}/{id}_{visit}-file_name_1.txt'),
        params = '{id}/{visit}/{id}_{visit}_params.txt',
But then I do not have access to the wildcards in in_func(), do I?
Thanks, Jan
You could use something like this:
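A minimal sketch of this idea, using the function names handle_storage and handle_wildcards from the explanation that follows. The storage_root parameter is a hypothetical addition for illustration (it defaults to /data), and the rule shown in the trailing comment simply mirrors the one from the question. It assumes the pattern contains only named wildcards and that the wildcards object can be unpacked with **, which also holds for a plain dict:

```python
import os


def handle_storage(pattern, storage_root="/data"):
    """Return an input function tailored to the given file name pattern.

    storage_root is a hypothetical parameter added for illustration;
    it defaults to the read-only /data location from the question.
    """

    def handle_wildcards(wildcards):
        # Fill the wildcard values into the pattern to get the relative path.
        rel_path = pattern.format(**wildcards)
        # Prefer the quality-controlled copy under the storage root if present.
        stored_path = os.path.join(storage_root, rel_path)
        if os.path.exists(stored_path):
            return stored_path
        # Otherwise fall back to the path relative to the working directory.
        return rel_path

    return handle_wildcards


# Intended use inside a Snakefile (Snakemake syntax, shown as a comment):
#
# rule do_something:
#     input:
#         myfile = handle_storage('{id}/{visit}/{id}_{visit}-file_name_1.txt'),
#         params = '{id}/{visit}/{id}_{visit}_params.txt',
```

Because handle_storage is called when the rule is parsed and only its return value is stored as the input, the same helper can be reused for every rule by passing a different pattern.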
In other words, the function handle_storage returns a pointer to the handle_wildcards function that is tailored for the particular pattern. The latter is then automatically applied by Snakemake once the wildcard values are known. Inside that function, we first format the pattern and then check whether the resulting path exists in /data.