I have two locations where my huge data can be stored: /data and /work.
/data is the folder where (intermediate) results are moved to after quality control. It is mounted read-only for the standard user.
/work is the folder where new results are written to. Obviously, it is writable.
I do not want to copy or link data from /data to /work.
So I run Snakemake from within the /work folder and want my input function to first check whether the required file exists in /data (and return the absolute /data path) and, if not, return the relative path in the /work directory.

    import os
    from snakemake.io import apply_wildcards

    def in_func(wildcards):
        # Path relative to the working directory (/work)
        file_path = apply_wildcards('{id}/{visit}/{id}_{visit}-file_name_1.txt', wildcards)
        full_storage_path = os.path.join('/data', file_path)
        # Prefer the copy in /data if it exists there
        if os.path.isfile(full_storage_path):
            file_path = full_storage_path
        return {'myfile': file_path}

    rule do_something:
        input:
            unpack(in_func),
            params = '{id}/{visit}/{id}_{visit}_params.txt',

This works fine, but I would have to define a separate input function for every rule because the file names differ. Is it possible to write a generic input function that takes the file name (e.g. {id}/{visit}/{id}_{visit}-file_name_1.txt) and the wildcards as input?
I also tried something like

    def in_func(file_path):
        full_storage_path = os.path.join('/data', file_path)
        if os.path.isfile(full_storage_path):
            file_path = full_storage_path
        return file_path

    rule do_something:
        input:
            myfile = in_func('{id}/{visit}/{id}_{visit}-file_name_1.txt'),
            params = '{id}/{visit}/{id}_{visit}_params.txt',

But then I do not have access to the wildcards in in_func(), do I?
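For illustration, one shape such a generic helper could take is a factory that binds the file-name pattern and returns a function of the wildcards. This is only a sketch under assumptions: the names make_input_func and storage_root are hypothetical, the wildcards object is assumed to offer items() like a dict, and the pattern is assumed to contain only plain {name} placeholders (no wildcard constraints), so that str.format can stand in for apply_wildcards.

```python
import os


def make_input_func(pattern, storage_root="/data"):
    """Return an input function bound to one file-name pattern (hypothetical helper)."""

    def input_func(wildcards):
        # Assumption: wildcards offers items(), as a dict does.
        relative_path = pattern.format(**dict(wildcards.items()))
        storage_path = os.path.join(storage_root, relative_path)
        # Prefer the copy under the storage root if it exists there.
        if os.path.isfile(storage_path):
            return storage_path
        # Otherwise fall back to the path relative to the working directory.
        return relative_path

    return input_func
```

In a rule, this might then be written as myfile = make_input_func('{id}/{visit}/{id}_{visit}-file_name_1.txt'). The pattern is fixed when the rule is defined, and the returned function receives the wildcards later, when Snakemake evaluates the input.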
Thanks, Jan