Is it possible to define default settings for memory and resources in the cluster config file, and then override them on a per-rule basis when needed? Is the resources field in rules directly tied to the cluster config file, or is it just a fancier version of the params field for readability purposes?
In the example below, how do I use the default cluster configs for rule a, but use custom values (memory=40000 and rusage=15000) in rule b?
cluster.json:
{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
Snakefile:
rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    shell:
        'touch {output}'
Command for execution:
snakemake --cluster-config cluster.json \
    --cluster "bsub -M {cluster.memory} -R {cluster.resources} -o logs.txt" \
    -j 50
I understand that it is possible to define rule-specific resource requirements in the cluster config file, but I would prefer to define them directly in the Snakefile, if possible.
Alternatively, if there is a better way of implementing this, please let me know.
You can add resources directly to each of your rules. Then you should remove the resources parameter from your .json file (saved here as new.cluster.json), so that the command line does not override the Snakefile.
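As a rough sketch of this first approach (not Snakemake's actual implementation), the Python below mimics how a --cluster template could be filled from per-rule resources values once the resources string is no longer taken from the cluster config. The names mem_mb and rusage, the RULE_RESOURCES table, and render_submit_command are hypothetical, and the sketch assumes the submission string may reference {resources.<name>} fields.

```python
from types import SimpleNamespace

# Hypothetical per-rule values, standing in for a `resources:` block in each rule.
# Rule a keeps the question's defaults; rule b uses the custom values.
RULE_RESOURCES = {
    "a": {"mem_mb": 20000, "rusage": 8000},
    "b": {"mem_mb": 40000, "rusage": 15000},
}

# Submission template; assumes {resources.<name>} fields are filled per job.
SUBMIT_TEMPLATE = (
    'bsub -M {resources.mem_mb} '
    '-R "rusage[mem={resources.rusage}] span[hosts=1]" -o logs.txt'
)


def render_submit_command(rule_name):
    """Fill the template with the resources declared for one rule."""
    resources = SimpleNamespace(**RULE_RESOURCES[rule_name])
    return SUBMIT_TEMPLATE.format(resources=resources)


if __name__ == "__main__":
    for name in ("a", "b"):
        print(name, "->", render_submit_command(name))
```

With this layout each rule owns its numbers, and the cluster config only needs to carry settings that are not rule-specific, such as log paths.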
In new.cluster.json you can actually define resources for specific rules. So in your case you would add an entry for rule b there with its own memory and rusage values. Then in the Snakefile you can refer to these resources by importing new.cluster.json and referring to it in your rule. If you take a look through this repository, you can see how I use these cluster configs in the wild.
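A minimal sketch of the lookup described above, in plain Python rather than Snakefile syntax. It assumes rule-specific entries are layered over __default__, which is how I understand Snakemake resolves cluster config values. The JSON contents, the rusage key, and the settings_for helper are illustrative and are not the answer's original file.

```python
import json

# Illustrative contents for new.cluster.json: defaults plus an override for rule b.
NEW_CLUSTER_JSON = """
{
    "__default__": {"memory": 20000, "rusage": 8000},
    "b": {"memory": 40000, "rusage": 15000}
}
"""


def settings_for(rule_name, cluster_config):
    """Return the defaults updated with any rule-specific entries."""
    merged = dict(cluster_config.get("__default__", {}))
    merged.update(cluster_config.get(rule_name, {}))
    return merged


cluster_config = json.loads(NEW_CLUSTER_JSON)

if __name__ == "__main__":
    print("a:", settings_for("a", cluster_config))
    print("b:", settings_for("b", cluster_config))
```

In a Snakefile, the same dictionary could be read from disk with json.load and a rule could then pull its values from it, for example by passing settings_for("b", cluster_config)["memory"] to a resources or params entry. That wiring is an assumption about what the missing snippet looked like.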