First of all, this could be a duplicate of "Snakemake and pandas syntax". However, I'm still confused, so I'd like to explain again.
In Snakemake I have loaded a sample table with several columns. One of the columns is called 'Read1' and contains sample-specific read lengths. I would like to get this value for each sample separately, as it may differ.
What I would expect to work is this:
    rule mismatch_profile:
        input:
            rseqc_input_bam
        output:
            os.path.join(rseqc_dir, '{sample}.mismatch_profile.xls')
        conda:
            "../envs/rseqc.yaml"
        params:
            read_length = samples.loc['{sample}']['Read1']
        shell:
            '''
            #!/bin/bash
            mismatch_profile.py -i {input} -o {rseqc_dir}/{wildcards.sample} -l {params.read_length}
            '''
However, that does not work. For some reason I am not allowed to use {sample} inside standard pandas syntax, and I get this error:
KeyError in line 41 of /rst1/2017-0205_illuminaseq/scratch/swo-406/test_snakemake_full/rules/rseqc.smk:
'the label [{sample}] is not in the [index]'
I don't understand why this does not work. I read that I can also use lambda functions but I don't really understand exactly how since they still need {sample} as input.
Could anyone help me?
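You could use a lambda function. The expression samples.loc['{sample}'] is evaluated as ordinary Python when the Snakefile is parsed, so pandas looks up the literal string '{sample}' and raises the KeyError. A function given under params is instead called with the job's wildcards object, so it can use wildcards.sample as the index label.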
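As a rough sketch of the idea, the lookup can be tried outside Snakemake in plain Python. The sample table below is made up for illustration (the names sampleA and sampleB, the index column "Sample", and the read lengths are assumptions, not taken from the question), and SimpleNamespace only imitates the wildcards object that Snakemake would pass in.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical sample table standing in for the one loaded in the Snakefile;
# the sample names and read lengths are invented for this example.
samples = pd.DataFrame(
    {"Sample": ["sampleA", "sampleB"], "Read1": [75, 150]}
).set_index("Sample")


def get_read_length(wildcards):
    """Return the Read1 value for the sample named by the job's wildcards."""
    return samples.loc[wildcards.sample, "Read1"]


# Snakemake calls the function with a wildcards object; imitate that here.
read_length = get_read_length(SimpleNamespace(sample="sampleA"))

# The literal string '{sample}' is not an index label, so this lookup fails
# with a KeyError, which is the error reported in the question.
try:
    samples.loc["{sample}"]
    literal_lookup_failed = False
except KeyError:
    literal_lookup_failed = True
```

In the rule itself, the params entry would then read read_length = get_read_length, or inline read_length = lambda wildcards: samples.loc[wildcards.sample, 'Read1']. This assumes the sample names form the index of the samples table, as the original samples.loc[...] lookup implies.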