reference file
chr1 288598 288656
chr1 779518 779576
chr2 2569592 2569660
chr3 5018399 5018464
chr4 5182842 5182882
file1
chr1 288598 288656 12
chr1 779518 779576 14
chr2 2569592 2569660 26
chr3 5018399 5018464 27
chr4 5182842 5182882 37
file2
chr1 288598 288656 35
chr2 2569592 2569660 348
chr3 5018399 5018464 4326
chr4 5182842 5182882 68
I have six similar files excluding the reference file.
Here first three fields are similar to the reference file. Therefore, I would like export only 4th column from all 6 files and put into the reference file to make a new output. which should be equivalent to the reference files. Where they don't match put zero.
desired output
chr1 288598 288656 23 35 57 68 769 68
chr1 779518 779576 23 0 57 68 768 0
chr2 2569592 2569660 23 35 0 68 79 0
chr3 5018399 5018464 0 36 0 68 769 0
chr4 5182842 5182882 23 0 0 0 0 0
Note: the reference file length is about 2000 ans the other files are not always in same the length (about 500, 400, 200, 100 etc). That is why need zero added.
I tried the answer from this question
paste ref.file file1 file2 file3 file4 file5 file6 | awk '{OFS="\t";print $1,$2,$3,$7,$11,$15,$19,$23,$27}' > final.common.out
but seems it's not working — some values are missed. And I can't understand how to add zero where there is no match.
I think something like this should do what you want. We use a hash to gather the 'reference' file and turn it into a set of keys with an empty array.
Then we iterate on the other files, extracting '3 values' as key, and the last value as an actual value.
And then we compare the two, updating the 'reference' hash with either the value or zero. The caveat here - any lines not in your reference file (or duplicates) will just disappear.