I have three similar files; they all look like this:
File A
ID1 Value1a
ID2 Value2a
.
.
.
IDN ValueNa
and I want an output like this:
Output
ID1 Value1a Value1b Value1c
ID2 Value2a Value2b Value2c
.....
IDN ValueNa ValueNb ValueNc
Looking at the first line, I want Value1a to be the value of ID1 in file A, Value1b the value of ID1 in file B, and so on for each field and each line. I thought of it like an SQL join. I've tried several things, but none of them were even close.
EDIT: All files have the same length and IDs.
Give join(1) a try:
join fileA fileB | join - fileC
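Note that join expects its input to be sorted on the join field; if the files aren't already sorted, something along these lines (using bash process substitution) should do it:
join <(sort -k1,1 fileA) <(sort -k1,1 fileB) | join - <(sort -k1,1 fileC)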
join (Dennis's answer) is better, but just for kicks, here's what I came up with in awk:
awk '{a=$0; getline b <"fileB"; getline c <"fileC"; $0=a" "b" "c; print $1,$2,$4,$6}' <fileA
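Spread out with comments, that same one-liner looks like this (it relies on all three files having the same IDs on the same lines, as stated in the edit to the question):
awk '{
    a = $0                    # current line of fileA, e.g. "ID1 Value1a"
    getline b < "fileB"       # corresponding line of fileB
    getline c < "fileC"       # corresponding line of fileC
    $0 = a " " b " " c        # re-split the combined record into fields
    print $1, $2, $4, $6      # ID plus the three values (fields 3 and 5 repeat the ID)
}' < fileA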
Update: The question has been edited to state that all files contain all keys, so the accepted answer (join) is definitely better than this one. Only consider using this one if it's possible the keys may not be in all files.
If you're not too concerned about performance, you could try the quick and dirty:
$ cat file_a
ID5 Value5a
ID1 Value1a
ID3 Value3a
ID4 Value4a
ID2 Value2a
$ cat file_b
ID1 Value1b
ID3 Value3b
$ cat file_c
ID2 Value2c
ID3 Value3c
ID4 Value4c
ID5 Value5c
$ cat qq.sh
#!/bin/bash
keylist=$(awk '{print $1}' file_[abc] | sort | uniq)
for key in ${keylist} ; do
val_a=$(grep "^${key} " file_a | awk '{print $2}') ; val_a=${val_a:--}
val_b=$(grep "^${key} " file_b | awk '{print $2}') ; val_b=${val_b:--}
val_c=$(grep "^${key} " file_c | awk '{print $2}') ; val_c=${val_c:--}
echo ${key} ${val_a} ${val_b} ${val_c}
done
$ ./qq.sh
ID1 Value1a Value1b -
ID2 Value2a - Value2c
ID3 Value3a Value3b Value3c
ID4 Value4a - Value4c
ID5 Value5a - Value5c
This actually works out the full key list first, then gets the value with that key from each file, or - if the key isn't in the relevant file.
The grep commands will need to be adjusted if the file is more complex (either if field 1 isn't at the start of the line or is followed by a non-space separator), but this should be a reasonable first-cut solution. The likely grep to use in that case would be:
grep "^[ X]*${key}[ X]"
where X is actually the tab character, as this allows for zero-or-more spaces or tabs before the key and a space or tab to terminate the key.
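In bash, one way to get a literal tab into that pattern is via $'\t' (a quick sketch, reusing the val_a line from the script above with a helper variable called tab):
tab=$'\t'
val_a=$(grep "^[ ${tab}]*${key}[ ${tab}]" file_a | awk '{print $2}') ; val_a=${val_a:--}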
If the files are particularly large, you may want to look into using the associative arrays within awk but, since there's no indication of the size, I'd start with this one until you get to the point where it's running too slow.
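For reference, here's a sketch of that associative-array approach (assuming the same file_a/file_b/file_c sample files; like the script above it prints - for missing keys, but it does not preserve the key order):
awk '
    FNR == 1 { nfile++ }                 # which input file we are on
    { keys[$1]; val[$1, nfile] = $2 }    # remember every key and its per-file value
    END {
        for (k in keys) {
            line = k
            for (i = 1; i <= nfile; i++)
                line = line " " (((k, i) in val) ? val[k, i] : "-")
            print line
        }
    }
' file_a file_b file_c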
Just to add that, in order for join to work, the input should be sorted.
This awk solution should handle any number of input files.
You will also lose the original order of the keys (you'll need a bit more code to preserve it; see the sketch below).
awk 'END {
  for (K in k) print K, k[K]
}
{
  k[$1] = k[$1] ? k[$1] FS $2 : $2
}' file1 file2 [.. filen]
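If you do need to keep the original key order, one possible extension (a sketch) records the order in which keys are first seen:
awk '
    !($1 in seen) { seen[$1] = 1; order[++n] = $1 }   # remember first-seen order of keys
    { k[$1] = k[$1] ? k[$1] FS $2 : $2 }              # append this file's value
    END { for (i = 1; i <= n; i++) print order[i], k[order[i]] }
' file1 file2 [.. filen]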