I would like to combine the second fields of two files using awk, sed or similar.
File0:
string:data:moredata
File1:
string:random:moredata
If the first field (string) in File0 has a matching entry in File1, then print
$random:$data
Selecting the fields seems trivial:
$ awk -F':' '{print $2}' filename
But I need to match rows on the first field and print column $2 from each matching row.
How about this one -
awk -F":" 'NR==FNR {x[$1] = $0; y[$1] = $2; next} ($1 in x) {print $2":"y[$1]}' file1 file2
Execution:
[jaypal~/Temp]$ cat file1
string:data:moredata
[jaypal~/Temp]$ cat file2
string:random:moredata
[jaypal~/Temp]$ awk -F":" 'NR==FNR {x[$1] = $0; y[$1] = $2; next} ($1 in x) {print $2":"y[$1]}' file1 file2
random:data
In this solution, we load file1 into two arrays indexed by column 1: x holds the whole record and y holds column 2. For each line of the second file, we do a quick check to see whether its column 1 is a key in x. If it is, the print statement is executed.
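As a side note, the x array is only used for the membership test, so a single array holding column 2 should be enough. A minimal sketch of that variant, assuming the same sample files as above (the array name data and the scratch directory are only for illustration):

```shell
# Work in a scratch directory so the sample files do not clobber anything.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'string:data:moredata\n' > file1
printf 'string:random:moredata\n' > file2

# First file (NR==FNR): remember column 2, keyed by column 1.
# Second file: if the key was seen in file1, print file2's column 2
# followed by the column 2 stored from file1.
out=$(awk -F: 'NR==FNR { data[$1] = $2; next }
               ($1 in data) { print $2 ":" data[$1] }' file1 file2)
printf '%s\n' "$out"
```

With these two one-line files this prints random:data, the same as the original command.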
Test with several rows in a different order:
[jaypal~/Temp]$ cat file1
string:data:moredata
man:woman:child
[jaypal~/Temp]$ cat file2
man:random:moredata
string:woman:child
[jaypal~/Temp]$ awk -F":" 'NR==FNR {x[$1] = $0; y[$1] = $2; next} ($1 in x) {print $2":"y[$1]}' file1 file2
random:woman
woman:data
Just to add to the explanation, NR and FNR are awk's built-in variables. NR is the number of records read so far across all input files, so it keeps counting when awk moves on to the second file. FNR is the record number within the current file, so it starts again at 1 when the second file begins. That lets us store file1 in the arrays, because that action only runs while NR==FNR. As soon as the condition becomes false, the second file has started and the next pattern-action statement begins to execute.
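To see the two counters side by side, here is a small sketch that prints NR, FNR and the first field for every line of both files. It assumes the two-line sample files from the test above, recreated in a scratch directory:

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'string:data:moredata\nman:woman:child\n' > file1
printf 'man:random:moredata\nstring:woman:child\n' > file2

# Print the overall record number (NR), the per-file record number (FNR)
# and the first field of every line in both files.
out=$(awk -F: '{ print "NR=" NR, "FNR=" FNR, $1 }' file1 file2)
printf '%s\n' "$out"
# NR=1 FNR=1 string
# NR=2 FNR=2 man
# NR=3 FNR=1 man
# NR=4 FNR=2 string
```

NR==FNR holds only for the first two lines, which is why the array-filling action never touches file2. One caveat: if file1 is empty, NR==FNR is also true while reading file2, so the trick needs a non-empty first file.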
This sed solution might work for you:
# cat file1
string0:data1:moredata
string2:data3:moredata
string4:data5:moredata
string6:data7:moredata
string8:data9:moredata
# cat file2
string0:random1:moredata
string2:random3:moredata
string4:random5:moredata
cat file1 - <<<"EOF" file2 |
sed '1,/^EOF/{H;d};G;s/^\([^:]*:\)\([^:]*:\).*\1\([^:]*\).*/$\2$\3/p;d'
$random1:$data1
$random3:$data3
$random5:$data5
Explanation:
Concatenate the files with an EOF delimiter. Slurp the first file into the hold space (HS). Append the HS to all lines in the second file, making a lookup table. Use grouping and backreferences to substitute the required output result. BTW, were the $'s in $random:$data intended?
This solution might also be made more efficient by only retaining the necessary data in the lookup and each line of file2.
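One way that trimming could look, as a rough sketch rather than a tuned solution: cut each file1 line down to "key:data:" before it goes into the hold space, and anchor the lookup on a newline so a key cannot match in the middle of a longer key. It assumes GNU sed, that no line of file1 is literally EOF, and it keeps the $ signs from the output above. The scratch directory and the portable { cat; echo; cat; } grouping are my own choices here.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'string0:data1:moredata\nstring2:data3:moredata\nstring4:data5:moredata\nstring6:data7:moredata\nstring8:data9:moredata\n' > file1
printf 'string0:random1:moredata\nstring2:random3:moredata\nstring4:random5:moredata\n' > file2

# Lines up to the EOF marker: keep only "key:data:" and append to the hold space.
# Remaining lines (file2): append the lookup table, then pull out field 2 of
# the file2 line and the data stored for the same key.
out=$({ cat file1; echo EOF; cat file2; } | sed -n '
1,/^EOF$/{
  s/^\([^:]*:[^:]*:\).*/\1/
  H
  d
}
G
s/^\([^:]*:\)\([^:]*\):.*\n\1\([^:]*\):.*/$\2:$\3/p
')
printf '%s\n' "$out"
```

With -n and the p flag, file2 lines whose key is missing from file1 print nothing.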
join - join lines of two files on a common field
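So do your awk thing, only print both the "key" field and the data field. Then do a join command similar to: join -t: -1 1 -2 1 file1 file2 > joined.dat. Note that join splits fields on whitespace unless you pass -t, and it expects both inputs to be sorted on the join field.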
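In fact join can probably do the whole job without the awk step, since -o lets you pick which fields to print. A minimal sketch, assuming colon-separated files like the ones in the question (the .sorted file names and the scratch directory are just for illustration):

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'string:data:moredata\nman:woman:child\n' > file1
printf 'man:random:moredata\nstring:woman:child\n' > file2

# join expects both inputs sorted on the join field; LC_ALL=C keeps the
# sort order and join's comparison consistent with each other.
LC_ALL=C sort -t: -k1,1 file1 > file1.sorted
LC_ALL=C sort -t: -k1,1 file2 > file2.sorted

# -t: sets the field separator; -o 2.2,1.2 prints field 2 of the second
# file followed by field 2 of the first file.
out=$(LC_ALL=C join -t: -o 2.2,1.2 file1.sorted file2.sorted)
printf '%s\n' "$out"
```

For this input it prints random:woman and then woman:data. Unlike the awk version, the output comes out in sorted key order rather than in the order of the second file.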