how to Merge 2 tables with awk

2019-08-27 00:06发布

问题:

First of all, sorry for my English and I know there's a lot of various topics regarding AWK but it's a very difficult function to me... I would like to merge two tables using common columns with awk. The tables differ in the amount of rows. I have my first table that I want to modify and the second as a reference table. I would like to compare my colunme1.F1 with my column1.F2. When it matches, add the column2.F2 in my file1. But I need to keep all my lines in file1.

I give you an example:

File1

Num_id,Name,description1,description2,description3
?,atlanta_1,,,      
RO_5,babeni_SW,,,           
? ,Bib1,,,          
RO_9,BoUba_456,,,           
?,Castor,,,         

File2

official_Num_id,official_Name
RO_1,America
RO_2,Andre
RO_3,Atlanta
RO_4,Axa
RO_5,Babeni
RO_6,Barba
RO_7,Bib
RO_8,Bilbao
RO_9,Bouba
RO_10,Castor

File3

Num_id,Name,description1,description2,description3,official_Name
?,atlanta_1,,,
RO_5,babeni_SW,,,Babeni
?,Bib1,,,
RO_9,BoUba_456,,,Bouba
?,Castor,,,

I read a lot of solution on Internet and it seems that awk could work .. I tried awk 'NR==FNR {h[$1] = $2; next} {print $0,h[$1]}' $File1 $File2 > file3 But my command doesn't work, my File3 looks exactly that File1.

In a second time, I don't know if it's possible to compare my two second columns when names have difference like atlanta_1 and Atlanta and add the official_num_id and the official_name in my File1.

Any hero over there?

回答1:

You had it, except for two small things. First you need to set your file separators to , and, second, reverse the order of your input files on the command line so that the reference file is processed first:

$ awk 'BEGIN {FS=OFS=","} NR==FNR {h[$1] = $2; next} {print $0,h[$1]}' File2 File1
Num_id,Name,description1,description2,description3,
?,atlanta_1,,,,
RO_5,babeni_SW,,,,Babeni
? ,Bib1,,,,
RO_9,BoUba_456,,,,Bouba
?,Castor,,,,


回答2:

You can also use the join command for this:

 join --header  --nocheck-order -t, -1 1 -2 1 -a 1 file1 file2


回答3:

To answer your question if it's possible to compare my two second columns when names have difference like atlanta_1 and Atlanta and add the official_num_id and the official_name in my File1:

$ awk '
BEGIN { FS=OFS="," }
NR==FNR {                                                  # file2
    a[tolower($2)]=$0                                      # hash on lowercase city
    next
}
{                                                          # file1
    split($2,b,"[^[:alpha:]]")                             # split on non-alphabet
    print $0 (tolower(b[1]) in a?OFS a[tolower(b[1])]:"")  
}' file2 file1 
Num_id,Name,description1,description2,description3
?,atlanta_1,,,,RO_3,Atlanta
RO_5,babeni_SW,,,,RO_5,Babeni
? ,Bib1,,,,RO_7,Bib
RO_9,BoUba_456,,,,RO_9,Bouba
?,Castor,,,,RO_10,Castor

split will split Name field on non-alphabetic characters, ie _ in atlanta_1, 1 in Bib1 etc. so it might fail on cities with dashes etc., edit the pattern [^[:alpha:]] in split accordingly. Header doesn't match with those names, rethink the header names.