awk to compare two files by identifier & output in

Posted 2019-09-20 10:35

Question:

I have two large files that I need to compare. Both are delimited by double pipes (||).

file 1

a||d||f||a
1||2||3||4

file 2

a||d||f||a
1||1||3||4
1||2||r||f

I want to compare the files and print the differences. If a value was updated in file 2, it should be printed as updated_value#old_value. If a line was added to file 2, that line should be printed as well.

So the desired output is (only the updated and new data):

1||1#2||3||4
1||2||r||f

What I have tried so far only extracts the individual changed values:

awk -F '[||]+' 'NR==FNR{for(i=1;i<=NF;i++)a[NR,i]=$i;next}{for(i=1;i<=NF;i++)if(a[FNR,i]!=$i)print $i"#"a[FNR,i]}' file1 file2 >output

But I want to print the whole line. How can I achieve that?

Answer 1:

I would say:

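awk 'BEGIN{FS=OFS="|"}
     # first file: store each data field under [line number, field number]
     FNR==NR {for (i=1;i<=NF;i+=2) a[FNR,i]=$i; next}
     # second file: append "#old" to every field that differs from file 1;
     # the "in" test also handles old values that are 0 or empty
     {for (i=1; i<=NF; i+=2)
         if (((FNR,i) in a) && a[FNR,i]!=$i)
             $i=$i"#"a[FNR,i]
     }1' f1 f2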

This stores file1 in a matrix a[line number, column]. Then it compares each value in file2 with the corresponding value from file1.
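
To make the a[line number, column] layout concrete, here is a minimal sketch run on the two lines of file 1 from the question (fed in through printf so that the snippet is self-contained; the variable names nf and result are mine). With FS set to a single |, every || yields an empty field in between, so the real data sits in fields 1, 3, 5 and 7:

```shell
# Feed the two lines of file 1 from the question to awk.
result=$(printf '%s\n' 'a||d||f||a' '1||2||3||4' |
awk -F'|' '
    # Store every odd-numbered field under the key (line number, field number).
    { nf = NF; for (i = 1; i <= NF; i += 2) a[NR, i] = $i }
    END {
        print nf             # field count of the last line, empty fields included
        print a[2, 3]        # line 2, field 3 (the second data column)
        print ((3, 1) in a)  # 0, because there is no line 3
    }')
printf '%s\n' "$result"
```

This should print 7, 2 and 0, which is why the loop advances in steps of two.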

Note that I am using | as the field separator instead of ||, and looping in steps of two so that only the fields holding data are used. This is because, for example, gawk -F'||' '{print NF}' f1 returned just 1, meaning that FS was not interpreted the way I intended. I would be grateful if someone could point out the error here!
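
On the FS question: when FS is longer than one character, awk treats it as a regular expression, and in a regular expression | is the alternation operator, so || does not mean "two literal pipes". Putting each pipe inside a bracket expression makes it literal. The following is a minimal sketch, assuming the sample files from the question are saved as f1 and f2 (they are recreated here so the snippet is self-contained). It uses a membership test (in) rather than a truthiness check on the stored value, which is my substitution:

```shell
# Recreate the sample files from the question.
printf '%s\n' 'a||d||f||a' '1||2||3||4' > f1
printf '%s\n' 'a||d||f||a' '1||1||3||4' '1||2||r||f' > f2

# [|][|] matches two literal pipes, so each line has 4 fields.
nf=$(awk -F'[|][|]' '{print NF}' f1)
printf '%s\n' "$nf"

# Compare field by field with i++ instead of i+=2.
# OFS is set to || so that modified lines are rebuilt with the original delimiter.
result=$(awk 'BEGIN { FS = "[|][|]"; OFS = "||" }
     FNR == NR { for (i = 1; i <= NF; i++) a[FNR, i] = $i; next }
     { for (i = 1; i <= NF; i++)
           if (((FNR, i) in a) && a[FNR, i] != $i)
               $i = $i "#" a[FNR, i]
     } 1' f1 f2)
printf '%s\n' "$result"
```

With the question's data, the second command should print a||d||f||a, 1||1#2||3||4 and 1||2||r||f.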

Test

$ awk 'BEGIN{FS=OFS="|"} FNR==NR {for (i=1;i<=NF;i+=2) a[FNR,i]=$i; next} {for (i=1; i<=NF; i+=2) if (a[FNR,i] && a[FNR,i]!=$i) $i=$i"#"a[FNR,i]}1' f1 f2
a||d||f||b#a
1||1#2||3||4
1||2||r||f
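
The question asks for only the updated and new lines, whereas the command above prints every line of file 2. One possible variation is sketched below. It assumes, as the answer does, that lines are matched by line number, and that matched lines have the same number of fields. The variable names n and changed are mine, and f1 and f2 are recreated from the question's data:

```shell
# Recreate the sample files from the question.
printf '%s\n' 'a||d||f||a' '1||2||3||4' > f1
printf '%s\n' 'a||d||f||a' '1||1||3||4' '1||2||r||f' > f2

result=$(awk 'BEGIN { FS = "[|][|]"; OFS = "||" }
     # First file: remember every field and the number of lines read.
     FNR == NR { n = FNR; for (i = 1; i <= NF; i++) a[FNR, i] = $i; next }
     # Lines beyond the end of file 1 are new: print them unchanged.
     FNR > n { print; next }
     # Otherwise mark each differing field and print only if something changed.
     { changed = 0
       for (i = 1; i <= NF; i++) {
           if (!((FNR, i) in a)) {
               changed = 1
           } else if (a[FNR, i] != $i) {
               $i = $i "#" a[FNR, i]; changed = 1
           }
       }
       if (changed) print
     }' f1 f2)
printf '%s\n' "$result"
```

With the question's data this should print 1||1#2||3||4 followed by 1||2||r||f, matching the desired output. A field that exists in file 1 but is missing from the same line of file 2 is not detected by this sketch.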