I have two files, and I use the command "comm -23 file1 file2" to extract the lines that appear in file1 but not in file2.
I also need something that extracts the differing lines while preserving the "line_$NR" prefix.
Example:
file1:
line_1: This is line0
line_2: This is line1
line_3: This is line2
line_4: This is line3
file2:
line_1: This is line1
line_2: This is line2
line_3: This is line3
I need this output:
differences file1 file2:
line_1: This is line0
In short, I need to compare the files as if the line_$NR prefix were not there, but the prefix must still appear when I print the result.
Try using awk
awk -F: 'NR==FNR {a[$2]; next} !($2 in a)' file2 file1
Output:
line_1: This is line0
Short Description
awk -F: '   # Field separator is ":", so $1 holds "line_<n>" and $2 holds " This is line<m>"
NR==FNR {   # NR equals FNR only while the first file (file2) is being read
a[$2];      # Store $2 as a key in the associative array a
next        # Skip the remaining rules and go to the next record
}
!($2 in a)  # For file1: print the line if its $2 is not a key of a
' file2 file1
Additional Requirement (In comment)
What if a line contains multiple ":" characters, e.g. "line_1: This :is: line0"?
Then it doesn't work. How can I split off only the line_x prefix?
In that case, use the whole line_<n>: prefix as the field separator, so that $2 is everything after it (GNU awk only):
awk -F'line_[0-9]+:' 'NR==FNR {a[$2]; next} !($2 in a)' file2 file1
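If a regex field separator is not available, the same idea can be sketched with index() and substr(), which any POSIX awk should provide. This is an alternative to the command above, not a replacement for it; it assumes every line starts with a "line_<n>: " prefix, and the temporary directory and sample data are made up for illustration.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'line_1: This :is: line0' 'line_2: This is line1' 'line_3: This is line2' > "$tmpdir/file1"
printf '%s\n' 'line_1: This is line1' 'line_2: This is line2' > "$tmpdir/file2"

# Key each line on everything after the first ": ", so extra colons in the text do no harm.
result=$(awk '
    { key = substr($0, index($0, ": ") + 2) }   # text after the "line_<n>: " prefix
    NR == FNR { seen[key]; next }               # first file given (file2): remember its keys
    !(key in seen)                              # second file given (file1): print unmatched lines
' "$tmpdir/file2" "$tmpdir/file1")

echo "$result"
rm -rf "$tmpdir"
```

Only the first ": " is treated as the end of the prefix, so the rest of the line is compared verbatim.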
This awk line is longer; however, it works no matter which file the differences are in. Note that it compares lines by their last whitespace-separated field ($NF) only:
awk 'NR==FNR{a[$NF]=$0;next}a[$NF]{a[$NF]=0;next}7;END{for(x in a)if(a[x])print a[x]}' file1 file2
test:
kent$ head f*
==> f1 <==
line_1: This is line0
line_2: This is line1
line_3: This is line2
line_4: This is line3
==> f2 <==
line_1: This is line1
line_2: This is line2
line_3: This is line3
#test f1 f2:
kent$ awk 'NR==FNR{a[$NF]=$0;next}a[$NF]{a[$NF]=0;next}7;END{for(x in a)if(a[x])print a[x]}' f1 f2
line_1: This is line0
#test f2 f1:
kent$ awk 'NR==FNR{a[$NF]=$0;next}a[$NF]{a[$NF]=0;next}7;END{for(x in a)if(a[x])print a[x]}' f2 f1
line_1: This is line0
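For readers who find the one-liner dense, here is the same logic spread over several lines with comments. It is a sketch of how I read the command, not a different method; the sample data is hypothetical and adds a fourth line to f2, so that each file has one line the other lacks.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'line_1: This is line0' 'line_2: This is line1' 'line_3: This is line2' 'line_4: This is line3' > "$tmpdir/f1"
printf '%s\n' 'line_1: This is line1' 'line_2: This is line2' 'line_3: This is line3' 'line_4: This is line4' > "$tmpdir/f2"

result=$(awk '
    NR == FNR { a[$NF] = $0; next }   # first file: store each line under its last field
    a[$NF]    { a[$NF] = 0; next }    # second file: last field already stored, mark as matched and skip
    { print }                         # second file: no match, so print (the "7" in the one-liner)
    END {
        for (x in a)                  # finally print first-file lines that never matched
            if (a[x]) print a[x]
    }
' "$tmpdir/f1" "$tmpdir/f2")

echo "$result"
rm -rf "$tmpdir"
```

Lines unique to the second file are printed as they are read, and lines unique to the first file are printed in the END block, so the output is the difference in both directions.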