Merging two lists files based on unique column

2019-08-20 06:46发布

I have two files, one is named NATLog with 3 columns and the other is Sourceports with 2 columns, Below is a sample of the NATLog file.

NATLog

14 172.18.2.12 445 
50 172.18.24.4 123
80 10.2.123.37 22
68 172.18.1.37 25

I want to match the last column of the NATLog file with the First Column of the Sourceports file and append the associated service to the NATLog file as a 4th column

Sourceports

445 SMB
123 Network Time Protocol (NTP)
22  SSH
25  SMTP(Insecure)

Desired Output

14 172.18.2.12 445 SMB 
50 172.18.24.4 123 Network Time Protocol (NTP)
80 10.2.123.37 22  SSH
68 172.18.1.37 25  SMTP(Insecure)

I am trying to learn AWK to accomplish this, but I am in need of some help, could you please assist me, thanks

标签: linux awk sed
5条回答
啃猪蹄的小仙女
2楼-- · 2019-08-20 07:26

This is why linux has a bunch of tiny tools such as cat,cut,paste and in this case join.

join -1 3 -2 1 natlog source

Join works on files where the column you try to join on is sorted.

Sorted is actually a somewhat wrong wording here. It should be more like equivalently ordered. As you notice, both your files have the same input and output and the column you try to join on is equivalent. So join will work without a problem.

If both files are not equivalently orderdered, you could use sort on it beforehand:

join -1 3 -2 1 <(sort -k3 natlog) <(sort source)

or if you just want to stick to a single program, than awk is the way forward:

awk '(NR==FNR){k=$3; $3=""; a[k]=$0; next}{ print $0,a[$1] }' natlog source

but if natlog and source don't have the same number of lines and/or keys, then you get the common part as

awk '(NR==FNR){k=$3; $3=""; a[k]=$0; next}($1 in a){ print $0,a[$1] }' natlog source
查看更多
Ridiculous、
3楼-- · 2019-08-20 07:33

Try awk,

$ awk ' NR==FNR {x=$1; $1="";a[x]=$0; next } { print $0, a[$3] } ' Sourceports NATLog
14 172.18.2.12 445   SMB
50 172.18.24.4 123  Network Time Protocol (NTP)
80 10.2.123.37 22  SSH
68 172.18.1.37 25  SMTP(Insecure)

$
查看更多
一夜七次
4楼-- · 2019-08-20 07:33

Yet another in awk (well, two actually). This is for the perfect world:

$ awk 'NR==FNR{a[$1]=$0;next}{sub($NF,a[$NF])}1' source natlog
14 172.18.2.12 445 SMB 
50 172.18.24.4 123 Network Time Protocol (NTP)
80 10.2.123.37 22  SSH
68 172.18.1.37 25  SMTP(Insecure)

Explained (and a bit expanded for unperfect world):

$ awk '
NR==FNR {                          # processing the source file
#   gsub(/&/,"\\\\&")              # if & chars in the file, uncomment to escape them
    a[$1]=$0                       # hash to a, port is the key
    next
}
{                                  # process natlog file
    sub($NF,a[$NF])                # replace port field with entry from source file
#   sub($NF,(a[$NF]?a[$NF]:$NF))   # if gaps in source, use this instead of above
}1' source natlog

One possible output (shorter ip, & char in source and unmatched port 222):

14 1.18.2.12   445 SMB & 
50 172.18.24.4 123 Network Time Protocol (NTP)
80 10.2.123.37 222
68 172.18.1.37 25  SMTP(Insecure)
查看更多
爷、活的狠高调
5楼-- · 2019-08-20 07:34

If your goal is the output formatting shown with the protocol column appended in an aligned fashion, then printf instead of print provides the same fine-grained formatting control described in man 3 printf (for the most part). In your case you simply need to get the length() of the port number field and subtract that from the desired total field-width to add that many spaces after the record from NATLog before appending the saved protocol from Sourceports.

You could do that similar to the following where a total field-width of 4 is used as an example:

$ awk '
    NR==FNR {pcl[$1] = $2; next} {printf "%s%*s%s\n",$0,4-length($3)," ",pcl[$3]}
' Sourceports NATLog

Output

14 172.18.2.12 445 SMB
50 172.18.24.4 123 Network
80 10.2.123.37 22  SSH
68 172.18.1.37 25  SMTP(Insecure)

(note: your Sourceports cannot contain additional whitespace at the end of the records. If it does, then you will have to replace $0 with individual $1,$2,$3 and adjust the format-string accordingly)

There are usually many ways to accomplish the same thing in awk so you can tailor it to meet whatever need you have.

Using paste and awk

A shorter, but less efficient way would be to use both paste and awk to achieve the same thing. (basically just outputting the first two fields of NATLog and appending the contents of Sourceports with paste, e.g.

$ paste -d ' ' <(awk '{print $1, $2}' NATLog) Sourceports
14 172.18.2.12 445 SMB
50 172.18.24.4 123 Network Time Protocol (NTP)
80 10.2.123.37 22  SSH
68 172.18.1.37 25  SMTP(Insecure)

(but that would really defeat the purpose of learning awk)

查看更多
我只想做你的唯一
6楼-- · 2019-08-20 07:35
awk '
    NR==FNR { key=$1; sub(/[^[:space:]]+[[:space:]]+/,""); map[key]=$0; next }
    { print $0, map[$3] }
' Sourceports NATLog
查看更多
登录 后发表回答