Grep file with two columns as input

2019-05-30 06:06发布

问题:

I have a file containing lines like:

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 51  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

and I have a second file containing lines like:

3891ZP;50;
3891ZN;53;A
3891ZN;53;B
3891ZN;54;

Now I want to grep the first file based on the pattern of the second file, where:

A) the 1st column of the 2nd file is present in the 5th column of the 1st file; and

B) the 2nd column of the 2nd file is present in the 2nd column of the 1st file.

My question: how to do this?

Update 7 July 2013: I updated file2 format to reflect the third column (number suffices).

回答1:

One way with awk:

awk -F';' '
NR==FNR {
  a[$1]=$2
  next
}
{
  line=$0
  gsub(/\"/,"")
  gsub(/ *; */,";")
  if (a[$5]==$2) {
    print line
    line=""
  }
}' file2 file1

Output:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"


回答2:

Heavily borrowing from @JS, I offer the following improved solution. The problem with his code is that if you have more than one house number in the same zip code, it will only match the last one. By creating a composite associative array (if that is the name... basically joining the two fields together), you get around this problem:

Create a file postcode.awk:

BEGIN {
  FS=";"
}
# loop around as long as the total number of records read
# is equal to the number of records read in this file
# in other words - loop around the first file only
NR==FNR {
  a[$1,$2]=1 # create one array element for each $1/$2 pair
  next
}
# loop around all the elements of the second file:
# since we're done processing the first file
{
  # copy the original line before modifying it
  line=$0
  # take out the double quotes
  gsub(/\"/,"")
  # take out the spaces on either side of the semicolons
  gsub(/ *; */,";")
  # see if the associative array element exists:
  if (a[$5,$2]==1) {
    # echo the original line that matched:
    print line
  }
}

Using test file file1 as follows (I have added a line to show the border case):

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

And key file file2 with (again, added a line):

3891ZP;50
3891ZP;52
3891ZN;53

You will see that JS's code will not match the line with number 50.

But my code does:

awk -f postcode.awk file2 file1

produces

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"


回答3:

You can use something like sed to construct patterns for grep:

$ grep -Ef <(sed -r 's/(.*);(.*)/^[^;]*;[^;]*\2[^;]*;([^;]*;){2}[^;]*\1/' file2) file1
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"


回答4:

I have splitted file2 into columns using bash's IFS and read. Then passed the columns to grep:

# read line by line
while IFS=$'\n' read line ; do
    # split into columns
    IFS=$';' read -a col <<< "$line"
    # the expression can be refined but should work well as is
    grep -e ' '${col[1]}'  ";".*;.*";"'${col[0]} file1
done < file2

Output:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"