grep matching specific position in lines using wor

Posted 2019-05-07 08:41

Question:

I have 2 files.

file1:

12342015010198765hello
12342015010188765hello
12342015010178765hello

Each line contains fields at fixed positions; for example, positions 13 - 17 hold the account_id.

file2:

98765
88765

which contains a list of account_ids.

In Korn shell, I want to print the lines from file1 whose positions 13 - 17 match one of the account_ids in file2.

I can't do

grep -f file2 file1

because an account_id in file2 can also match other fields at other positions.

I have tried using this pattern in file2:

^.{12}98765.*

but it did not work.

Answer 1:

Using awk

$ awk 'NR==FNR{a[$1]=1;next;} substr($0,13,5) in a' file2 file1
12342015010198765hello
12342015010188765hello

How it works

  • NR==FNR{a[$1]=1;next;}

    FNR is the number of lines read so far from the current file and NR is the total number of lines read so far. Thus, if FNR==NR, we are reading the first file, which is file2.

    Each ID in file2 is saved as a key in array a. Then, we skip the rest of the commands and jump to the next line.

  • substr($0,13,5) in a

    If we reach this command, we are working on the second file, file1.

    This condition is true if the 5-character substring that starts at position 13 is a key in array a. If the condition is true, awk performs the default action, which is to print the line.
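If the field position might change, the same idea can be written with the offset and length passed in as awk variables. This is only a sketch: the names start, len and ids are made up for illustration, and the sample files are recreated in a scratch directory (assuming mktemp -d is available) so the snippet runs on its own.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    12342015010178765hello > "$workdir/file1"
printf '%s\n' 98765 88765 > "$workdir/file2"

# start and len are illustrative names, passed with -v so the field
# position is not hard-coded inside the awk program.
matches=$(awk -v start=13 -v len=5 '
    NR == FNR { ids[$1] = 1; next }   # first file (file2): remember each ID
    substr($0, start, len) in ids     # second file (file1): print on a match
' "$workdir/file2" "$workdir/file1")

printf '%s\n' "$matches"
rm -rf "$workdir"
```

With the sample data this should print the two lines whose account_id is 98765 or 88765, the same result as the one-liner above.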

Using grep

You mentioned trying

grep '^.{12}98765.*' file1

That uses extended regex syntax, which means that -E is required; without it, grep treats {12} as literal characters. Also, there is no value in matching .* at the end: it will always match. Thus, try:

$ grep -E '^.{12}98765' file1
12342015010198765hello

To get both lines:

$ grep -E '^.{12}[89]8765' file1
12342015010198765hello
12342015010188765hello

This works because [89]8765 just happens to match the IDs of interest in file2. The awk solution, of course, provides more flexibility in what IDs to match.
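If you would rather stay with grep and not hand-write a character class, one option is to generate an anchored pattern for each ID in file2 and feed the result to grep -E -f. A minimal sketch, assuming the IDs contain no regex metacharacters (plain digits, as in the sample) and that mktemp -d is available; the patterns file name is arbitrary:

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    12342015010178765hello > "$workdir/file1"
printf '%s\n' 98765 88765 > "$workdir/file2"

# Prefix every ID with ^.{12} so it can only match at position 13,
# e.g. 98765 becomes ^.{12}98765
sed 's/^/^.{12}/' "$workdir/file2" > "$workdir/patterns"

# -E enables the {12} interval syntax; -f reads one pattern per line.
grep_matches=$(grep -E -f "$workdir/patterns" "$workdir/file1")

printf '%s\n' "$grep_matches"
rm -rf "$workdir"
```

Because every pattern is anchored with ^ and skips exactly 12 characters, an ID that happens to appear elsewhere in a line is not matched.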



Answer 2:

Using sed with extended regex:

sed -r 's@.*@/^.{12}&/p@' file2 |sed -nr -f- file1

Using Basic regex:

sed 's@.*@/^.\\{12\\}&/p@' file2 |sed -n -f- file1

Explanation:

sed -r 's@.*@/^.{12}&/p@' file2

generates this output:

/^.{12}98765/p
/^.{12}88765/p

This is then used as the script for the second sed, after the pipe, which outputs:

12342015010198765hello
12342015010188765hello
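Reading the script from standard input with -f- may not work in every sed. A rough alternative, assuming mktemp -d is available, is to write the generated script to a temporary file first; the script.sed name is arbitrary, and the basic regex form is used so that neither -r nor -E is needed:

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    12342015010178765hello > "$workdir/file1"
printf '%s\n' 98765 88765 > "$workdir/file2"

# Generate one print command per ID, e.g. /^.\{12\}98765/p
# (\\ in the replacement emits a single literal backslash).
sed 's@.*@/^.\\{12\\}&/p@' "$workdir/file2" > "$workdir/script.sed"

# Run the generated script; -n suppresses the default output so only
# the lines selected by a p command are printed.
sed_matches=$(sed -n -f "$workdir/script.sed" "$workdir/file1")

printf '%s\n' "$sed_matches"
rm -rf "$workdir"
```

Note that if file2 lists the same ID twice, the generated script contains two identical p commands and a matching line is printed twice, so it may be worth de-duplicating file2 first.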


Tags: shell unix grep