I have two files.
file1:
12342015010198765hello
12342015010188765hello
12342015010178765hello
Each line contains fields at fixed positions; for example, positions 13-17 hold the account_id.
file2:
98765
88765
which contains a list of account_ids.
In Korn shell, I want to print the lines of file1 whose positions 13-17 match one of the account_ids in file2.
I can't do
grep -f file2 file1
because an account_id in file2 can also match other fields at other positions.
I also tried putting a pattern like this in file2:
^.{12}98765.*
but it did not work.
Using awk
$ awk 'NR==FNR{a[$1]=1;next;} substr($0,13,5) in a' file2 file1
12342015010198765hello
12342015010188765hello
How it works
NR==FNR{a[$1]=1;next;}
FNR is the number of lines read so far from the current file and NR is the total number of lines read so far. Thus, if NR==FNR, we are reading the first file, which is file2.
Each ID in file2 is saved as a key in array a. Then next skips the rest of the commands and jumps to the next line.
substr($0,13,5) in a
If we reach this command, we are working on the second file, file1.
This condition is true if the 5-character substring that starts at position 13 is a key in array a. If the condition is true, then awk performs the default action, which is to print the line.
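The same idea can be tried end to end with the sample data from the question. This is only a sketch: the scratch directory and the variable names start, len and ids are illustrative choices, not part of the original command, and it assumes every line of file2 holds exactly one ID with no surrounding whitespace.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > "$workdir/file1"
printf '%s\n' 98765 88765 > "$workdir/file2"

# start and len are hypothetical names for the field position and width.
result=$(awk -v start=13 -v len=5 '
    NR == FNR { ids[$1] = 1; next }   # first file (file2): remember each account_id
    substr($0, start, len) in ids     # second file (file1): print lines whose field is a known ID
' "$workdir/file2" "$workdir/file1")

printf '%s\n' "$result"
rm -r "$workdir"
```

Passing the position and width with -v keeps the awk program unchanged if the field layout differs in another file.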
Using grep
You mentioned trying
grep '^.{12}98765.*' file1
The {12} interval is extended regex syntax, which means that -E is required. Also, there is no value in matching .* at the end: it will always match. Thus, try:
$ grep -E '^.{12}98765' file1
12342015010198765hello
To get both lines:
$ grep -E '^.{12}[89]8765' file1
12342015010198765hello
12342015010188765hello
This works because [89]8765 just happens to match the IDs of interest in file2. The awk solution, of course, provides more flexibility in what IDs to match.
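If you would rather stay with grep and still take the IDs from file2, one possible approach is to build an anchored pattern for each ID and pass the result to grep -E -f. The sketch below assumes the IDs contain no regex metacharacters; the patterns file and the scratch directory are hypothetical names used only for illustration.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > "$workdir/file1"
printf '%s\n' 98765 88765 > "$workdir/file2"

# Prefix each ID with "^.{12}" so it can only match at positions 13-17.
sed 's/^/^.{12}/' "$workdir/file2" > "$workdir/patterns"
patterns=$(cat "$workdir/patterns")

# -E enables the {12} interval syntax; -f reads one pattern per line.
matches=$(grep -E -f "$workdir/patterns" "$workdir/file1")

printf '%s\n' "$matches"
rm -r "$workdir"
```

This keeps file2 as a plain list of IDs and puts the positional logic in the generated patterns rather than in the data file.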
Using sed
With extended regex:
sed -r 's@.*@/^.{12}&/p@' file2 |sed -nr -f- file1
With basic regex:
sed 's@.*@/^.\\{12\\}&/p@' file2 | sed -n -f- file1
Explanation:
sed -r 's@.*@/^.{12}&/p@' file2
generates the output:
/^.{12}98765/p
/^.{12}88765/p
which is then used as the sed script for the next sed after the pipe, which outputs:
12342015010198765hello
12342015010188765hello
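Reading the script from standard input with -f- may not work in every sed implementation. A variant that writes the generated script to a file first, using only basic regex, might look like the sketch below; the ids.sed file name and the scratch directory are hypothetical, and the IDs are assumed to contain no regex metacharacters and no @ or / characters.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > "$workdir/file1"
printf '%s\n' 98765 88765 > "$workdir/file2"

# Turn each ID into a sed command of the form /^.\{12\}ID/p.
sed 's@.*@/^.\\{12\\}&/p@' "$workdir/file2" > "$workdir/ids.sed"
script=$(cat "$workdir/ids.sed")

# -n suppresses default output, so only lines hit by a p command are printed.
output=$(sed -n -f "$workdir/ids.sed" "$workdir/file1")

printf '%s\n' "$output"
rm -r "$workdir"
```

Keeping the generated script in a file also makes it easy to inspect before running it against file1.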