Record | RegistrationID
41-1|10551
1-105|5569
4-7|10043
78-3|2176
3-1|19826
12-1|1981
The output file should be:
Record | RegistrationID
1-1|10551
3-1|19826
5-7|10043
My file is pipe-delimited.
Any record whose number in the 2nd column is shorter or longer than 5 digits must be removed, i.e. only records with exactly 5 consecutive digits in that column should remain. I've been searching Google for an hour trying to figure this out; any advice would be highly appreciated. Thanks in advance.
Tried this: grep -E ' [0-9]{5}$|$' filename -> not getting the expected results (thanks to Cyrus).
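A sketch of why that grep does not filter anything, and one way to fix it. In an ERE, ' [0-9]{5}$|$' is an alternation, and the bare $ alternative matches the end of every line, so every line is printed. The data below is the question's own sample; the variable names (data, all, header, rows, result) are only for illustration, and the fixed pattern assumes the 5-digit value is the last field on the line.

```shell
#!/bin/sh
# The question's sample input (no spaces around "|" in the data rows).
data='Record | RegistrationID
41-1|10551
1-105|5569
4-7|10043
78-3|2176
3-1|19826
12-1|1981'

# Original attempt: the "|$" alternative matches the end of every line,
# so all 7 lines match and nothing is filtered out.
all=$(printf '%s\n' "$data" | grep -cE ' [0-9]{5}$|$')
echo "original pattern matched $all lines"

# Fixed attempt: skip the 1st field up to the first "|", then require the
# rest of the line to be exactly five digits (optionally padded with spaces).
# The header does not match that pattern, so it is printed separately.
header=$(printf '%s\n' "$data" | head -n 1)
rows=$(printf '%s\n' "$data" | tail -n +2 | grep -E '^[^|]*[|][ ]*[0-9]{5}[ ]*$')
result=$(printf '%s\n%s\n' "$header" "$rows")
printf '%s\n' "$result"
```

Using [|] instead of \| keeps the pipe literal without depending on how a particular grep treats backslash escapes in EREs.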
If this doesn't do what you want:
$ awk '(NR==1) || ($NF~/^[0-9]{5}$/)' file
Acno | Zip
high | 12345
tyty | 19812
then your real input file simply does not match the format you provided in your example. You'd have to follow up on that yourself to figure out the difference, and post more truly representative sample input if you want more help.
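One likely cause of such a mismatch, assuming the real file has no spaces around the pipes: awk's default field separator splits on runs of blanks, so a line like 41-1|10551 is a single field and $NF is the whole line, which can never match ^[0-9]{5}$. The sketch below just prints NF and $NF both ways; the variable names are illustrative.

```shell
#!/bin/sh
line='41-1|10551'

# Default FS (blanks): the whole line is one field, so NF is 1.
default_last=$(printf '%s\n' "$line" | awk '{ print NF ":" $NF }')

# With -F'|' the pipe separates fields, so $NF is the 2nd column.
pipe_last=$(printf '%s\n' "$line" | awk -F'|' '{ print NF ":" $NF }')

echo "default FS: $default_last"
echo "-F'|':      $pipe_last"
```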
Given your updated input file with no spaces around the | characters:
$ awk -F'|' '(NR==1) || ($NF~/^[0-9]{5}$/)' file
Acno | Zip
45775-1|10551
2734455-7|10043
167115-1|19826
If you REALLY have leading white space in your input that you want removed from the output, that's easily done, but for now I'm going to assume you don't actually have that situation and it's just more mistakes in your posted sample input file.
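If leading white space really is present, a minimal sketch is to strip it from the whole record inside the action, after the filter has matched. The input here is made-up sample data with leading spaces and a tab. The five-digit test is spelled out as five [0-9] classes instead of {5}, which is equivalent and also works on awks without interval-expression support.

```shell
#!/bin/sh
# Hypothetical input: some records start with spaces or a tab.
result=$(printf 'Record | RegistrationID\n  41-1|10551\n\t1-105|5569\n 4-7|10043\n' |
  awk -F'|' '
    NR == 1 || $NF ~ /^[0-9][0-9][0-9][0-9][0-9]$/ {
      sub(/^[ \t]+/, "")   # remove leading blanks/tabs from the whole record
      print
    }')
printf '%s\n' "$result"
```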
With gawk 3.1.7 as the OP has (see comments below):
awk --re-interval -F'|' '(NR==1) || ($NF~/^[0-9]{5}$/)' file
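If passing --re-interval (or upgrading gawk) is not an option, here is a sketch of two equivalent filters that avoid interval expressions altogether: spelling out the five digit classes, or checking length() plus an all-digits regex. The input is the question's sample data; result1 and result2 are illustrative names.

```shell
#!/bin/sh
data='Record | RegistrationID
41-1|10551
1-105|5569
4-7|10043
78-3|2176
3-1|19826
12-1|1981'

# Variant 1: five explicit [0-9] classes instead of [0-9]{5}.
result1=$(printf '%s\n' "$data" |
  awk -F'|' 'NR == 1 || $NF ~ /^[0-9][0-9][0-9][0-9][0-9]$/')

# Variant 2: exactly 5 characters long, and all of them digits.
result2=$(printf '%s\n' "$data" |
  awk -F'|' 'NR == 1 || (length($NF) == 5 && $NF ~ /^[0-9]+$/)')

printf '%s\n' "$result1"
```

Both variants keep the header plus the 10551, 10043 and 19826 rows; the length() form is handy if the required width may change later.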
If your columns (fields) are |-separated, may contain spaces, and the filtering criterion is exactly 5 digits in the second field, then try this:
awk -F'|' '$2 ~ /^[ ]*[0-9]{5}[ ]*$/' file
To also pass through the header (first) line:
awk -F'|' 'NR==1 || $2 ~ /^[ ]*[0-9]{5}[ ]*$/' file
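A quick sketch of the space-tolerant filter on made-up input where only some rows have spaces around the |. As in the command above, spaces surrounding field 2 are allowed and kept in the output; the digit run is written as five [0-9] classes rather than {5} so it also runs on awks without interval support.

```shell
#!/bin/sh
# Hypothetical mixed input: some rows use " | ", others a bare "|".
result=$(printf '%s\n' 'Acno | Zip' 'high | 12345' 'low | 1234' '45775-1|10551' '9-9 |123456' |
  awk -F'|' 'NR == 1 || $2 ~ /^[ ]*[0-9][0-9][0-9][0-9][0-9][ ]*$/')
printf '%s\n' "$result"
```

The 4-digit and 6-digit rows are dropped because the pattern is anchored at both ends of field 2.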
On older gawk versions (before 4.0), add the --re-interval option so that the interval expression in the regular expression is recognized:
gawk --re-interval -F'|' '$NF~/^[0-9]{5}$/' file
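For reference, an interval {m,n} accepts anywhere from m to n repetitions, so a {4,5} pattern would also keep four-digit values such as 5569, 2176 and 1981 from the question; only {5} (equivalently {5,5}) enforces exactly five digits. A quick check with grep -E, whose EREs support intervals, using the question's 2nd-column values:

```shell
#!/bin/sh
# Second-column values taken from the question's sample input.
values='10551
5569
10043
2176
19826
1981'

# {4,5}: four or five digits, so 4-digit values slip through.
range=$(printf '%s\n' "$values" | grep -cE '^[0-9]{4,5}$')
# {5}: exactly five digits.
exact=$(printf '%s\n' "$values" | grep -cE '^[0-9]{5}$')

echo "{4,5} kept $range values, {5} kept $exact"
```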