Remove all records whose 2nd column is not 5 consecutive digits

Posted 2020-04-19 06:42

Question:

 Record | RegistrationID

 41-1|10551
 1-105|5569
  4-7|10043
  78-3|2176
   3-1|19826
   12-1|1981

The output file has to be:

 Record | RegistrationID
1-1|10551
3-1|19826
5-7|10043

My file is pipe-delimited.

Any value in the 2nd column that is shorter or longer than 5 digits must be removed, i.e. only records whose 2nd column is exactly 5 consecutive digits must remain. I have been searching Google for an hour to figure this out; any advice would be highly appreciated. Thanks in advance.

I tried grep -E ' [0-9]{5}$|$' filename but am not getting any results. Thanks to Cyrus.

Answer 1:

If this doesn't do what you want:

$ awk '(NR==1) || ($NF~/^[0-9]{5}$/)' file
 Acno | Zip
 high | 12345
tyty | 19812

then your real input file does not match the format you showed in your example. You would have to work out the difference yourself and post more representative sample input if you want more help.

Given your updated input file with no spaces around the |s:

$ awk -F'|' '(NR==1) || ($NF~/^[0-9]{5}$/)' file
 Acno | Zip
 45775-1|10551
  2734455-7|10043
   167115-1|19826

If you REALLY have leading white space in your input that you want removed from the output, that is easily done, but for now I am going to assume that you don't and that it is just another mistake in your posted sample input.
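In case the leading white space is real, here is a minimal sketch of one way to strip it while filtering. The file name sample.txt and the helper name strip_and_filter are hypothetical, and the sample rows are copied from the question. The digit class is written out five times so the sketch does not depend on interval-expression support.

```shell
# Hypothetical sample input modeled on the question's pipe-delimited file.
cat > sample.txt <<'EOF'
 Record | RegistrationID
 41-1|10551
 1-105|5569
  4-7|10043
  78-3|2176
   3-1|19826
   12-1|1981
EOF

# strip_and_filter is a hypothetical helper name used only for this sketch.
strip_and_filter() {
    awk -F'|' '
        { sub(/^[ \t]+/, "") }   # drop leading blanks; awk re-splits the fields
        # Keep the header and rows whose last field is exactly 5 digits.
        # The digit class is repeated instead of using {5} so this also runs
        # on awks without interval-expression support.
        NR == 1 || $NF ~ /^[0-9][0-9][0-9][0-9][0-9]$/
    ' "$1"
}

strip_and_filter sample.txt > filtered.txt
cat filtered.txt
```

With the sample rows above this should print the header followed by 41-1|10551, 4-7|10043 and 3-1|19826, each without leading blanks.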

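With gawk 3.1.7, which the OP has (see comments below), interval expressions such as {5} are not enabled by default and must be switched on explicitly: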

awk --re-interval -F'|' '(NR==1) || ($NF~/^[0-9]{5}$/)' file


Answer 2:

If your columns (fields) are |-separated and may contain spaces, and the filtering criterion is exactly 5 digits in the second field, then try this:

awk -F'|' '$2 ~ /^[ ]*[0-9]{5}[ ]*$/' file

To also pass through the header (first) line:

awk -F'|' 'NR==1 || $2 ~ /^[ ]*[0-9]{5}[ ]*$/' file


Answer 3:

Add the --re-interval option so that gawk supports the interval expression ({n} or {n,m}) in the regular expression.

gawk --re-interval -F'|' '$NF~/^[0-9]{5}$/' file
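If the awk at hand has neither interval expressions nor the --re-interval option, a similar filter can be written without {n} at all. The following is only a sketch, assuming that exactly five digits are wanted, as the question states; the file names regs.txt and kept.txt and the extra row x-1|12a45 are made up for illustration.

```shell
# Hypothetical sample input; file names and the last row are assumptions.
printf '%s\n' \
    ' Record | RegistrationID' \
    ' 41-1|10551' \
    ' 1-105|5569' \
    '  4-7|10043' \
    '12-1|1981' \
    'x-1|12a45' > regs.txt

# Keep the header, then rows whose last field has length 5 and contains
# no character other than a digit. No interval expression is needed.
awk -F'|' 'NR == 1 || (length($NF) == 5 && $NF !~ /[^0-9]/)' regs.txt > kept.txt
cat kept.txt
```

Here the header, 41-1|10551 and 4-7|10043 are kept. The row ending in 12a45 has the right length but is rejected by the negated character class.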


Tags: awk sed grep