Logically Impossible to fetch this particular stri

Posted 2019-06-12 13:58

I have 3 strings which are random and look somewhat like this:

1) ENTL.COMPENSATION REM      REVERSE PAYMENT COUPON ON ISIN //IT0004889033  IN A TRIPARTY //TRANSACTION WITH 95724
2) 01P ISIN DE000A1H36U5 QTY 44527000, //C/P 19696
3) COUPON ISIN XS0820547742 QTY 466750,

The goal is to fetch the values IT0004889033, DE000A1H36U5, and XS0820547742. If you look at the 3 strings, each expected value comes right after ISIN, so we could take ISIN as a reference and fetch whatever follows it. But it seems that is not what is required: we should not fetch the value by using some other value as a reference.

The expected value, e.g. IT0004889033, is a 12-character string, and the information I have is this: the first 2 characters are letters, the next 9 are alphanumeric, and the last one is a digit. With just this information, is it possible to do a wildcard search or something similar and fetch this 12-character value?

I'm totally lost on this one logically.

2 Answers
Bombasti
#2 · 2019-06-12 14:23

Using grep -oP:

grep -oP 'ISIN\W+\K\w+' file
IT0004889033
DE000A1H36U5
XS0820547742

If grep -P isn't available, then you can use awk:

awk -F '.*ISIN[^0-9a-zA-Z]*| ' '{print $2}' file
IT0004889033
DE000A1H36U5
XS0820547742

Or else, with a POSIX character class:

awk -F '.*ISIN[^[:alnum:]]*| ' '{print $2}' file
祖国的老花朵
#3 · 2019-06-12 14:31

You mentioned that ISIN should not be used as a reference. Therefore, the only thing for sure is that the string to be found starts with 2 letters, followed by 9 letters and/or numbers, and ends with a number.

I saved your example text as tmp, and ran the following egrep command... seems to work for me:

jim@debian:~/tmp$ egrep -o "[a-zA-Z]{2}[a-zA-Z0-9]{9}[0-9]{1}" tmp
IT0004889033
DE000A1H36U5
XS0820547742

The above solution fits the requirement more closely than the previous ones because it does not use ISIN as a reference and matches a fixed number of characters: every match returned by the above code is exactly 12 characters long.
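One caveat with a fixed-length pattern: with nothing to mark word boundaries, it can also pick a 12-character slice out of a longer alphanumeric token. The following is a minimal sketch of one way to guard against that, assuming a grep that supports the -w (whole word) option, as GNU and BSD grep do. The temporary file and the fourth input line are illustrative assumptions, not part of the question's data.

```shell
# Sample input from the question, written to a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
ENTL.COMPENSATION REM      REVERSE PAYMENT COUPON ON ISIN //IT0004889033  IN A TRIPARTY //TRANSACTION WITH 95724
01P ISIN DE000A1H36U5 QTY 44527000, //C/P 19696
COUPON ISIN XS0820547742 QTY 466750,
EOF

# Hypothetical extra line (not from the question): a 15-character token
# whose first 12 characters happen to fit the pattern.
printf 'REF AB1234567890123\n' >> "$tmpfile"

# -o prints only the matched text, -E enables extended regular expressions,
# -w keeps a match only if it is not adjacent to another letter, digit,
# or underscore.
isins=$(grep -owE '[A-Za-z]{2}[A-Za-z0-9]{9}[0-9]' "$tmpfile")
printf '%s\n' "$isins"

rm -f "$tmpfile"
```

Without -w, the same pattern would also report AB1234567890 from the hypothetical fourth line. With -w, only the three standalone 12-character values from the question's strings are printed.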

I hope this helps!
