Combine a positive lookahead and negative lookahead

Posted 2019-07-18 17:19

I am not good with regex. I have the expression below, and I assume part of it means: look for 13 - 16 digits, then succeed if it finds 3 - 4 digits after that. The problem is that the 3 - 4 digits are optional, and they can also come before the 13 - 16 digit number, so I guess I want to combine a positive lookahead/lookbehind with a negative lookahead/lookbehind. This sounds way too complex. Is there a simpler way?

(\d{13,16})[<"'].*?(?=[>"']\d{3,4}[<"'])[>"'](\d{3,4})[<"']

It matches the ccnum and the series in the following snippet:

<CreditCard 
     name="John Doe"
     ccnum="1111123412341231" 
     series="339"
     exp="03/13">
</CreditCard>

However, if I remove the ccnum or the series, it doesn't match anything, even though the series is optional. The series can also appear before or after the ccnum, and if I put the series attribute before the ccnum attribute, nothing matches either. It also doesn't match when the series and ccnum are separate elements with the series first, or when there is no series element at all, such as:

<CreditCard> 
<series>234</series>
<ccnum>1235583839293838</ccnum>
</CreditCard>
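To make the failures concrete, here is a small check of the question's regex against these samples. It is a Python sketch (the answers suggest .NET, but a positive lookahead behaves the same way in Python's `re`), and `re.DOTALL` is an assumption: without it `.` cannot cross the line breaks between the attributes of the multi-line snippet.

```python
import re

# The regex from the question as a Python raw string.
# re.DOTALL (assumed) lets ".*?" cross the line breaks between attributes.
QUESTION_RE = re.compile(
    r"""(\d{13,16})[<"'].*?(?=[>"']\d{3,4}[<"'])[>"'](\d{3,4})[<"']""",
    re.DOTALL,
)

# ccnum before series (the first snippet above, quotes cleaned up).
attrs_first = (
    '<CreditCard\n     name="John Doe"\n     ccnum="1111123412341231"\n'
    '     series="339"\n     exp="03/13">\n</CreditCard>'
)
# Series element before the ccnum element, and a ccnum with no series.
series_first = "<CreditCard>\n<series>234</series>\n<ccnum>1235583839293838</ccnum>\n</CreditCard>"
ccnum_only = '<CreditCard ccnum="1838383838383833"></CreditCard>'

for sample in (attrs_first, series_first, ccnum_only):
    m = QUESTION_RE.search(sample)
    # Prints the (ccnum, series) groups, or None when nothing matched.
    print(m.groups() if m else None)
```

Only the first sample matches: the pattern requires the 13 - 16 digits to come first and the 3 - 4 digits to follow, so both other layouts fail.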

I need the regex to match the following scenarios, but I do not know the exact names of the elements; here I just called them ccnum and series.

Here are the ones that work:

<CreditCard> 
            <ccnum>1235583839293838</ccnum>
            <series>123</series>
</CreditCard>

<CreditCard ccnum="1838383838383833"> 
            <series>123</series>
</CreditCard>

<CreditCard ccnum="1838383838383833" series="139">
</CreditCard>

It should also match the following, but does not:

<CreditCard ccnum="1838383838383833">
            </CreditCard>

<CreditCard series="139" ccnum="1838383838383833">
            </CreditCard>

<CreditCard ccnum="1838383838383833"></CreditCard>

<CreditCard> 
    <series>123</series>                
    <ccnum>1235583839293838</ccnum>
</CreditCard>

<CreditCard>          
<ccnum series="123">1235583839293838</ccnum>
</CreditCard>

Right now, to get this to work, I am using 3 separate regular expressions:

1. one to match a credit card number that comes before a security code;
2. one to match a security code that comes before a credit card number;
3. one to match just a credit card number.

I tried combining the expressions with alternation (|), but then I end up with 5 capture groups in total (2 from each of the first two expressions and 1 from the last one).
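The five groups are workable if you accept that only one alternative participates in a match: read the card number from whichever cc group is set, and the series the same way. A minimal Python sketch of that idea; `CC`, `SERIES` and `find_card` are hypothetical names, and `re.DOTALL` is assumed so `.*?` can span lines. On a document with several cards, `.*?` can pair a number with a later card's series, so run it per card.

```python
import re

# Hypothetical building blocks: a tagged/quoted 13-16 digit value (card
# number) and a tagged/quoted 3-4 digit value (security code).
CC = r"""[>"'](\d{13,16})[<"']"""
SERIES = r"""[>"'](\d{3,4})[<"']"""

# The three expressions from the question joined with "|":
#   alternative 1: cc then series  -> group 1 = cc, group 2 = series
#   alternative 2: series then cc  -> group 3 = series, group 4 = cc
#   alternative 3: cc only         -> group 5 = cc
COMBINED = re.compile(f"{CC}.*?{SERIES}|{SERIES}.*?{CC}|{CC}", re.DOTALL)


def find_card(text):
    """Return (ccnum, series) for the first match; series may be None."""
    m = COMBINED.search(text)
    if m is None:
        return None
    # Groups of the alternatives that did not participate are None,
    # so "or" picks the one group that is actually set.
    ccnum = m.group(1) or m.group(4) or m.group(5)
    series = m.group(2) or m.group(3)
    return ccnum, series


samples = [
    "<CreditCard>\n<ccnum>1235583839293838</ccnum>\n<series>123</series>\n</CreditCard>",
    '<CreditCard series="139" ccnum="1838383838383833"></CreditCard>',
    '<CreditCard ccnum="1838383838383833"></CreditCard>',
    '<CreditCard>\n<ccnum series="123">1235583839293838</ccnum>\n</CreditCard>',
]
for s in samples:
    print(find_card(s))
```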

3 answers
欢心
Reply #2 · 2019-07-18 17:30
(?<=[>\"'](\\d{3,4})[<\"'].{0,100})?[>\"'](\\d{13,16})[<\"'](?=.*[>\"'](\\d{3,4})[<\"'])?

This will create three capture groups, where the ccnum is always in the second group, and the series can be in the first, the third, or none of the groups.

ccnum = match.Groups[2].Value;
series = match.Groups[1].Value + match.Groups[3].Value;
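Python's `re` module cannot run this pattern verbatim because it rejects variable-width look-behind (the `\d{3,4}` and `.{0,100}` inside `(?<=...)`). The sketch below is a stand-in for the same idea, not the answer's exact regex: it consumes the surrounding text with optional non-capturing groups while keeping the same layout (group 1 = series before, group 2 = ccnum, group 3 = series after). `extract` is a hypothetical helper and `re.DOTALL` is an assumption.

```python
import re

# Look-behind is replaced by an optional consumed prefix, because Python's
# re requires fixed-width look-behind. Group layout matches the answer:
#   1 = series before the number, 2 = ccnum, 3 = series after the number.
PATTERN = re.compile(
    r"""(?:[>"'](\d{3,4})[<"'].{0,100}?)?"""  # optional series, at most ~100 chars before
    r"""[>"'](\d{13,16})[<"']"""              # the card number itself
    r"""(?:.*?[>"'](\d{3,4})[<"'])?""",       # optional series somewhere after
    re.DOTALL,  # let "." cross line breaks in multi-line markup
)


def extract(text):
    """Return (ccnum, series) or None; series is None when absent."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Unmatched groups are None in Python (not "" as in .NET), so instead of
    # concatenating groups 1 and 3, take whichever one participated.
    return m.group(2), m.group(1) or m.group(3)


samples = [
    "<CreditCard>\n<series>123</series>\n<ccnum>1235583839293838</ccnum>\n</CreditCard>",
    '<CreditCard ccnum="1838383838383833" series="139"></CreditCard>',
    '<CreditCard ccnum="1838383838383833"></CreditCard>',
]
for s in samples:
    print(extract(s))
```

Because `search` returns the leftmost match, a series that precedes the number is picked up by the optional prefix before the bare number alone could match.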
forever°为你锁心
Reply #3 · 2019-07-18 17:40

It is probably much easier to pull the XML into an XDocument using its Parse method. Then you can use XPath or other means of finding that data.
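The same parse-first idea can be sketched in Python with the standard `xml.etree.ElementTree` module instead of `XDocument`. Assumptions: the container element is named `CreditCard`, the input is well-formed XML (the question's samples with a missing `>` would raise a parse error), and since the inner names are unknown, every attribute value and element text inside a card is classified by shape alone. Any other 3 - 4 digit value in a card would be mistaken for the series. `cards` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

CC_RE = re.compile(r"\d{13,16}")   # shape of a card number
SERIES_RE = re.compile(r"\d{3,4}")  # shape of a security code


def cards(xml_text):
    """Return one (ccnum, series) pair per CreditCard element."""
    root = ET.fromstring(xml_text)
    results = []
    for card in root.iter("CreditCard"):
        ccnum = series = None
        for el in card.iter():  # the card itself plus all descendants
            # Check attribute values first, then the element's own text.
            values = list(el.attrib.values()) + [(el.text or "").strip()]
            for value in values:
                if ccnum is None and CC_RE.fullmatch(value):
                    ccnum = value
                elif series is None and SERIES_RE.fullmatch(value):
                    series = value
        results.append((ccnum, series))
    return results


doc = """<Cards>
  <CreditCard ccnum="1838383838383833"><series>123</series></CreditCard>
  <CreditCard series="139" ccnum="1838383838383833"></CreditCard>
  <CreditCard><series>234</series><ccnum>1235583839293838</ccnum></CreditCard>
  <CreditCard><ccnum series="123">1235583839293838</ccnum></CreditCard>
  <CreditCard ccnum="1838383838383833"></CreditCard>
</Cards>"""
print(cards(doc))
```

Order and optionality stop mattering here because each card is collected as a unit rather than matched as a sequence of characters.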

As for the regex: your regex is too complex for me to comprehend, but this is how you make a certain block optional: "(thisisoptional)?".

And you cannot account for the two different orders except by writing both orders into the regex manually. So if you want to match both "ab" and "ba" (different order), you need the following regex: "((ab)|(ba))", so every part appears twice. You can reduce the nastiness of this by factoring "a" and "b" out into a string variable each.

神经病院院长
Reply #4 · 2019-07-18 17:44

You could try recursively traversing the XML document, scraping every attribute and text node that matches your expression for ccnum or series, and appending them to List<string> ccNumList and List<string> seriesList. If the ccnum and series values appear in the same order in the DOM tree, then ccNumList[i] belongs with seriesList[i].
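A minimal Python sketch of that traversal, using the standard `xml.etree.ElementTree` module in place of the .NET DOM; `collect`, `cc_list` and `series_list` are hypothetical names. As the answer notes, the two lists only line up index by index when every card has both values in the same order; a card without a series shifts every later pairing.

```python
import re
import xml.etree.ElementTree as ET

CC_RE = re.compile(r"\d{13,16}")   # shape of a card number
SERIES_RE = re.compile(r"\d{3,4}")  # shape of a security code


def collect(element, cc_list, series_list):
    """Depth-first walk appending matching attribute values and text."""
    values = list(element.attrib.values())
    if element.text and element.text.strip():
        values.append(element.text.strip())
    for value in values:
        if CC_RE.fullmatch(value):
            cc_list.append(value)
        elif SERIES_RE.fullmatch(value):
            series_list.append(value)
    for child in element:  # recurse into child elements in document order
        collect(child, cc_list, series_list)


doc = """<Cards>
  <CreditCard><ccnum>1235583839293838</ccnum><series>123</series></CreditCard>
  <CreditCard series="139" ccnum="1838383838383833"></CreditCard>
</Cards>"""
cc_list, series_list = [], []
collect(ET.fromstring(doc), cc_list, series_list)
print(cc_list)
print(series_list)
```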

An example of doing a recursive tree traversal is here.
