I am not good with regex, but I have the pattern below. As far as I can tell, it looks for 13 to 16 digits and then succeeds if it finds 3 to 4 digits after that. The problem is that the 3 to 4 digits are optional, and they can also come before the 13 to 16 digit number, so I guess I would need to combine positive and negative lookahead and lookbehind. This sounds way too complex. Is there a simpler way?
(\d{13,16})[<"'].*?(?=[>"']\d{3,4}[<"'])[>"'](\d{3,4})[<"']
which will match the ccnum and the series in the following snippet:
<CreditCard>
name="John Doe""
ccnum=""1111123412341231""
series="339"
exp="03/13">
</CreditCard>
However, if I remove the ccnum or the series, it doesn't match anything, even though the series should be optional. The series can also appear before or after the ccnum, and if I put the series attribute before the ccnum attribute, it doesn't match anything either. It also fails when the series comes before the ccnum as separate elements, as in the snippet below, or when I leave out the series element entirely:
<CreditCard>
<series>234</series>
<ccnum>1235583839293838</ccnum>
</CreditCard>
I need the regex to match the following scenarios, but I do not know the exact names of the elements. In this case, I just called them ccnum and series.
Here are the ones that work:
<CreditCard>
<ccnum>1235583839293838</ccnum>
<series>123</series>
</CreditCard>
<CreditCard ccnum="1838383838383833">
<series>123</series>
</CreditCard>
<CreditCard ccnum="1838383838383833" series="139"
</CreditCard>
It should also match the following, but does not:
<CreditCard ccnum="1838383838383833"
</CreditCard>
<CreditCard series="139" ccnum="1838383838383833"
</CreditCard>
<CreditCard ccnum="1838383838383833"></CreditCard>
<CreditCard>
<series>123</series>
<ccnum>1235583839293838</ccnum>
</CreditCard>
<CreditCard>
<ccnum series="123">1235583839293838</ccnum>
</CreditCard>
Right now, to get this to work, I am using 3 separate regular expressions:
1 to match a credit card number that comes before a security code.
1 to match a security code that comes before a credit card number.
1 to match just a credit card number.
I tried combining the expressions with an or, but I end up with 5 groups in total (2 from each of the first 2 expressions and 1 from the last one).
This will create three capture groups, where the ccnum is always in the second group, and the series can be in the first, the third, or none of the groups.
It is probably much easier to pull the XML into an XDocument using its Parse method. Then you can use XPath or other means of finding that data.
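To illustrate the parser-based approach, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree in place of XDocument (the language swap and the helper name extract_card are assumptions, not something from the question). Since the element and attribute names are unknown, it classifies values purely by their digit count.

```python
import re
import xml.etree.ElementTree as ET

CCNUM_RE = re.compile(r"\d{13,16}")   # 13 to 16 digits, as in the question
SERIES_RE = re.compile(r"\d{3,4}")    # 3 to 4 digits, as in the question


def extract_card(xml_text):
    """Return (ccnum, series) for one well-formed fragment; series may be None.

    Hypothetical helper: it looks at every attribute value and every text
    node, so it does not depend on what the elements are called.
    """
    root = ET.fromstring(xml_text)
    ccnum = series = None
    for node in root.iter():  # the root and all descendants, in document order
        values = list(node.attrib.values())
        if node.text and node.text.strip():
            values.append(node.text.strip())
        for value in values:
            if ccnum is None and CCNUM_RE.fullmatch(value):
                ccnum = value
            elif series is None and SERIES_RE.fullmatch(value):
                series = value
    return ccnum, series


print(extract_card(
    "<CreditCard><series>234</series>"
    "<ccnum>1235583839293838</ccnum></CreditCard>"
))
```

Note that a parser only accepts well-formed input: a sample such as `<CreditCard ccnum="1838383838383833"` with no closing `>` would raise a ParseError here instead of being matched.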
As for the regex: your regex is too complex for me to comprehend, but this is how you make a certain block optional: "(thisisoptional)?".
And you cannot account for the two different orders except by including both orders manually in the regex. So if you want to be able to match "ab" and "ba" (different order), you need the following regex: "((ab)|(ba))". Everything then appears twice in the pattern. You can reduce the nastiness of this by factoring out "a" and "b" into a string variable each.
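One way to combine the two points above, sketched in Python (the language, the variable names and the helper find_card are assumptions for illustration): the series piece is factored into a string variable and placed as an optional block on both sides of the ccnum. This gives three capture groups with the ccnum always in the second one. The delimiter classes are copied from the question's own pattern.

```python
import re

# Each piece keeps the delimiters used in the question's pattern.
CCNUM = r"""[>"'](\d{13,16})[<"']"""
SERIES = r"""[>"'](\d{3,4})[<"']"""

# Optional series before, mandatory ccnum, optional series after.
# DOTALL lets ".*?" cross line breaks between separate elements.
CARD_RE = re.compile(
    rf"(?:{SERIES}.*?)?{CCNUM}(?:.*?{SERIES})?",
    re.DOTALL,
)


def find_card(text):
    """Return (ccnum, series) for the first match, or None if no ccnum is found."""
    match = CARD_RE.search(text)
    if match is None:
        return None
    before, ccnum, after = match.groups()
    return ccnum, before or after  # series is None when neither side matched


print(find_card('<CreditCard series="139" ccnum="1838383838383833"\n</CreditCard>'))
print(find_card('<CreditCard ccnum="1838383838383833"></CreditCard>'))
```

Because ".*?" may run as far as it needs to, this sketch is only safe on text that holds a single card. With several cards in one string, the trailing optional block could pick up a series that belongs to a later card.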
You could try recursively traversing the XML document and scraping every attribute and text node that matches your expressions for ccnum and series, appending them to List<string> ccNumList and List<string> seriesList. If ccnum and series appear in the same order in the DOM tree hierarchy, then ccNumList[i] corresponds to seriesList[i].
An example of doing a recursive tree traversal is here.
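A rough sketch of that recursive traversal, written in Python with xml.etree.ElementTree (the language, the function name scrape and the wrapping Cards element are assumptions made for the example):

```python
import re
import xml.etree.ElementTree as ET

CCNUM_RE = re.compile(r"\d{13,16}")
SERIES_RE = re.compile(r"\d{3,4}")


def scrape(node, cc_num_list, series_list):
    """Depth-first walk that files attribute values and text nodes into two lists."""
    candidates = list(node.attrib.values())
    if node.text:
        candidates.append(node.text.strip())
    for value in candidates:
        if CCNUM_RE.fullmatch(value):
            cc_num_list.append(value)
        elif SERIES_RE.fullmatch(value):
            series_list.append(value)
    for child in node:  # recurse into child elements in document order
        scrape(child, cc_num_list, series_list)


# Hypothetical input: two cards wrapped in a made-up root element.
doc = ET.fromstring(
    "<Cards>"
    "<CreditCard><series>234</series><ccnum>1235583839293838</ccnum></CreditCard>"
    '<CreditCard ccnum="1838383838383833" series="139"></CreditCard>'
    "</Cards>"
)
cc_num_list, series_list = [], []
scrape(doc, cc_num_list, series_list)
print(cc_num_list, series_list)
```

The index pairing only holds when every card carries a series. As soon as one card omits it, the two lists have different lengths and later entries no longer line up, so collecting per CreditCard element is safer when the series is optional.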