I need to validate the date range in an element in a BizTalk schema. I have received date ranges in which the start date is after the end date (20130521-20130501). I know I could easily parse and validate this string using XSLT and C# in a map; however, I need the validation to be part of the schema so that if a transaction with such a date range is received, BizTalk will reject the EDI transaction and send a 999 rejection back to the sender.
I have read numerous posts that advise against using regex validation for date ranges but aside from creating a custom pipeline component or using C#, I don't see any other way. There is a schema property that allows regular expressions to be used to validate input data.
I am not very good with regular expressions at all and need some help figuring out how to validate that the begin date is less than or equal to the end date. The dates are received as strings. I have read about splitting the strings on the "-" but don't know how to compare the results. Any help would be appreciated.
Rather than writing your own custom component to do the validation, you could use the BizTalk Business Rules Engine Pipeline Framework in conjunction with a BRE Policy to validate the date range.
Full Disclosure: This framework is written by a colleague of mine.
The problem
It seems that you are unaware of the limits of regular expressions, but that's ok.
What the question really comes down to is this: check whether x <= y and match only if it holds.
The limit
Why? You want to check whether start date <= end date. The idea of regular expressions is to match characters that follow a certain regular pattern. Regex alone can't check whether x < y, since regex has no comparison operators such as > and <.
Bypassing a certain limit
What regex can do is check whether x = y. Say, for example, I have a multi-line string and want to get all the lines where x = y. We could use the following regex:
^(\d+)\s*=\s*\1$
with the m modifier. What does this mean?
^ : start of line
(\d+) : group and match any digit one or more times
\s*=\s* : match white space 0 or more times, then =, then white space 0 or more times
\1 : refers to group 1, so match only if it's the same as what was matched in group 1
$ : end of line
m modifier : multi-line; makes ^ and $ match the start and end of each line respectively
Online demo.
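As a rough illustration of the backreference trick, here is a minimal Python sketch. The sample string is hypothetical (it is not the original example), and re.MULTILINE plays the role of the m modifier.

```python
import re

# Hypothetical sample input, one "x = y" expression per line.
text = "1 = 1\n2 = 3\n45=45\n7 = 70"

# re.MULTILINE is Python's counterpart of the m modifier:
# ^ and $ then match at the start and end of every line.
pattern = re.compile(r"^(\d+)\s*=\s*\1$", re.MULTILINE)

# Keep only the lines where both sides are the same digits.
matching_lines = [m.group(0) for m in pattern.finditer(text)]
print(matching_lines)  # ['1 = 1', '45=45']
```

Note that this only tests textual equality of the two sides; it still cannot tell whether one number is smaller than the other.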
Proof of concept
Let's hack further. For this POC, we are going to match the following: x-y where 0 <= x <= 9, 0 <= y <= 9 and x <= y.
What we can do is try to match all possibilities where x <= y. So if x=0 then y=[0-9], if x=1 then y=[1-9], if x=2 then y=[2-9] and so forth. Since regex has the or operator (|), we can write the following regex:
0-[0-9]|1-[1-9]|2-[2-9]|3-[3-9]|4-[4-9]|5-[5-9]|6-[6-9]|7-[7-9]|8-[8-9]|9-9
Online demo
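To check that the alternation really encodes x <= y for single digits, the following Python sketch rebuilds it programmatically and compares it with a plain integer comparison for all 100 pairs. The variable names are illustrative only.

```python
import re

# One alternative per first digit x: "x-[x-9]", with "9-9" as the last case.
alternatives = [f"{x}-[{x}-9]" if x < 9 else "9-9" for x in range(10)]
pattern = re.compile("|".join(alternatives))

# Collect every pair where the regex disagrees with the real comparison.
mismatches = [
    (x, y)
    for x in range(10)
    for y in range(10)
    if bool(pattern.fullmatch(f"{x}-{y}")) != (x <= y)
]

print(pattern.pattern)
print(mismatches)  # []
```

Ten alternatives are needed just to compare two single digits, which hints at how quickly this approach gets out of hand.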
You see? It's already this long for such a simple comparison! That's why any sane person would parse and validate it with built-in language tools.
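For contrast, this is roughly what parsing and comparing looks like with built-in tools. It is a sketch in Python rather than the C# mentioned in the question, it assumes the YYYYMMDD-YYYYMMDD format shown there, and is_valid_range is a hypothetical helper name.

```python
from datetime import datetime


def is_valid_range(value: str) -> bool:
    """Return True if value is 'YYYYMMDD-YYYYMMDD' with start <= end."""
    start_text, sep, end_text = value.partition("-")
    # Require the separator and exactly eight digits on each side.
    if not sep or not all(len(t) == 8 and t.isdigit() for t in (start_text, end_text)):
        return False
    try:
        start = datetime.strptime(start_text, "%Y%m%d")
        end = datetime.strptime(end_text, "%Y%m%d")
    except ValueError:
        # Not a real calendar date, e.g. month 13 or day 32.
        return False
    return start <= end


print(is_valid_range("20130521-20130501"))  # False
print(is_valid_range("20130501-20130521"))  # True
```

A few lines cover every possible date, including rejecting impossible ones, with no enumeration at all.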
Breaking the laws of regex
We're going to use PHP to generate a regular expression.
The code generates a regex that validates a date range between 2013-01-01 and 2013-03-01, where x <= y, in the form x-y. This regex is not optimised and is about 17KB. Imagine the size of this regex if I configured it to validate a range of 10 years! Note that the size grows rapidly as the range widens. I tried an interval of 4 months, but I got an error/warning saying that the expression is too long.
Since the regex is too long, I can't make an online demo of it, but here's the code in PHP:
Output:
Online dump of the regex | Online PHP demo
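The following is not the PHP code referred to above; it is a rough Python sketch of one way such a regex could be generated, assuming the YYYYMMDD-YYYYMMDD format from the question. build_range_regex is a hypothetical helper. It enumerates, for each start date, every end date on or after it, so in this sketch the number of alternatives grows with the square of the number of days.

```python
import re
from datetime import date, timedelta


def build_range_regex(first: date, last: date) -> str:
    """Enumerate every 'start-end' pair with start <= end as YYYYMMDD strings."""
    day_count = (last - first).days + 1
    stamps = [(first + timedelta(days=i)).strftime("%Y%m%d") for i in range(day_count)]
    # For each start date, allow any end date on or after it.
    parts = [f"{s}-(?:{'|'.join(stamps[i:])})" for i, s in enumerate(stamps)]
    return "|".join(parts)


small = build_range_regex(date(2013, 1, 1), date(2013, 1, 10))
large = build_range_regex(date(2013, 1, 1), date(2013, 3, 1))
print(len(small), len(large))  # pattern sizes in characters

pattern = re.compile(large)
print(bool(pattern.fullmatch("20130105-20130220")))  # True
print(bool(pattern.fullmatch("20130220-20130105")))  # False
```

Even for a two-month window the pattern runs to thousands of characters, and it silently rejects any date outside the enumerated window.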
Conclusion
Please don't ever think about using regex for this task, otherwise you would have 10 problems :)