I wan a regex to alidate all types of possible DN's
I create one but its not so good.
/([A-z0-9=]{1}[A-z0-9]{1})*[,??]/
and some others by changing it, but in vain.
Posible DN's can be
CN=abcd,CN=abcd,O=abcd,C=us
CN=abcd0520,CN=users,O=abcd,C=us
C=us
etc
I recently had a need for this, so I created one that perfectly follows the LDAPv3 distinguished name syntax at RFC-2253.
Attribute Type
An attributeType can be expressed 2 ways. An alphanumeric string that starts with an alpha, validated using:
[A-Za-z][\w-]*
Or it can be an OID, validated using:
\d+(?:\.\d+)*
So attributeType validates using:
[A-Za-z][\w-]*|\d+(?:\.\d+)*
Attribute Value
An attributeValue can be expressed 3 ways. A hex string, which is a sequence of hex-pairs with a leading #
. A hex string validates using:
#(?:[\dA-Fa-f]{2})+
Or an escaped string; each non-special character is expressed "as-is" (validates using [^,=\+<>#;\\"]
). Special characters can be expressed with a leading \
(validates using \\[,=\+<>#;\\"]
). Finally any character can be expressed as a hex-pair with a leading \
(validates using \\[\dA-Fa-f]{2}
). An escaped string validates using:
(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*
Or a quoted-string; the value starts and ends with "
, and can contain any character un-escaped except \
and "
. Additionally, any of the methods from the escaped string above can be used. A quoted-string validates using:
"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"
All combined, an attributeValue validates using:
#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"
Name component
A name-component in BNF is:
name-component = attributeTypeAndValue *("+" attributeTypeAndValue)
attributeTypeAndValue = attributeType "=" attributeValue
In RegEx is:
(?#attributeType)=(?#attributeValue)(?:\+(?#attributeType)=(?#attributeValue))*
Replacing the (?#attributeType)
and (?#attributeValue)
placeholders with the values above gives us:
(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*")(?:\+(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"))*
Which validates a single name-component.
Distinguished name
Finally, the BNF for a distinguished name is:
name-component *("," name-component)
In RegEx is:
(?#name-component)(?:,(?#name-component))*
Replacing the (?#name-component) placeholder with the value above gives us:
^(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*")(?:\+(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"))*(?:,(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*")(?:\+(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"))*)*$
Test it here
This is not only not possible, it will never work, and should not even be attempted. LDAP data (distinguished name in this case) are not strings. A distinguished name has distinguishedName
syntax, which is not a string, and comparisons must be made with using matching rules defined in the directory server schema. For this reason, regular expressions and native-language comparison, relative value, and equality operations like perl's ~~
, eq
and ==
and Java's ==
cannot be used with LDAP data - if a programmer attempts this, unexpected results can occur and the code is brittle, fragile, unpredictable, and does not have repeatable characteristics. Language LDAP APIs that do not support matching rules cannot be used with LDAP where comparison, equality checks, and relative value ordering comparisons are required.
By way of example, the distinguished names "dc=example,dc=com
" and "DC=example, DC=COM
" are equivalent in every way from an LDAP perspective, but native language equality operators would return false
.
I created one. Working great.
^(\w+[=]{1}\w+)([,{1}]\w+[=]{1}\w+)*$
This worked for me:
Expression:
^(?<RDN>(?<Key>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+)\=(?<Value>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+))(?:\s*\,\s*(?<RDN>(?<Key>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+)\=(?<Value>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+)))*$
Test:
CN=Test User Delete\0ADEL:c1104f63-0389-4d25-8e03-822a5c3616bc,CN=Deleted Objects,DC=test,DC=domain,DC=local
The expression is already Regex escaped so to avoid having to repeat all the backslashes in C# make sure you prefix the string with the non-escaped literal @ sign, i.e.
var dnExpression = @"...";
This will yield four groups, first a copy of the whole string, second a copy of the last RDN, third and fourth the key/value pairs. You can index into each key/value using the Captures collection of each group.
You can also use this to validate a RDN by cutting the expression to the "(?...)" group surrounded by the usual "^...$" to required a whole value (start-end of string).
I've allowed a hex special character escape "\", simple character escape "\" or anything other than ",=\" inside the key/value DN text. I'd guess this expression could be perfected by taking extra time to go through the MSDN AD standard and restrict the allowed characters to match exactly what is or is not allowed. But I believe this is a good start.