What is the regular expression that Microsoft's .NET Framework uses for its standard request validation, the check that throws HttpRequestValidationException ("A potentially dangerous Request.Form value was detected from the client") when HTML or other potentially unsafe content is posted?
I'd like to have an exact copy of it converted to JavaScript so the user can be alerted early.
My current regular expression (/(&#)|<[^<>]+>/) is close, but not the same as .NET's.
I'm aware this might differ between .NET versions, so specifically I'd like to know:
- A regular expression for .NET 2
- A regular expression for .NET 4
You can use a decompiling tool and see for yourself that there is no regular expression at all. It calls the static method CrossSiteScriptingValidation.IsDangerousString.
But maybe you can use the Microsoft AntiXSS library to achieve the same. Anyway, here is the method:
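internal static bool IsDangerousString(string s, out int matchIndex)
{
    matchIndex = 0;
    int startIndex = 0;
    while (true)
    {
        // Find the next character that could start a dangerous pattern ('<' or '&').
        int n = s.IndexOfAny(CrossSiteScriptingValidation.startingChars, startIndex);
        if (n < 0)
        {
            return false; // no candidate left: the string is safe
        }
        if (n == s.Length - 1)
        {
            return false; // the candidate is the last character: nothing follows it
        }
        matchIndex = n;
        char next = s[n + 1];
        if (s[n] == '<')
        {
            // '<' followed by a letter, '!', '/' or '?' looks like a tag, comment or processing instruction.
            if (CrossSiteScriptingValidation.IsAtoZ(next) || next == '!' || next == '/' || next == '?')
            {
                return true;
            }
        }
        else if (next == '#')
        {
            // '&' followed by '#' looks like a numeric character reference, e.g. &#83;
            return true;
        }
        startIndex = n + 1; // keep scanning after this candidate
    }
}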
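Since the goal is to warn the user in the browser, here is a rough JavaScript sketch of the same scan. It assumes the .NET 4 rules as I read them: '<' followed by a letter, '!', '/' or '?', or '&' followed by '#'. The function names (isAtoZ, findDangerousIndex, isDangerousString) are my own and not part of any library, and I have not checked this against .NET 2.

```javascript
// Plain ASCII letter test, mirroring what IsAtoZ appears to do.
function isAtoZ(c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Returns the index of the first dangerous sequence, or -1 if there is none.
// The last character is never treated as a start, because nothing follows it.
function findDangerousIndex(s) {
  for (let i = 0; i < s.length - 1; i++) {
    const c = s[i];
    const next = s[i + 1];
    if (c === '<' && (isAtoZ(next) || next === '!' || next === '/' || next === '?')) {
      return i; // looks like a tag, comment or processing instruction
    }
    if (c === '&' && next === '#') {
      return i; // looks like a numeric character reference
    }
  }
  return -1;
}

function isDangerousString(s) {
  return findDangerousIndex(s) !== -1;
}
```

You would call isDangerousString on each field value before submitting. Treat it only as an early hint for the user; the server-side validation is still the one that counts.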
I might have answered this in another question here:
https://stackoverflow.com/a/4949339/62054
This regex follows the logic in .NET 4.
/^(?!(.|\n)*<[a-z!\/?])(?!(.|\n)*&#)(.|\n)*$/i
Look in the .NET source for CrossSiteScriptingValidation to find the logic that Microsoft follows. fge is right: it doesn't use a regex; instead it uses some loops and string comparisons. I suspect that's for performance.
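For what it's worth, here is a small sketch of how the regex could be used from JavaScript. SAFE_INPUT is the pattern above, which matches only when the input is safe. DANGEROUS is a simpler variant I'm assuming is equivalent for detection (it matches when a dangerous sequence is present). Both constant names and looksDangerous are mine. One caveat: in JavaScript '.' does not match '\r', so SAFE_INPUT rejects any value containing a carriage return, while DANGEROUS does not have that problem.

```javascript
// Pattern from the answer: matches only input that contains no dangerous sequence.
const SAFE_INPUT = /^(?!(.|\n)*<[a-z!\/?])(?!(.|\n)*&#)(.|\n)*$/i;

// Hypothetical simpler form: matches when a dangerous sequence IS present.
const DANGEROUS = /<[a-z!\/?]|&#/i;

// True when the value would likely trip the .NET 4 request validation.
function looksDangerous(value) {
  return DANGEROUS.test(value);
}
```

For values without carriage returns, looksDangerous(value) should give the opposite answer to SAFE_INPUT.test(value).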