I understand why this is happening, but I need a workaround. I looked into some other questions on Stack Overflow, but none of them helped. I do not want to disable input validation for the whole website, because that is definitely dangerous. I have only one place (at least for now) where I need to disable it.
I decorated the action method with the [ValidateInput(false)] attribute, and I'm encoding the strings with Html.Encode. But I still get the same error. Here's my view:
<div id="sharwe-categories">
    <ul class="menu menu-vertical menu-accordion">
        @foreach(var topLevel in Model)
        {
            var topLevelName = Html.Encode(topLevel.Name);
            <li class="topLevel">
                <h3>
                    @Html.ActionLink(topLevel.Name, "Index", "Item", new { category = topLevelName }, new {@class = "main"} )
                    <a href="#" class="drop-down"></a>
                </h3>
                <ul>
                    @foreach (var childCategory in topLevel.Children)
                    {
                        var childcategoryName = Html.Encode(childCategory.Name);
                        <li>@Html.ActionLink(childCategory.Name, "Index", "Item", new RouteValueDictionary { { "category", topLevelName }, { "subcategory", childcategoryName } }, null)</li>
                    }
                </ul>
            </li>
        }
    </ul>
</div>
As you can see, there's no user input. But some of the category names contain "dangerous" characters... Any solutions?
Although Darin's answer is perfectly feasible, I wouldn't recommend using Scott Hanselman's technique of turning all this validation off step by step. You will sooner or later end up in deep...
The second suggestion, using IDs along with dummy strings (which are great for SEO and for people), is a way to go, but sometimes it isn't feasible either. Imagine this request URL:
/111/Electronics/222/Computers/333/Apple
Even though we'd have IDs we can rely on as well as human- and SEO-friendly category names, a URL like this is definitely not desirable. ID + dummy string works when we need to represent a single item; in other cases it doesn't. And since you have to display both the category and the subcategory, this is a problem.
So what can you do?
Two possible solutions:
1. Clean up category names so they contain only valid characters. This can be done, but if the names are not static and can be edited by privileged users, you're out of luck: even if you clean them up now, someone will enter something invalid later.
2. Clean up your strings on the fly. Whenever you use a category name in a URL, clean it up. When reading it back (to get the actual category ID), compare the provided (previously cleaned) category name with the value in the DB, which you clean up the same way either:
   2.1. at request time, while filtering categories, or
   2.2. in advance, when generating category names.
I'd suggest you take the 2.2 approach. Extend your DB table to have two columns:
- Category display name
- Category URL-friendly name
You can also set a unique constraint on the second column, so that two of your categories (even with different display names) can never end up with the same URL-friendly name.
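A minimal in-memory sketch of approach 2.2, assuming the URL-friendly name has already been computed (see "How to clean" below). A Dictionary keyed by the URL-friendly name stands in for the table and its unique constraint; categoriesByUrlName, TryAddCategory and FindDisplayName are hypothetical names, and in a real app both operations would be database calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the categories table: key = URL-friendly name, value = display name.
// OrdinalIgnoreCase is an assumption that mirrors a case-insensitive SQL Server collation.
var categoriesByUrlName = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

// Saving: a second row with the same URL-friendly name is rejected,
// the way a unique constraint would reject the INSERT.
bool TryAddCategory(Dictionary<string, string> table, string displayName, string urlName) =>
    table.TryAdd(urlName, displayName);

// Reading: the route value is compared with the URL-friendly column only,
// so the display name itself may contain &, <, : or anything else.
string? FindDisplayName(Dictionary<string, string> table, string routeValue) =>
    table.TryGetValue(routeValue, out var displayName) ? displayName : null;

TryAddCategory(categoriesByUrlName, "Computers & Tablets", "computers-tablets");
Console.WriteLine(TryAddCategory(categoriesByUrlName, "Computers / Tablets", "computers-tablets")); // False
Console.WriteLine(FindDisplayName(categoriesByUrlName, "computers-tablets"));                       // Computers & Tablets
```

Only the URL-friendly name ever travels through the URL, so request path validation never sees the "dangerous" characters.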
How to clean
The first thing that comes to mind is to strip out invalid characters, but that's tedious and you'll most probably miss something. It's much easier and safer to extract the valid characters from your category display name instead. I've done the same when generating dummy URL category names: keep what is valid and discard the rest. It usually works just fine. Two examples of such regular expressions:
- (\w{2,}): letters, digits and underscores only, at least two in a row (so we leave out "a", single digits and similar fragments that add no meaning and only lengthen the URL)
- ([a-zA-Z0-9]{2,}): ASCII letters and digits only (again, at least two in a row)
Take all matches in your category display name, join them with a space or dash, and save the result along with the original display name. A different question of mine was about exactly this.
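A rough C# sketch of the second pattern in action, assuming ASCII-only URL names are acceptable. ToUrlName is a hypothetical helper name, and lower-casing and joining with dashes are my own choices, not requirements of the approach.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Keep only runs of 2+ ASCII letters/digits, lower-case them and join them with dashes.
// Everything else (&, <, :, spaces, single characters) is simply dropped.
string ToUrlName(string displayName)
{
    var words = Regex.Matches(displayName, "[a-zA-Z0-9]{2,}")
                     .Cast<Match>()
                     .Select(m => m.Value.ToLowerInvariant());
    return string.Join("-", words);
}

Console.WriteLine(ToUrlName("Computers & Tablets"));    // computers-tablets
Console.WriteLine(ToUrlName("Apple: iPhone 4S <new>")); // apple-iphone-4s-new
```

Note that a name made only of one-character words (e.g. "A & B") comes out as an empty string, so in practice you'd want a fallback for that case.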
Why an additional column? Because you can't run regular expressions in SQL Server. If you're using MySQL, you can use a single column and apply the regular expression in the database as well.
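Even though you shouldn't do it this way, sometimes there is no easy way around it. The requestPathInvalidCharacters attribute of the httpRuntime element in web.config is what you're looking for ([ValidateInput(false)] doesn't help here, because ASP.NET validates the request path before the request ever reaches your action). Its default value is <,>,*,%,&,:,\,? and the value below removes & and ? from that list. Just add the following to the <system.web> section: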
<httpRuntime requestPathInvalidCharacters="&lt;,&gt;,*,%,:,\" />
I would strongly encourage locking this down to the one path that needs it, using the following instead:
<location path="the/path/you/need/to/lock/down">
  <system.web>
    <httpRuntime requestPathInvalidCharacters="&lt;,&gt;,*,%,:,\"/>
  </system.web>
</location>
Just put that inside the root <configuration> element. That way you're not allowing ampersands in the path for your entire site, and potentially exposing the whole site to an unforeseen attack.
You may find the following blog post about using special characters in URLs useful. But in general, it is best practice to replace those titles with slugs, the same way Stack Overflow does with question titles in its URLs, for better SEO, and to use IDs to identify them.
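A rough sketch of the ID + slug idea, with made-up data and a hypothetical Resolve helper: the ID does the lookup and the slug is only checked, roughly the way Stack Overflow question URLs keep working after a title changes. Routing and redirects are left out; this only shows the resolution logic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data: the numeric ID is the real key; the slug exists only for people and search engines.
var items = new Dictionary<int, (string Title, string Slug)>
{
    [333] = ("Apple & Co. <Macs>", "apple-co-macs"),
};

// Resolve "/{id}/{slug}" by the ID alone. The slug is never used for the lookup; it is only
// compared with the stored one so that a caller could redirect stale or mistyped slugs.
(string? Title, bool SlugIsCurrent) Resolve(Dictionary<int, (string Title, string Slug)> store, string path)
{
    var parts = path.Trim('/').Split('/');
    if (!int.TryParse(parts[0], out var id) || !store.TryGetValue(id, out var item))
        return (null, false);
    var slug = parts.Length > 1 ? parts[1] : "";
    return (item.Title, slug == item.Slug);
}

Console.WriteLine(Resolve(items, "/333/apple-co-macs")); // (Apple & Co. <Macs>, True)
Console.WriteLine(Resolve(items, "/333/old-name"));      // (Apple & Co. <Macs>, False)
```

Because the title with its "dangerous" characters never appears in the path, no validation settings need to change.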