How do you correctly escape a document name in .NET

Posted 2019-04-21 13:22

Question:

We store a bunch of weird document names on our web server (people upload them) that contain characters like spaces, ampersands, etc. When we generate links to these documents, we need to escape them so the server can look up the file by its raw name in the database. However, none of the built-in .NET escape functions work correctly in all cases.

Take the document Hello#There.docx:

UrlEncode will handle this correctly:

HttpUtility.UrlEncode("Hello#There.docx");
"Hello%23There.docx"

However, UrlEncode will not handle Hello There.docx correctly:

HttpUtility.UrlEncode("Hello There.docx");
"Hello+There.docx"

The + symbol is only decoded as a space in URL query parameters, not in the path, so it is not valid for document names. Interestingly enough, this actually works on the Visual Studio test web server but not on IIS.

The UrlPathEncode function works fine for spaces:

HttpUtility.UrlPathEncode("Hello There.docx");
"Hello%20There.docx"

However, it will not escape other characters such as the # character:

HttpUtility.UrlPathEncode("Hello#There.docx");
"Hello#There.docx"

This link is invalid because the # is interpreted as the start of a URL fragment (hash), so everything after it never even gets to the server.

Is there a .NET utility method to escape all non-alphanumeric characters in a document name, or would I have to write my own?

Answer 1:

Have a look at the Uri.EscapeDataString method:

Uri.EscapeDataString("Hello There.docx")  // "Hello%20There.docx"

Uri.EscapeDataString("Hello#There.docx")  // "Hello%23There.docx"
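
As a minimal sketch of how this could be used when generating links, the class below escapes a document name into a single path segment and recovers the raw name on the way back with Uri.UnescapeDataString. The DocumentLinks class, its method names, and the "/docs/" base path are assumptions made up for illustration; they are not from the original post.

```csharp
using System;

static class DocumentLinks
{
    // Hypothetical base path; substitute whatever route serves the documents.
    private const string BasePath = "/docs/";

    // Percent-encodes everything except letters, digits and a few
    // unreserved marks, so spaces, '#', '&' and '+' are all escaped.
    public static string BuildLink(string documentName)
    {
        return BasePath + Uri.EscapeDataString(documentName);
    }

    // Reverses the escaping to get the raw name for the database lookup.
    public static string ReadName(string escapedSegment)
    {
        return Uri.UnescapeDataString(escapedSegment);
    }

    static void Main()
    {
        Console.WriteLine(BuildLink("Hello There.docx"));  // /docs/Hello%20There.docx
        Console.WriteLine(BuildLink("Hello#There.docx"));  // /docs/Hello%23There.docx
        Console.WriteLine(ReadName("Hello%23There.docx")); // Hello#There.docx
    }
}
```

Depending on how the server is set up, the framework may already decode the path segment before your code sees it, in which case the explicit ReadName step would not be needed.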


Answer 2:

I would approach it a different way: Do not use the document name as key in your look-up - use a Guid or some other id parameter that you can map to the document name on disk in your database. Not only would that guarantee uniqueness but you also would not have this problem of escaping in the first place.
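
A rough sketch of that idea, assuming an in-memory dictionary as a stand-in for the database table: links carry only a Guid, and the server maps it back to the stored name. The DocumentStore class, its members, and the "/docs/" path are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the database table that maps ids to raw names.
class DocumentStore
{
    private readonly Dictionary<Guid, string> namesById =
        new Dictionary<Guid, string>();

    // Registers an uploaded document and returns the id used in links.
    public Guid Add(string documentName)
    {
        Guid id = Guid.NewGuid();
        namesById[id] = documentName;
        return id;
    }

    // "N" formats the Guid as 32 hex digits, so nothing needs escaping.
    public static string BuildLink(Guid id)
    {
        return "/docs/" + id.ToString("N");
    }

    // Parses the id taken from the URL and looks up the raw name.
    public bool TryGetName(string idFromUrl, out string documentName)
    {
        documentName = null;
        Guid id;
        return Guid.TryParseExact(idFromUrl, "N", out id)
            && namesById.TryGetValue(id, out documentName);
    }
}

static class Program
{
    static void Main()
    {
        var store = new DocumentStore();
        Guid id = store.Add("Hello#There & more.docx");
        Console.WriteLine(DocumentStore.BuildLink(id));

        string name;
        if (store.TryGetName(id.ToString("N"), out name))
        {
            Console.WriteLine(name); // Hello#There & more.docx
        }
    }
}
```

The raw name never appears in the URL, so it can contain any character; the original name would only be needed for display or for the download file name.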



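Answer 3:

You can prefix a string literal with the @ character to make it a verbatim string, in which backslashes are not treated as escape sequences. See the pieces of code below.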

string str = @"\n\n\n\n";
Console.WriteLine(str);

Output: \n\n\n\n

string str1 = @"\df\%%^\^\)\t%%";
Console.WriteLine(str1);

Output: \df\%%^\^\)\t%%

This kind of formatting is very useful for pathnames and for creating regexes.
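
For instance, a path and a regex pattern might look like the following sketch. The VerbatimDemo class, the sample path, and the IsDocx helper are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class VerbatimDemo
{
    // The same path written as a regular and as a verbatim literal.
    public const string RegularPath = "C:\\Docs\\Hello There.docx";
    public const string VerbatimPath = @"C:\Docs\Hello There.docx";

    // In a verbatim literal the regex escape \. needs no doubled backslash.
    public static bool IsDocx(string path)
    {
        return Regex.IsMatch(path, @"\.docx$");
    }

    static void Main()
    {
        Console.WriteLine(RegularPath == VerbatimPath); // True
        Console.WriteLine(IsDocx(VerbatimPath));        // True

        // Inside a verbatim literal a double quote is written as "".
        Console.WriteLine(@"He said ""hi""");           // He said "hi"
    }
}
```

This affects only how the literal is written in C# source code; it does not percent-encode anything for use in a URL.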