I have an application which sends a POST request to the VB forum software and logs someone in (without setting cookies or anything).
Once the user is logged in, I build a path on their local machine from the date and their username:
c:\tempfolder\date\username
The problem is that some usernames throw an "Illegal chars" exception. For example, if my username were mas|fenix, the following call would throw:
Path.Combine( _
Environment.GetFolderPath(System.Environment.SpecialFolder.CommonApplicationData), _
DateTime.Now.ToString("ddMMyyhhmm") + "-" + form1.username)
I don't want to remove the character from the string, because a folder with the same username is also created through FTP on a server. That leads to my second question: if I am creating the folder on the server, can I leave the "illegal chars" in? I ask because the server is Linux based, and I am not sure whether Linux accepts them.
EDIT: It seems that URL encoding is NOT what I want. Here's what I want to do:
old username = mas|fenix
new username = mas%xxfenix
Where %xx is the ASCII value or any other value that would easily identify the character.
I've been experimenting with the various methods .NET provides for URL encoding. Perhaps the following table will be useful (as output from a test app I wrote):
The columns represent encodings as follows:
UrlEncoded: HttpUtility.UrlEncode
UrlEncodedUnicode: HttpUtility.UrlEncodeUnicode
UrlPathEncoded: HttpUtility.UrlPathEncode
EscapedDataString: Uri.EscapeDataString
EscapedUriString: Uri.EscapeUriString
HtmlEncoded: HttpUtility.HtmlEncode
HtmlAttributeEncoded: HttpUtility.HtmlAttributeEncode
HexEscaped: Uri.HexEscape
NOTES:
HexEscape can only handle the first 255 characters. Therefore it throws an ArgumentOutOfRange exception for the Latin A-Extended characters (e.g. Ā).
This table was generated in .NET 4.0 (see Levi Botelho's comment below that says the encoding in .NET 4.5 is slightly different).
EDIT:
I've added a second table with the encodings for .NET 4.5. See this answer: https://stackoverflow.com/a/21771206/216440
EDIT 2:
Since people seem to appreciate these tables, I thought you might like the source code that generates the table, so you can play around yourselves. It's a simple C# console application, which can target either .NET 4.0 or 4.5:
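The original listing is not reproduced here. As a stand-in, the following is a reduced sketch of how such a table could be generated, not the author's program. The class name EncodingTable and its Rows method are hypothetical, and only four of the methods are covered (the ones that need no reference beyond System.dll); HttpUtility columns could be added the same way in a project that references System.Web.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper: yields one tab-separated row per sample character.
public static class EncodingTable
{
    public static IEnumerable<string> Rows(string samples)
    {
        // Header row naming the method behind each column.
        yield return string.Join("\t", "Char", "WebUtility.UrlEncode",
            "Uri.EscapeDataString", "WebUtility.HtmlEncode", "Uri.HexEscape");

        foreach (char c in samples)
        {
            string s = c.ToString();

            // Uri.HexEscape throws ArgumentOutOfRangeException above U+00FF,
            // so mark those cells instead of calling it.
            string hex = c <= 255 ? Uri.HexEscape(c) : "[out of range]";

            yield return string.Join("\t", s,
                WebUtility.UrlEncode(s),
                Uri.EscapeDataString(s),
                WebUtility.HtmlEncode(s),
                hex);
        }
    }
}
```

To print the table from a console application, loop over the rows: foreach (var row in EncodingTable.Rows("a |<Ā")) Console.WriteLine(row);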
Ideally these would go in a class called "FileNaming", or you could just rename Encode to "FileNameEncode". Note: these are not designed to handle full paths, only folder and/or file names. Ideally you would Split("/") your full path first and then check the pieces. Obviously, instead of a union, you could just add the "%" character to the list of chars not allowed in Windows, but I think it's more helpful, readable and factual this way. Decode() is exactly the same, except that its Replace call swaps the arguments, Replace(Uri.HexEscape(s[0]), s), so that each escape sequence is replaced with the original character.
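The following is a minimal sketch of the Encode/Decode idea described above, not the original code. It assumes a hard-coded list of the characters Windows rejects in file names (rather than Path.GetInvalidFileNameChars, whose result depends on the operating system), adds "%" so that decoding stays unambiguous, and uses Uri.HexUnescape for the reverse direction instead of a chain of Replace calls.

```csharp
using System;
using System.Text;

public static class FileNaming
{
    // Assumption: characters Windows rejects in file names, listed explicitly
    // so the result is the same on every OS. '%' is included so that a literal
    // percent sign in the input cannot be mistaken for an escape sequence.
    private static readonly char[] Escaped = "<>:\"/\\|?*%".ToCharArray();

    // Replaces each disallowed character with %XX (two upper-case hex digits).
    public static string Encode(string name)
    {
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            // Control characters (below 32) are also invalid on Windows.
            bool mustEscape = c < 32 || Array.IndexOf(Escaped, c) >= 0;
            sb.Append(mustEscape ? Uri.HexEscape(c) : c.ToString());
        }
        return sb.ToString();
    }

    // Reverses Encode: every %XX sequence becomes the character it stands for.
    public static string Decode(string encoded)
    {
        var sb = new StringBuilder(encoded.Length);
        int i = 0;
        while (i < encoded.Length)
        {
            // HexUnescape advances i by 3 for an escape, or by 1 otherwise.
            sb.Append(Uri.HexUnescape(encoded, ref i));
        }
        return sb.ToString();
    }
}
```

With this sketch, FileNaming.Encode("mas|fenix") gives mas%7Cfenix, and FileNaming.Decode turns it back into mas|fenix. It handles a single folder or file name only, and it does not deal with other Windows restrictions such as reserved device names or trailing dots and spaces.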
Thanks @simon-tewsi for the very useful table above!
You should encode only the user name or other part of the URL that could be invalid. URL encoding a URL can lead to problems since something like this:
Will yield
This is obviously not going to work well. Instead, you should encode ONLY the value of the key/value pair in the query string, like this:
Hopefully that helps. Also, as teedyay mentioned, you'll still need to make sure illegal file-name characters are removed or else the file system won't like the path.
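To make the difference concrete, here is a small sketch. The QueryBuilder class, the example.com address and the "user" key are all made up for illustration, and Uri.EscapeDataString is just one of the encoders that could be used for the value.

```csharp
using System;

public static class QueryBuilder
{
    // Hypothetical helper: appends one key/value pair to a URL and
    // escapes ONLY the value, leaving the rest of the URL untouched.
    public static string WithParameter(string baseUrl, string key, string value)
    {
        string separator = baseUrl.Contains("?") ? "&" : "?";
        return baseUrl + separator + key + "=" + Uri.EscapeDataString(value);
    }
}
```

QueryBuilder.WithParameter("http://example.com/login", "user", "mas|fenix") returns http://example.com/login?user=mas%7Cfenix, which is still a usable URL. Passing the whole address through Uri.EscapeDataString instead would also escape the scheme and path separators, producing http%3A%2F%2Fexample.com%2Flogin.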
Since .NET Framework 4.5 you can use WebUtility.UrlEncode.
First, it resides in System.dll, so it does not require any additional references.
Second, it properly escapes characters for URLs, unlike Uri.EscapeUriString (see comments to drweb86's answer).
Third, it does not have any limits on the length of the string, unlike Uri.EscapeDataString (see related question), so it can be used for POST requests, for example.
Fourth, it is available on WinRT, unlike HttpUtility (see related question).
Levi Botelho commented that the table of encodings that was previously generated is no longer accurate for .NET 4.5, since the encodings changed slightly between .NET 4.0 and 4.5. So I've regenerated the table for .NET 4.5:
The columns represent encodings as follows:
HttpUtility.UrlEncode
HttpUtility.UrlEncodeUnicode
HttpUtility.UrlPathEncode
WebUtility.UrlEncode
Uri.EscapeDataString
Uri.EscapeUriString
HttpUtility.HtmlEncode
HttpUtility.HtmlAttributeEncode
WebUtility.HtmlEncode
Uri.HexEscape
NOTES:
HexEscape can only handle the first 255 characters. Therefore it throws an ArgumentOutOfRange exception for the Latin A-Extended characters (eg Ā).
This table was generated in .NET 4.5 (see answer https://stackoverflow.com/a/11236038/216440 for the encodings relevant to .NET 4.0 and below).
EDIT:
The .NET implementation of UrlEncode does not comply with RFC 3986.
Some characters are not encoded but should be. The !()* characters are listed in section 2.2 of the RFC as reserved characters, which must be percent-encoded when they appear as data, yet .NET fails to encode them.
Some characters should never be encoded. The .-_ characters are not listed in section 2.2 as reserved; they are unreserved characters (section 2.3) and should be left as they are, so check the table above to see how each method treats them.
The RFC also specifies that, to be consistent, implementations should use upper-case HEXDIG, whereas HttpUtility.UrlEncode produces lower-case HEXDIG.
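If strict RFC 3986 behaviour is needed regardless of framework version, one option is a small encoder of your own. The following is a sketch under that assumption, not a framework API: the Rfc3986 class name is made up, it works on the UTF-8 bytes of the input, leaves only the unreserved characters from section 2.3 of the RFC untouched, and always writes upper-case hex digits.

```csharp
using System;
using System.Text;

public static class Rfc3986
{
    // Unreserved characters per RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~".
    private const string Unreserved =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Percent-encodes every byte of the UTF-8 form of value that is not unreserved.
    public static string Encode(string value)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            char c = (char)b;
            if (b < 128 && Unreserved.IndexOf(c) >= 0)
            {
                sb.Append(c);                            // safe, copy as-is
            }
            else
            {
                sb.Append('%').Append(b.ToString("X2")); // upper-case hex
            }
        }
        return sb.ToString();
    }
}
```

For example, Rfc3986.Encode("!()*") gives %21%28%29%2A, while Rfc3986.Encode("a.b-c_d~") comes back unchanged. This is meant for encoding a single value such as a query-string parameter, not a complete URL.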