When the .NET System.Uri class parses a string, it normalizes the input, for example by lower-casing the scheme and hostname. It also trims trailing periods from each path segment. This latter behavior is fatal to OpenID applications because some OpenIDs (like those issued by Yahoo) include base64-encoded path segments that may end with a period.
How can I disable this period-trimming behavior of the Uri class?
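For reference, the behavior can be observed with a minimal console program like the sketch below. The URL has the same shape as the one used later in this question. On the affected .NET Framework versions the normalized path comes back without the trailing periods, while newer runtimes may preserve them, so treat the output as version-dependent.

```csharp
using System;

class TrailingDotRepro
{
    static void Main()
    {
        // Path segments "b." and "c." each end with a period.
        var uri = new Uri("http://me.yahoo.com/b./c.#adf");

        // OriginalString holds the input exactly as it was supplied.
        Console.WriteLine(uri.OriginalString);

        // AbsolutePath and AbsoluteUri show the normalized form; compare them
        // with OriginalString to see whether the periods were trimmed.
        Console.WriteLine(uri.AbsolutePath);
        Console.WriteLine(uri.AbsoluteUri);
    }
}
```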
Registering my own scheme using UriParser.Register with a parser initialized with GenericUriParserOptions.DontCompressPath avoids the period trimming, as well as some other operations that are also undesirable for OpenID. But I cannot register a new parser for existing schemes like HTTP and HTTPS, which is what I need for OpenIDs.
Another approach I tried was registering my own new scheme, and programming the custom parser to change the scheme back to the standard HTTP(s) schemes as part of parsing:
    using System;
    using System.Net;

    public class MyUriParser : GenericUriParser
    {
        private string actualScheme;

        public MyUriParser(string actualScheme)
            : base(GenericUriParserOptions.DontCompressPath)
        {
            this.actualScheme = actualScheme.ToLowerInvariant();
        }

        protected override string GetComponents(Uri uri, UriComponents components, UriFormat format)
        {
            string result = base.GetComponents(uri, components, format);

            // Substitute our actual desired scheme in the string if it's in there.
            if ((components & UriComponents.Scheme) != 0)
            {
                string registeredScheme = base.GetComponents(uri, UriComponents.Scheme, format);
                result = this.actualScheme + result.Substring(registeredScheme.Length);
            }

            return result;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Register stand-in schemes that map back to http and https.
            UriParser.Register(new MyUriParser("http"), "httpx", 80);
            UriParser.Register(new MyUriParser("https"), "httpsx", 443);

            Uri z = new Uri("httpsx://me.yahoo.com/b./c.#adf");
            var req = (HttpWebRequest)WebRequest.Create(z);
            req.GetResponse();
        }
    }
This almost works. The Uri instance reports https instead of httpsx everywhere except in the Uri.Scheme property itself. That's a problem when you pass this Uri instance to HttpWebRequest to send a request to this address. Apparently it checks the Scheme property and doesn't recognize it as 'https', because it just sends plaintext to port 443 instead of using SSL.
I'm happy for any solution that:
- Preserves trailing periods in path segments in Uri.Path.
- Includes these periods in outgoing HTTP requests.
- Ideally works under ASP.NET medium trust (but this is not absolutely necessary).
Microsoft says it will be fixed in .NET 4.0 (though it appears from the comments that it has not been fixed yet)
https://connect.microsoft.com/VisualStudio/feedback/details/386695/system-uri-incorrectly-strips-trailing-dots?wa=wsignin1.0#tabs
There is a workaround on that page, however: scroll to the bottom and click on the "Workarounds" tab. It uses reflection to change the parser options, so it may not meet the medium trust requirement.
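A sketch of what such a reflection-based workaround can look like follows. It is not copied from the Connect page. It assumes that the .NET Framework's UriParser class has a non-public static GetSyntax method and a non-public m_Flags field, and that clearing the CanonicalizeAsFilePath bit (0x1000000 in the internal UriSyntaxFlags enum) is what stops the trimming. These are undocumented implementation details that can differ between framework versions, so the code does nothing if they are missing. The helper name is hypothetical.

```csharp
using System;
using System.Reflection;

static class UriTrailingDotWorkaround   // hypothetical helper name
{
    // Assumed value of the internal UriSyntaxFlags.CanonicalizeAsFilePath bit.
    private const int CanonicalizeAsFilePath = 0x1000000;

    // Returns the number of built-in parsers whose flags were modified.
    public static int Apply()
    {
        const BindingFlags NonPublicStatic = BindingFlags.Static | BindingFlags.NonPublic;
        const BindingFlags NonPublicInstance = BindingFlags.Instance | BindingFlags.NonPublic;

        // Both members are non-public implementation details (assumptions).
        MethodInfo getSyntax = typeof(UriParser).GetMethod("GetSyntax", NonPublicStatic);
        FieldInfo flagsField = typeof(UriParser).GetField("m_Flags", NonPublicInstance);
        if (getSyntax == null || flagsField == null)
        {
            return 0; // Different internal layout on this runtime; do nothing.
        }

        int changed = 0;
        foreach (string scheme in new[] { "http", "https" })
        {
            var parser = getSyntax.Invoke(null, new object[] { scheme }) as UriParser;
            if (parser == null)
            {
                continue;
            }

            int flags = (int)flagsField.GetValue(parser);
            if ((flags & CanonicalizeAsFilePath) != 0)
            {
                // Clear only this one bit; leave every other parser option intact.
                flagsField.SetValue(parser, flags & ~CanonicalizeAsFilePath);
                changed++;
            }
        }

        return changed;
    }
}
```

Presumably this would be called once at application startup, before any of the affected URIs are constructed. Because it reflects over non-public members, it is unlikely to run under medium trust.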
Thanks to jxdavis and Google for this answer:
http://social.msdn.microsoft.com/Forums/en-US/netfxbcl/thread/5206beca-071f-485d-a2bd-657d635239c9
I'm curious whether part of the problem is that you are only accounting for "don't compress path", instead of all the defaults of the base HTTP parser (including UnEscapeDotsAndSlashes):
    private const UriSyntaxFlags HttpSyntaxFlags = (
        UriSyntaxFlags.AllowIriParsing | UriSyntaxFlags.AllowIdn |
        UriSyntaxFlags.UnEscapeDotsAndSlashes | UriSyntaxFlags.CanonicalizeAsFilePath |
        UriSyntaxFlags.CompressPath | UriSyntaxFlags.ConvertPathSlashes |
        UriSyntaxFlags.PathIsRooted | UriSyntaxFlags.AllowAnInternetHost |
        UriSyntaxFlags.AllowUncHost | UriSyntaxFlags.MayHaveFragment |
        UriSyntaxFlags.MayHaveQuery | UriSyntaxFlags.MayHavePath |
        UriSyntaxFlags.MayHavePort | UriSyntaxFlags.MayHaveUserInfo |
        UriSyntaxFlags.MustHaveAuthority);
Compare that with, for instance, the flags for the news scheme:
private const UriSyntaxFlags NewsSyntaxFlags = (UriSyntaxFlags.AllowIriParsing | UriSyntaxFlags.MayHaveFragment | UriSyntaxFlags.MayHavePath);
Dang, Brandon Black beat me to it while I was working on typing things up...
This may help with code readability:
    namespace System
    {
        [Flags]
        internal enum UriSyntaxFlags
        {
            AllowAnInternetHost = 0xe00,
            AllowAnyOtherHost = 0x1000,
            AllowDnsHost = 0x200,
            AllowDOSPath = 0x100000,
            AllowEmptyHost = 0x80,
            AllowIdn = 0x4000000,
            AllowIPv4Host = 0x400,
            AllowIPv6Host = 0x800,
            AllowIriParsing = 0x10000000,
            AllowUncHost = 0x100,
            BuiltInSyntax = 0x40000,
            CanonicalizeAsFilePath = 0x1000000,
            CompressPath = 0x800000,
            ConvertPathSlashes = 0x400000,
            FileLikeUri = 0x2000,
            MailToLikeUri = 0x4000,
            MayHaveFragment = 0x40,
            MayHavePath = 0x10,
            MayHavePort = 8,
            MayHaveQuery = 0x20,
            MayHaveUserInfo = 4,
            MustHaveAuthority = 1,
            OptionalAuthority = 2,
            ParserSchemeOnly = 0x80000,
            PathIsRooted = 0x200000,
            SimpleUserSyntax = 0x20000,
            UnEscapeDotsAndSlashes = 0x2000000,
            V1_UnknownUri = 0x10000
        }
    }
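To make the comparison between the HTTP and news flag sets concrete, the sketch below copies a few values from the listing above into a local enum. The name LocalUriSyntaxFlags is a hypothetical stand-in, since the real UriSyntaxFlags is internal. It only does bit arithmetic on the numbers quoted in this thread and does not touch the real parser.

```csharp
using System;

[Flags]
enum LocalUriSyntaxFlags   // hypothetical local copy of a few internal values
{
    MayHavePath = 0x10,
    MayHaveFragment = 0x40,
    ConvertPathSlashes = 0x400000,
    CompressPath = 0x800000,
    CanonicalizeAsFilePath = 0x1000000,
    UnEscapeDotsAndSlashes = 0x2000000,
    AllowIriParsing = 0x10000000
}

static class FlagComparison
{
    // Path-normalization bits that appear in the HTTP flag set quoted above.
    public const LocalUriSyntaxFlags HttpPathBits =
        LocalUriSyntaxFlags.UnEscapeDotsAndSlashes |
        LocalUriSyntaxFlags.CanonicalizeAsFilePath |
        LocalUriSyntaxFlags.CompressPath |
        LocalUriSyntaxFlags.ConvertPathSlashes;

    // The complete news flag set quoted above.
    public const LocalUriSyntaxFlags News =
        LocalUriSyntaxFlags.AllowIriParsing |
        LocalUriSyntaxFlags.MayHaveFragment |
        LocalUriSyntaxFlags.MayHavePath;

    // Returns flags with every bit in toClear switched off.
    public static LocalUriSyntaxFlags Without(LocalUriSyntaxFlags flags, LocalUriSyntaxFlags toClear)
    {
        return flags & ~toClear;
    }

    static void Main()
    {
        // Bits shared by the two sets; the news set has none of the path bits.
        Console.WriteLine(HttpPathBits & News);

        // The HTTP path bits that remain after clearing CanonicalizeAsFilePath.
        Console.WriteLine(Without(HttpPathBits, LocalUriSyntaxFlags.CanonicalizeAsFilePath));
    }
}
```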
You should be able to percent-escape the '.' using '%2E', but that's the cheap and dirty way out.
You might try playing around with the dontEscape option a bit and it may change how Uri is treating those characters.
More info here:
http://msdn.microsoft.com/en-us/library/system.uri.aspx
Also check out the following (see DontUnescapePathDotsAndSlashes):
http://msdn.microsoft.com/en-us/library/system.genericuriparseroptions.aspx
Does this work?
    using System;
    using System.Reflection;

    public class MyUriParser : UriParser
    {
        private string actualScheme;

        public MyUriParser(string actualScheme)
        {
            // m_Flags is a non-public field on the UriParser base class,
            // so it can only be reached through reflection.
            Type type = this.GetType();
            FieldInfo fInfo = type.BaseType.GetField("m_Flags", BindingFlags.Instance | BindingFlags.NonPublic);
            fInfo.SetValue(this, GenericUriParserOptions.DontCompressPath);
            this.actualScheme = actualScheme.ToLowerInvariant();
        }

        protected override string GetComponents(Uri uri, UriComponents components, UriFormat format)
        {
            string result = base.GetComponents(uri, components, format);

            // Substitute our actual desired scheme in the string if it's in there.
            if ((components & UriComponents.Scheme) != 0)
            {
                string registeredScheme = base.GetComponents(uri, UriComponents.Scheme, format);
                result = this.actualScheme + result.Substring(registeredScheme.Length);
            }

            return result;
        }
    }