StartsWith change in Windows Server 2012

Posted 2019-01-11 14:15

Question:

Edit: I originally thought this was related to .NET Framework 4.5. Turned out it applies to .NET Framework 4.0 as well.

There's a change in how strings are handled in Windows Server 2012 which I'm trying to understand better. It seems like the behavior of StartsWith has changed. The issue is reproducible using both .NET Framework 4.0 and 4.5.

With .NET Framework 4.5 on Windows 7, the program below prints "False, t". On Windows Server 2012, it prints "True, t" instead.

using System;
using System.Text;

internal class Program
{
   private static void Main(string[] args)
   {
      // Decode the UTF-8 preamble (EF BB BF) into a one-character string: U+FEFF.
      string byteOrderMark = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
      Console.WriteLine("test".StartsWith(byteOrderMark));
      Console.WriteLine("test"[0]);
   }
}

In other words, StartsWith(byteOrderMark) returns true regardless of the string's content. If you have code that tries to strip the byte order mark as shown below, it works fine on Windows 7 (printing "Test") but prints "est" on Windows Server 2012.

internal class Program
{
  private static void Main(string[] args)
  {
     string byteOrderMark = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
     string someString = "Test";

     if (someString.StartsWith(byteOrderMark))
        someString = someString.Substring(1);

     Console.WriteLine("{0}", someString);
     Console.ReadKey();
  }
}

I realize that you have already done something wrong if you have byte order marks in a string, but we're integrating with legacy code which has this. I know I can solve this specific issue by doing something like the line below, but I want to understand the problem better.

someString = someString.Trim(byteOrderMark[0]);

Hans Passant suggested using the UTF8Encoding constructor that lets me tell it explicitly to emit the UTF-8 identifier. I tried this, but it gives the same result. The code below differs in output between Windows 7 and Windows Server 2012. On Windows 7, it prints "Result: False". On Windows Server 2012, it prints "Result: True".

  private static void Main(string[] args)
  {
     var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
     string byteOrderMark = encoding.GetString(encoding.GetPreamble());
     Console.WriteLine("Result: " + "Hello".StartsWith(byteOrderMark));
     Console.ReadKey();
  }

I've also tried the following variant, which prints False, False, False on Windows 7 but True, True, False on Windows Server 2012, which confirms it's related to the implementation of StartsWith on Windows Server 2012.

  private static void Main(string[] args)
  {
     var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
     string byteOrderMark = encoding.GetString(encoding.GetPreamble());
     Console.WriteLine("Hello".StartsWith(byteOrderMark));
     Console.WriteLine("Hello".StartsWith('\ufeff'.ToString()));
     Console.WriteLine("Hello"[0] == '\ufeff');

     Console.ReadKey();
  }

Answer 1:

Turns out I could repro this, running the test program on Windows 8.1. It is in the same "family" as Server 2012.

The most likely source of the problem is that the culture-sensitive comparison rules have changed. They can be, erm, flaky and can have odd outcomes on these kinds of characters. The BOM (U+FEFF) is a zero-width no-break space, so a culture-sensitive comparison may treat it as if it were not there at all. Reasoning this out requires the same kind of mental gymnastics as understanding why "abc".StartsWith("") returns true :)
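To check on a given machine whether the comparison mode is what matters, a small sketch like the following can help. The class name ComparisonDemo is made up for illustration. It relies on the documented fact that the single-argument StartsWith(string) overload uses the current culture. The first two results are deliberately left without an expected value, because they are exactly what varies between systems.

```csharp
using System;

internal static class ComparisonDemo
{
    private static void Main()
    {
        const string bom = "\uFEFF";

        // Culture-sensitive modes: the result depends on the OS and runtime
        // collation rules, so no expected output is given for these two lines.
        Console.WriteLine("Hello".StartsWith(bom, StringComparison.CurrentCulture));
        Console.WriteLine("Hello".StartsWith(bom, StringComparison.InvariantCulture));

        // Ordinal mode compares UTF-16 code units, and 'H' is not U+FEFF.
        Console.WriteLine("Hello".StartsWith(bom, StringComparison.Ordinal)); // False

        // Every string starts with the empty string.
        Console.WriteLine("abc".StartsWith("")); // True
    }
}
```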

You need to solve your problem by using StringComparison.Ordinal. This produced False, False, False:

private static void Main(string[] args) {
    var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
    string byteOrderMark = encoding.GetString(encoding.GetPreamble());
    Console.WriteLine("Hello".StartsWith(byteOrderMark, StringComparison.Ordinal));
    Console.WriteLine("Hello".StartsWith("\ufeff", StringComparison.Ordinal));
    Console.WriteLine("Hello"[0] == '\ufeff');
    Console.ReadKey();
}
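For the original goal of stripping a leading BOM from legacy strings, the same ordinal check could be wrapped in a small helper. This is only a sketch: BomHelper and StripLeadingBom are hypothetical names, not part of the question's code or of the framework.

```csharp
using System;

internal static class BomHelper
{
    // U+FEFF is the character that the decoded UTF-8 preamble yields.
    private const string ByteOrderMark = "\uFEFF";

    // Hypothetical helper: removes one leading BOM character, if present.
    public static string StripLeadingBom(string value)
    {
        if (value == null) throw new ArgumentNullException("value");

        // Ordinal comparison looks at raw UTF-16 code units, so the result
        // does not depend on the culture or on the Windows version.
        return value.StartsWith(ByteOrderMark, StringComparison.Ordinal)
            ? value.Substring(ByteOrderMark.Length)
            : value;
    }

    private static void Main()
    {
        Console.WriteLine(StripLeadingBom("Test"));        // Test
        Console.WriteLine(StripLeadingBom("\uFEFFTest"));  // Test
    }
}
```

Unlike Trim(byteOrderMark[0]), this touches only the start of the string and removes at most one character.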