Character encoding errors with .NET Core on Linux

Posted 2019-08-01 09:01

This has been driving me batty for days, and I've finally got it down to a simple, reproducible issue.

I have an NUnit test project targeting .NET Core 2.1. It references a library (let's call it "Core") targeting .NET Standard 2.0.

In my test project:

[TestCase(true, false)]
[TestCase(false, false)]
[TestCase(false, true)]
public void ShouldStartWith(bool useInternal, bool passStartsWith)
{
    var result = useInternal ? StartsWithQ("¿Que?") : StringUtilities.StartsWithQ("¿Que?", passStartsWith ? "¿" : null);
    result.ShouldBeTrue();
}

public static bool StartsWithQ(string s)
{
    return _q.Any(q => s.StartsWith(q, StringComparison.InvariantCultureIgnoreCase));
}

and in the Core project in the StringUtilities class:

public static bool StartsWithQ(string s, string startsWith = null)
{
    return startsWith == null
        ? _q.Any(q => s.StartsWith(q, StringComparison.InvariantCultureIgnoreCase))
        : s.StartsWith(startsWith, StringComparison.InvariantCultureIgnoreCase);
}

Both classes have defined a list of special characters:

private static readonly List<string> _q = new List<string>
{
    "¡",
    "¿"
};

In a Windows environment, all test cases pass. But when the same tests run in the Linux environment, the test case ShouldStartWith(False,False) fails!

That means that when everything is running in the test project, the string comparison works correctly, and even if you pass the special characters to the StringUtilities method, the comparison works. But when you compare to a string that was compiled in the Core project, the special characters are no longer equivalent!

Anyone know why this is? Is this a .NET bug? How to work around it?

1 answer
孤傲高冷的网名
#2 · 2019-08-01 09:36

The encodings of your source files most likely don't match each other, or don't match the encoding the compiler assumes.

Example:

The source file containing public void ShouldStartWith(bool useInternal, bool passStartsWith) may be encoded as UTF-8, while the source file with the list is encoded as Latin-1 (or something similar).

When we play this through:

  • The UTF-8 representation of ¿ is 0xC2 0xBF.
  • The Latin-1 representation of ¿ is 0xBF.

Thus, if the compiler interprets your source files as Latin-1, it reads two bytes for ¿ in the file saved as UTF-8. Since Latin-1 maps every byte to exactly one character, that yields two characters (Â¿) instead of one, and the strings no longer match.
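The mismatch can be reproduced at runtime without involving the compiler at all. The following is a minimal sketch, not a statement about what your compiler actually does: it encodes the literal with one encoding and decodes it with the other, in both directions. The class name EncodingMismatchDemo is made up for illustration, and ¿ is written as the escape \u00BF so that the encoding of this sample file itself cannot influence the result.

```csharp
using System;
using System.Text;

class EncodingMismatchDemo
{
    static void Main()
    {
        // "\u00BF" is ¿, written as an escape so this file's own encoding cannot matter.
        string original = "\u00BFQue?";

        Encoding utf8 = Encoding.UTF8;
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        byte[] utf8Bytes = utf8.GetBytes(original);     // C2 BF 51 75 65 3F
        byte[] latin1Bytes = latin1.GetBytes(original); // BF 51 75 65 3F

        // UTF-8 bytes read as Latin-1: every byte becomes one char, so ¿ turns into two chars.
        string utf8AsLatin1 = latin1.GetString(utf8Bytes);
        Console.WriteLine(utf8AsLatin1.Length);                                         // 6
        Console.WriteLine(utf8AsLatin1.StartsWith("\u00BF", StringComparison.Ordinal)); // False

        // Latin-1 bytes read as UTF-8: a lone 0xBF is invalid and decodes to U+FFFD.
        string latin1AsUtf8 = utf8.GetString(latin1Bytes);
        Console.WriteLine(latin1AsUtf8[0] == '\uFFFD');                                 // True
        Console.WriteLine(latin1AsUtf8.StartsWith("\u00BF", StringComparison.Ordinal)); // False
    }
}
```

Either direction of the mix-up leaves a string that no longer starts with ¿, which is consistent with only the test case that relies on the literal compiled into the Core project failing.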

As already stated in the comments, the best fix is to save the source files in the encoding the compiler expects.
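To find out which files need re-saving, one option is to scan the source tree for files that are not valid UTF-8. This is a rough sketch under the assumption that UTF-8 is the encoding you want everywhere; the class name SourceEncodingCheck and the choice to scan only *.cs files are mine, not part of the question. Note that a file containing only ASCII passes this check regardless of what the editor claims its encoding is, which is harmless for this purpose.

```csharp
using System;
using System.IO;
using System.Text;

class SourceEncodingCheck
{
    // Returns true when the bytes decode as UTF-8 without any invalid sequence.
    static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of emitting U+FFFD.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (ArgumentException) // DecoderFallbackException derives from ArgumentException
        {
            return false;
        }
    }

    static void Main(string[] args)
    {
        // Directory to scan; defaults to the current directory.
        string root = args.Length > 0 ? args[0] : ".";

        foreach (string path in Directory.EnumerateFiles(root, "*.cs", SearchOption.AllDirectories))
        {
            if (!IsValidUtf8(File.ReadAllBytes(path)))
            {
                Console.WriteLine("Not valid UTF-8: " + path);
            }
        }
    }
}
```

Any file it reports can then be re-saved as UTF-8 in your editor. If re-saving is not possible, the C# compiler also has a -codepage option for stating the source encoding explicitly, though consistent UTF-8 files are probably the simpler route.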

Another way to rule out the operating system as the source of the error: copy the compiled project (the DLLs; don't recompile the source on the other operating system) from one operating system to the other and run the code. With the same compiled binaries you should see the same behaviour on both operating systems.
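A complementary check is to print the code points that actually ended up in the compiled string, since the console may render a broken character in a misleading way. The helper below is a hypothetical sketch (the names CodePointDump and Dump are invented here); the three sample inputs stand for a correctly compiled ¿ and the two kinds of misread literal.

```csharp
using System;
using System.Linq;

class CodePointDump
{
    // Formats each UTF-16 code unit of s as U+XXXX.
    static string Dump(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        Console.WriteLine(Dump("\u00BF"));       // U+00BF        (a correctly read ¿)
        Console.WriteLine(Dump("\u00C2\u00BF")); // U+00C2 U+00BF (UTF-8 bytes read as Latin-1)
        Console.WriteLine(Dump("\uFFFD"));       // U+FFFD        (Latin-1 byte read as UTF-8)
    }
}
```

Calling such a helper on the entries of _q from inside each class, on both operating systems, would show whether the literal in the Core project differs from the one in the test project.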
