Git messes up with non-ascii characters on Linux c

Posted 2019-08-17 16:56

I have a .NET Core (C#) project with the following line in one of the classes:

var input = "£";

But when I do a git clone in a Docker container (microsoft/dotnet:2.2-sdk), the character gets mangled and is displayed as � (in bash, using cat).

And when I run it, its UTF-8 bytes are [239, 191, 189] = [EF, BF, BD], which seems to be the so-called Unicode replacement character.

The Windows editor that I use is VS 2017, but the character is displayed properly on other Windows machines and parsed properly by the dotnet run/test commands, so I don't think this is a problem of saving the character incorrectly.

Any ideas why I am seeing such a mess and how to solve it?

Some details

  • I get bytes using Encoding.UTF8.GetBytes("£");
  • It works perfectly well on a Windows 10 machine
  • Linux version is Debian GNU/Linux 9 (stretch), according to cat /etc/os-release
  • locale -a returns C C.UTF-8 POSIX
  • On Windows, Notepad++ reports the file as ANSI when opened, and displays it correctly.

Running fgrep 'var input' file.cs | od -tx1 -c

0000100  76  61  72  20  69  6e  70  75  74  20  3d  20  22  a3  22  3b
          v   a   r       i   n   p   u   t       =       " 243   "   ;

1 answer
小情绪 Triste *
#2 · 2019-08-17 17:48

Your file contains a single byte a3, which is the Windows-1252 encoding of the character £. Your Linux system displays � because a lone a3 byte is not a valid UTF-8 sequence.
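
The byte-level effect can be reproduced outside of .NET. The following is a minimal Python sketch, used here only for illustration; it assumes the Linux toolchain decodes the source file as UTF-8 and substitutes U+FFFD for invalid bytes, which matches the bytes reported in the question. The variable names are made up for the example.

```python
# The single byte seen in the od dump from the question: 0xa3.
raw = b"\xa3"

# In Windows-1252 (Python codec name "cp1252"), 0xa3 is the pound sign U+00A3.
as_cp1252 = raw.decode("cp1252")
print(as_cp1252 == "\u00a3")  # True

# In UTF-8, a lone 0xa3 is a continuation byte without a lead byte, so it is
# invalid. A lenient decoder substitutes U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")
print(as_utf8 == "\ufffd")  # True

# Re-encoding U+FFFD as UTF-8 gives exactly the bytes reported in the question.
print(list(as_utf8.encode("utf-8")))  # [239, 191, 189]

# For comparison, a pound sign saved as proper UTF-8 takes two bytes.
print(list("\u00a3".encode("utf-8")))  # [194, 163]
```

So the original character is already lost by the time Encoding.UTF8.GetBytes runs: the string holds U+FFFD, not £.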

You should configure Visual Studio to use UTF-8 instead of Windows-1252.
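
Files that are already committed as Windows-1252 also need to be re-saved. As a rough sketch of that one-off conversion, assuming the affected files really are Windows-1252 (the helper name `reencode_to_utf8` and the sample file are hypothetical, and Python is used only for illustration):

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(path, source_encoding="cp1252"):
    """Rewrite the file as UTF-8 (no BOM), assuming it is in source_encoding.

    Returns False and leaves the file untouched if it is already valid UTF-8.
    Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    """
    p = Path(path)
    raw = p.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    text = raw.decode(source_encoding)
    # Work on bytes so that existing line endings are preserved as-is.
    p.write_bytes(text.encode("utf-8"))
    return True


# Demo on a throwaway file holding the same bytes as the line in the question.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "file.cs"
    sample.write_bytes(b'var input = "\xa3";\r\n')
    changed = reencode_to_utf8(sample)
    converted = sample.read_bytes()

print(changed)    # True
print(converted)  # b'var input = "\xc2\xa3";\r\n'
```

After the conversion, the od dump would show c2 a3 in place of the single a3 byte.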
