Git messes up with non-ascii characters on Linux c

2019-08-17 16:56发布

I have a .Net Core (C#) project with the following line in one of the classes:

var input = "£";

But when I do a git clone in a Docker container (microsoft/dotnet:2.2-sdk) it messes it up and displays it as � (in bash using cat).

And when I run it, its Utf-8 bytes are [239, 191, 189] = [EF, BF, BD] which seem to be a so-called Unicode replacement character.

Windows editor that I use is VS 2017, but character is displayed properly on other windows machines and parsed properly by dotnet run/test command, so I don't think this is a problem of failing to save the character incorrectly.

Any ideas why I am seeing such a mess and how to solve it?

Some details

I get bytes using Encoding.UTF8.GetBytes("£");
It works perfectly well on Windows 10 machine
Linux version Debian GNU/Linux 9 (stretch) from the cat /etc/os-release
locale -a returns C C.UTF-8 POSIX
On Windows Notepad++, when opened, is claims to be ANSI and is displayed correctly.

Running fgrep 'var input' file.cs | od -tx1 -c

0000100  76  61  72  20  69  6e  70  75  74  20  3d  20  22  a3  22  3b
          v   a   r       i   n   p   u   t       =       " 243   "   ;

标签： linux git character-encoding

1条回答

小情绪 Triste *

2楼-- · 2019-08-17 17:48

Your file contains a single byte a3 which corresponds to the Windows-1252 encoding for the character £. Your Linux system displays � because it is not a valid UTF-8 encoding.

You should configure Visual Studio to use UTF-8 instead of Windows-1252.

0人赞添加讨论(0) 举报

Git messes up with non-ascii characters on Linux c

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间