I have a .Net Core (C#) project with the following line in one of the classes:
var input = "£";
But when I do a git clone in a Docker container (microsoft/dotnet:2.2-sdk
) it messes it up and displays it as �
(in bash
using cat
).
And when I run it, its Utf-8
bytes are [239, 191, 189] = [EF, BF, BD]
which seem to be a so-called Unicode replacement character.
Windows editor that I use is VS 2017, but character is displayed properly on other windows machines and parsed properly by dotnet run/test
command, so I don't think this is a problem of failing to save the character incorrectly.
Any ideas why I am seeing such a mess and how to solve it?
Some details
- I get bytes using
Encoding.UTF8.GetBytes("£");
- It works perfectly well on
Windows 10
machine - Linux version
Debian GNU/Linux 9 (stretch)
from thecat /etc/os-release
locale -a
returnsC
C.UTF-8
POSIX
- On Windows Notepad++, when opened, is claims to be ANSI and is displayed correctly.
Running fgrep 'var input' file.cs | od -tx1 -c
0000100 76 61 72 20 69 6e 70 75 74 20 3d 20 22 a3 22 3b
v a r i n p u t = " 243 " ;
Your file contains a single byte
a3
which corresponds to the Windows-1252 encoding for the character£
. Your Linux system displays�
because it is not a valid UTF-8 encoding.You should configure Visual Studio to use UTF-8 instead of Windows-1252.