Why are Danish characters not displayed as in text

Posted 2020-01-26 12:17

Question:

I made a simple batch file, but the Windows command processor cmd.exe does not display Danish characters correctly when I execute the batch file. It shows weird characters like ├ª├©├Ñ instead of æøå. If I type echo æøå directly in a cmd window, it shows æøå.

Is there something wrong with my computer?

Answer 1:

Use chcp to manage your code page.

Like Mofi said, specifying the following would help your case:

chcp 1252

Put this line in the batch file before the line echo æøå.



Answer 2:

Everything on a computer is stored as a sequence of zeros and ones, including characters. Which sequence of zeros and ones is displayed as æøå depends on rules.

The first rule is that a file with the extension bat or cmd contains text data interpreted by the Windows command interpreter, while a file with the extension png contains image data according to the PNG specification interpreted by image viewers/editors, and so on.

The second rule is that a batch file contains text data encoded with 1 byte (= 8 bits) per character, and not with 2 bytes as in UTF-16 (for the commonly used characters; 4 bytes for rarely used symbols) or with 1 to 4 bytes as in UTF-8 (since November 2003).

The problem with 1 byte per character is that just 2^8 = 256 characters can be encoded, but humans use many more characters than that.

The solution is using a code page. A code page defines which character is represented for example by a byte with the value

  • decimal: 248
  • hexadecimal: F8
  • binary: 1111 1000
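As a small illustration of that point, the following Python sketch decodes the same byte under three code pages. It assumes that Python's built-in codecs cp1252, cp850 and cp865 match the corresponding Windows code pages; the helper name decode_byte is made up for this example.

```python
def decode_byte(value, code_page):
    """Return the character that a single byte value maps to in a code page."""
    return bytes([value]).decode(code_page)


# The byte 248 (hexadecimal F8) is a different character per code page.
for code_page in ("cp1252", "cp850", "cp865"):
    print(f"{code_page}: byte 248 -> {decode_byte(248, code_page)}")
```

With Windows-1252 the byte is ø, while both OEM code pages map it to the degree sign °.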

The command CHCP (change code page), executed in a console window without any parameter, outputs the active code page. This is the code page the Windows command interpreter uses to interpret bytes as characters on reading and on output.

The code page depends on the Windows Region and Language settings of the user account used for running a batch file in a console window.

The default code page on console is OEM 850 for Western European countries and OEM 865 for Nordic languages like Danish except Icelandic which uses OEM 861.

But the default code page for non-Unicode encoded text files in GUI applications is Windows-1252 for Western European countries, including Denmark.

How can the line echo æøå be encoded in a *.bat file?

  1. Using code page Windows-1252 and 1 byte per character.
    hexadecimal: 65 63 68 6F 20 E6 F8 E5
  2. Using code page OEM 865 or OEM 850 and 1 byte per character.
    hexadecimal: 65 63 68 6F 20 91 9B 86
  3. Using UTF-8 encoding without byte order mark (BOM) with 1 or 2 bytes per character.
    hexadecimal: 65 63 68 6F 20 C3 A6 C3 B8 C3 A5
  4. Using UTF-16 little endian encoding with byte order mark (BOM) with 2 bytes per character.
    hexadecimal: FF FE 65 00 63 00 68 00 6F 00 20 00 E6 00 F8 00 E5 00
  5. And many others.
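The byte sequences in the list above can be reproduced with a short Python sketch. The helper names are hypothetical, and the sketch again assumes Python's codecs correspond to the Windows code pages of the same number.

```python
import codecs

TEXT = "echo æøå"


def hex_bytes(data):
    """Format bytes as space-separated uppercase hexadecimal values."""
    return data.hex(" ").upper()


def encode_line(text, encoding):
    """Encode text; for UTF-16 LE, prepend the byte order mark explicitly."""
    if encoding == "utf-16-le":
        return codecs.BOM_UTF16_LE + text.encode(encoding)
    return text.encode(encoding)


for encoding in ("cp1252", "cp865", "cp850", "utf-8", "utf-16-le"):
    print(f"{encoding:10} {hex_bytes(encode_line(TEXT, encoding))}")
```

OEM 865 and OEM 850 produce the same bytes here because both place æ, ø and å at 91, 9B and 86.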

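Output of ├ª├©├Ñ on running the batch file indicates that the batch file is UTF-8 encoded, as those 6 characters are the byte values C3 A6 C3 B8 C3 A5 interpreted with an OEM code page. The © in the output matches OEM 850; with OEM 865 the byte B8 would be displayed as ╕ instead.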
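This diagnosis can be checked by encoding the text the way the editor saved it and decoding it the way the console reads it. The sketch below is only an approximation of what cmd.exe does, using Python's cp850 and cp865 codecs as stand-ins for the console code pages; as_seen_by is a name invented for this example.

```python
def as_seen_by(text, file_encoding, console_code_page):
    """Encode text as stored in the file, then decode it as the console would."""
    return text.encode(file_encoding).decode(console_code_page)


# UTF-8 bytes read with two different OEM code pages.
for console_code_page in ("cp850", "cp865"):
    shown = as_seen_by("æøå", "utf-8", console_code_page)
    print(f"UTF-8 file on a {console_code_page} console: {shown}")
```

Comparing the two lines of output with what the console really shows is a quick way to tell which OEM code page is active.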

So the batch file first needs to be converted from Unicode with UTF-8 encoding to ANSI. I write ANSI, although Windows-1252 is not a standard defined by ANSI (American National Standards Institute), because the term ANSI is used on Windows for 1 byte per character encodings. The result is a batch file with E6 F8 E5 for the three Danish characters.

On execution, the Windows-1252 encoded batch file displays µ°Õ.

So the batch file needs to be converted a second time, from ANSI to OEM, i.e. from Windows-1252 to OEM 865 or OEM 850. The three Danish characters are now encoded in the text file as 91 9B 86, but a graphical user interface application (GUI text editor) using code page Windows-1252 displays them as ‘›†.

However, the batch file now prints æøå into the console window on execution. On my computer the console uses code page 850 because German is configured in the Windows Region and Language settings.

Another solution is to encode the batch file in Windows-1252 and put the following command line in the batch file before the text is output with ECHO:
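The two conversion steps can also be done in one go outside of a text editor. The following Python function is a minimal sketch, not the method used in this answer: the function name, its defaults and the choice of cp850 as the target are assumptions, and it overwrites the file in place, so try it on a copy first.

```python
from pathlib import Path


def convert_batch_file(path, source_encoding="utf-8-sig", target_encoding="cp850"):
    """Re-encode a batch file in place and return the bytes written.

    "utf-8-sig" accepts UTF-8 with or without a byte order mark.
    Working on raw bytes keeps the existing line endings untouched.
    A UnicodeEncodeError is raised if a character is missing in the target.
    """
    file_path = Path(path)
    text = file_path.read_bytes().decode(source_encoding)
    data = text.encode(target_encoding)
    file_path.write_bytes(data)
    return data
```

Calling convert_batch_file("test.bat") on a UTF-8 encoded file (test.bat is a placeholder name) leaves a file whose Danish characters are stored as 91 9B 86.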

chcp 1252 >nul

But this solution does not work if the font selected in the console window properties does not support Windows-1252. For example, if Raster Fonts is selected on the Font tab of the console's Properties window and Windows (7, Vista, XP) uses Terminal as the raster font for the console, changing the code page to 1252 has no effect: a Windows-1252 encoded echo æøå is still displayed as µ°Õ although the active code page is 1252. In other words, the font selected for console windows must also support the active code page for the output text to be displayed correctly.