Batch file encoding

Posted 2019-01-10 12:08

Question:

I would like to handle filenames containing unusual characters, such as the French é.

Everything is working fine in the shell:

C:\somedir\>ren -hélice hélice

Now, if I put this line in a .bat file, I get the following result:

C:\somedir\>ren -hÚlice hÚlice

See? The é has been replaced by Ú.

The same is true for command output. If I dir some directory in the shell, the output is fine. If I redirect this output to a file, some characters are transformed.

So how can I tell cmd.exe that what appears as an é in my batch file really is an é, and not a Ú or a comma?

So is there no way, when executing a .bat file, to give a hint about the code page in which it was written?

Answer 1:

You have to save the batch file with OEM encoding. How to do this varies depending on your text editor. The encoding used in that case varies as well. For Western cultures it's usually CP850.
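A minimal Python sketch of the mismatch, assuming the editor saved the file as Windows-1252 and the console runs in CP850 (both are guesses about the asker's setup, and the function name is made up for illustration):

```python
def shown_by_console(text, file_encoding="cp1252", console_encoding="cp850"):
    """Return what a console using console_encoding would display for text
    that an editor saved to disk using file_encoding."""
    raw = text.encode(file_encoding)      # the bytes the editor wrote
    return raw.decode(console_encoding)   # how those bytes are read back


# File saved as Windows-1252, console in CP850: the case from the question.
mismatch = shown_by_console("ren -hélice hélice")

# File saved as CP850 (OEM), console in CP850: the text survives intact.
matched = shown_by_console("ren -hélice hélice", file_encoding="cp850")
```

Here mismatch contains Ú where the é used to be, while matched is unchanged. The opposite mix-up (file in CP850, reader in Windows-1252) turns é into a low quotation mark, which is probably the "comma" mentioned in the question.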

Batch files and encoding are really two things that don't particularly like each other. You'll notice that Unicode is also impossible to use there, unfortunately (even though environment variables handle it fine).

Alternatively, you can set the console to use another codepage:

chcp 1252

should do the trick. At least it worked for me here.

When you do output redirection, such as with dir, the same rules apply. The console window's codepage is used. You can use the /u switch to cmd.exe to force Unicode output redirection, which causes the resulting files to be in UTF-16.
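If such a redirected file is processed later, it has to be decoded as UTF-16. A small Python sketch, assuming the data is little-endian and usually has no byte order mark (the helper name and the file name in the comment are hypothetical):

```python
def read_cmd_unicode_output(raw):
    """Decode bytes produced by something like:
        cmd /u /c "dir > listing.txt"

    Assumption: the data is UTF-16 little-endian. A byte order mark is
    stripped if one happens to be present.
    """
    text = raw.decode("utf-16-le")
    return text.lstrip("\ufeff")
```

Naming the byte order explicitly avoids relying on a BOM to detect it.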

As for encodings and code pages in cmd.exe in general, also see this question:

  • What encoding/code page is cmd.exe using

EDIT: As for your edit: No, cmd always assumes the batch file to be written in the console default codepage. However, you can easily include a chcp at the start of the batch:

chcp 1252>NUL
ren -hélice hélice

To make this more robust when used directly from the command line, you may want to save the old code page and restore it afterwards:

@echo off
for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
chcp 1252>nul
ren -hélice hélice
chcp %cp%>nul
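The for /f line splits the output of chcp on ":" and "." and keeps the second token. The same parsing, sketched in Python to show why both delimiters are there (the sample strings in the comments are assumed examples of chcp output, and the function name is invented):

```python
import re


def parse_chcp(output):
    """Extract the code page number from the text printed by chcp.

    The wording is localized, and some locales appear to end the line
    with a period, hence splitting on both ':' and '.'.
    Assumed examples: "Active code page: 850", "Aktive Codepage: 850."
    """
    # Like for /f, drop empty tokens produced by the delimiters.
    tokens = [t for t in re.split(r"[:.]", output) if t]
    # tokens[1] corresponds to tokens=2 in the batch loop.
    return int(tokens[1].strip())
```

The batch version keeps the leading space in the token, which chcp accepts; the sketch strips it before converting to a number.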


Answer 2:

I created the following block, which I put at the beginning of my batch files:

set Filename=%0
IF "%Filename:~-8%" == "-850.bat" GOTO CONVERT_CODEPAGE_END
    rem Converting code page from 1252 to 850.
    rem My editors use 1252, my batch uses 850.
    rem We create a converted -850.bat file, and then launch it.
    set File850=%~n0-850.bat
    PowerShell.exe -Command "get-content %0 | out-file -encoding oem -filepath %File850%"
    call %File850%
    del %File850%
    EXIT /b 0
:CONVERT_CODEPAGE_END
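The conversion step performed by the PowerShell line can also be sketched in Python, for example as a build step run before the batch file is used. This is only an illustration, assuming Windows-1252 as the source and CP850 as the target; the function names are made up:

```python
from pathlib import Path


def reencode(data, src_encoding="cp1252", dst_encoding="cp850"):
    """Convert raw bytes from one code page to another.

    Raises UnicodeEncodeError if a character has no equivalent in
    dst_encoding, instead of silently replacing it.
    """
    return data.decode(src_encoding).encode(dst_encoding)


def convert_batch_file(src, dst, src_encoding="cp1252", dst_encoding="cp850"):
    """Write a copy of the batch file src to dst in dst_encoding.

    Working on bytes keeps the original line endings untouched.
    """
    Path(dst).write_bytes(reencode(Path(src).read_bytes(),
                                   src_encoding, dst_encoding))
```

For example, convert_batch_file("rename.bat", "rename-850.bat") would produce the OEM copy, where both file names are placeholders.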


Answer 3:

I was having trouble with this, and here is the solution I found. Find the decimal number for the character you are looking for in your current code page.

For example, I'm in codepage 437 (chcp tells you), and I want a degree sign, °. http://en.wikipedia.org/wiki/Code_page_437 tells me that the degree sign is number 248.

Then you find the Unicode character with the same number.

The Unicode character at 248 (U+00F8) is ø.

If you insert the Unicode character in your batch script, it will display to the console as the character you desire.

So my batch file

echo ø

prints

°
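The lookup described above can be automated. A rough Python sketch, assuming the editor stores each character below U+0100 as the single byte with the same number (which is what the answer relies on); the function name is invented:

```python
def char_to_type(desired, console_encoding="cp437"):
    """Return the character to type in the batch file so that a console
    using console_encoding displays desired.

    Assumption: the editor writes the returned character as one byte whose
    value equals its Unicode code point (Latin-1 style).
    """
    raw = desired.encode(console_encoding)
    if len(raw) != 1:
        raise ValueError("expected a single-byte code page")
    return chr(raw[0])
```

With the defaults, char_to_type("°") returns ø, matching the manual lookup. Bytes in the range 128 to 159 do not map to printable characters this way, so the trick only covers part of the code page.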


Answer 4:

I care about three concepts:

  1. Output Console Encoding

  2. Command line internal encoding (the one changed with chcp)

  3. .bat Text Encoding

The easiest scenario for me: I keep the first two in the same encoding, say CP850, and I store my .bat in that same encoding (in Notepad++, menu Encoding → Character sets → Western European → OEM 850).

But suppose someone hands me a .bat in another encoding, say CP1252 (in Notepad++, menu Encoding → Character sets → Western European → Windows-1252).

Then I would change the command line internal encoding with chcp 1252.

This changes the encoding it uses to talk to other processes, but not that of the input device or the output console.

So my command line instance will effectively send characters in 1252 through its STDOUT file descriptor, but garbled text appears when the console decodes them as 850 (é shows up as Ú).

Then I modify the file as follows:

@echo off

perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
ren -hélice hélice

First I turn echo off, so the commands produce no output unless I explicitly use either echo... or perl -e "print..."

Then I add this boilerplate each time I need to output something:

perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"

I replace ren -hélice hélice with the actual text I want to show.

I may also need to replace cp850 with my console encoding and cp1252 with the encoding used on the other side.

And just below I put the desired command.

I broke the problematic line into two halves: the output half and the real-command half.

  • The first half takes care of the output: the "é" is displayed as an "é" by means of transcoding. This is necessary for every output statement, since the console and the file use different encodings.

  • For the second half, the real command (muted by @echo off), having the same encoding in both chcp and the .bat text is enough to ensure the characters are interpreted correctly.



Answer 5:

I had Polish characters in my R code (e.g. ą, ę, ź, ż) and ran into this problem when running the R script from a .bat file: in the .Rout output file, those characters were replaced by signs like %, &, #, and the script didn't run to the end.

My solution:

  1. Save R script with encoding: File > Save with encoding > CP1250
  2. Run .bat file

It worked for me, but if the problem persists, try other encodings.
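Rather than trying encodings one by one, it is possible to check up front which ones can represent the characters in the script at all. A small Python sketch; the candidate list is only an assumption, to be adjusted to whatever the editor offers:

```python
def encodings_that_fit(text, candidates=("cp1250", "cp852", "cp1252", "cp850")):
    """Return the candidate encodings able to represent every character
    in text, in the order they were given."""
    usable = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this code page
        usable.append(name)
    return usable
```

For the Polish letters above, CP1250 is in the result and Windows-1252 is not. This only tells which encodings can store the text; the program reading the file still has to expect the same one.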