How do I create files/folders with diacritics usin

Posted 2019-08-04 07:27

Question:

I have a .bat file that reads the lines of a file and then tries to create files or folders, depending on the given argument.

The problem is that when it gets to characters such as ăâțîș, it does not work.

This is my code:

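IF "%1"=="" GOTO Final
IF "%1"=="file" GOTO File
IF "%1"=="folder" GOTO Folder
rem Any other argument: stop here instead of falling through to :File
GOTO Final

:File
    rem "tokens=*" keeps the whole line; the quotes protect names with spaces
    for /f "tokens=*" %%i in (files.txt) do echo. > "%%i.rtf"
GOTO Final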

:Folder
    for /f "tokens=*" %%a in (folders.txt) do (
    mkdir "%%a"
    )
GOTO Final

:Final

What I've tried so far, following this link: Manage paths with accented characters

  1. The .bat script is saved as ANSI
  2. CHCP 1250 > NUL

How can I solve this?

Answer 1:

Put CHCP XXX into the batch file, where XXX is a code page that matches the encoding of your text files (files.txt and folders.txt). Note that you can use CHCP 65001, which is the UTF-8 code page and should handle most diacritics without problems, provided the text files are saved as UTF-8.
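To pick the right XXX you first need to know how the list files are actually encoded. The following is a minimal sketch, not part of the original answer and written in Python only because it is easy to run outside the console. The helper name matching_codepages and the candidate list are assumptions made for illustration. It decodes the raw bytes under each candidate code page and reports which ones give back the text you expect.

```python
def matching_codepages(raw, expected, candidates=("cp852", "cp1250", "utf-8")):
    """Return the candidate encodings under which raw decodes to expected."""
    matches = []
    for name in candidates:
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding at all.
            pass
    return matches


# Simulate a line of a list file that was saved as Windows-1250 ("ANSI").
sample = "ěščřžýáíé".encode("cp1250")
print(matching_codepages(sample, "ěščřžýáíé"))
```

For a real file you would pass the bytes of one known line (read in binary mode) and the text you expect that line to contain. The code page that matches is the one to give to CHCP.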



Answer 2:

Avoid using accented characters in file and folder names. Otherwise, mojibake is practically guaranteed in the Windows command line.

md files   2>NUL
pushd files
md unASCII 2>NUL
chcp 852 >nul
echo ěščřžýáíé-852>diacritic--852.txt
chcp 1250 >nul
echo ěščřžýáíé1250>diacritic-1250.txt

chcp 1250 >nul
findstr /R "^" "diacritic-*.txt"
for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo %G:%g
for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo(%~nG:%g>"unASCII\%gANSI.txt"
chcp 852 >nul
findstr /R "^" "diacritic-*.txt"
for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo %G:%g
for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo(%~nG:%g>"unASCII\%g-OEM.txt"
popd

Note that the above list of CLI commands is not a .bat code snippet (it uses single-percent loop variables such as %G). However, copying and pasting it into a command line window gives roughly the following output, which shows that the code page active when a file is written must match the code page active when it is read. Otherwise you get pure mojibake; see e.g. the output of findstr /R "^" "diacritic-*.txt":

==>md files   2>NUL
==>pushd files
==>md unASCII 2>NUL

==>chcp 852 >nul
==>echo ěščřžýáíé-852>diacritic--852.txt

==>chcp 1250 >nul
==>echo ěščřžýáíé1250>diacritic-1250.txt

==>
==>chcp 1250 >nul

==>findstr /R "^" "diacritic-*.txt"
diacritic--852.txt:Řçźý§ě ˇ‚-852
diacritic-1250.txt:ěščřžýáíé1250

==>for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo %G:%g
diacritic--852.txt:Řçźý§ě ˇ‚-852
diacritic-1250.txt:ěščřžýáíé1250

==>for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo(%~nG:%g>"unASCII\%gANSI.txt"

==>chcp 852 >nul

==>findstr /R "^" "diacritic-*.txt"
diacritic--852.txt:ěščřžýáíé-852
diacritic-1250.txt:ýÜŔ°×řßÝÚ1250

==>for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo %G:%g
diacritic--852.txt:ěščřžýáíé-852
diacritic-1250.txt:ýÜŔ°×řßÝÚ1250

==>for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo(%~nG:%g>"unASCII\%g-OEM.txt"

==>popd
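
The garbled strings in the listing above are not random: they are the same bytes read under the other code page. A small Python sketch can reproduce this outside the console (Python is not used in the original answer, and the helper name misread is an assumption made for illustration). It relies only on Python's built-in cp852 and cp1250 codecs.

```python
def misread(text, written_as, read_as):
    """Encode text under one code page, then decode the bytes under another."""
    return text.encode(written_as).decode(read_as, errors="replace")


original = "ěščřžýáíé"

# Written under CHCP 852, displayed under CHCP 1250.
print(misread(original, "cp852", "cp1250"))
# Written under CHCP 1250, displayed under CHCP 852.
print(misread(original, "cp1250", "cp852"))
# Same code page on both sides: the text survives.
print(misread(original, "cp1250", "cp1250"))
```

The first two printed lines should correspond to the two garbled strings that findstr shows for diacritic--852.txt and diacritic-1250.txt. The third shows that nothing is lost as long as writer and reader agree.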

We wrote the string ěščřžýáíé (followed by the CHCP number) to the following files:

  • ěščřžýáíé-852 string in the files\diacritic--852.txt file, and
  • ěščřžýáíé1250 string in the files\diacritic-1250.txt file.

Then we used those strings to create files named after the pattern <String><Chcp><CPID>.txt, where

  • <String> = the ěščřžýáíé string with diacritics, as read from the diacritic-<Chcp>.txt file;
  • <Chcp> = -852 or 1250: the code page under which the diacritic-<Chcp>.txt file was written;
  • <CPID> = -OEM or ANSI: a textual abbreviation of the code page under which this new file was written (852 and 1250, respectively).

Let's try to use these last four files. Copy and paste the following snippet into a command line window again:

chcp 437 >nul
dir /B /S "files\unASCII\*.txt"
for %G in (files\unASCII\ěščřžýáíé*.txt) do @echo %G
findstr /S /R "^" "files\unASCII\ěščřžýáíé*.txt"

chcp 1250 >nul
for %G in (files\unASCII\ěščřžýáíé*.txt) do type "%G"
chcp 852 >nul
for %G in (files\unASCII\ěščřžýáíé*.txt) do type "%G"

Output: mojibake shows up again and again:

==>chcp 437 >nul

==>dir /B /S "files\unASCII\*.txt"
d:\bat\files\unASCII\ýÜŔ°×řßÝÚ1250-OEM.txt
d:\bat\files\unASCII\ěščřžýáíé-852-OEM.txt
d:\bat\files\unASCII\ěščřžýáíé1250ANSI.txt
d:\bat\files\unASCII\Řçźý§ě ˇ‚-852ANSI.txt

==>for %G in (files\unASCII\ěščřžýáíé*.txt) do @echo %G
files\unASCII\ěščřžýáíé-852-OEM.txt
files\unASCII\ěščřžýáíé1250ANSI.txt

==>findstr /S /R "^" "files\unASCII\ěščřžýáíé*.txt"

==>
==>chcp 1250 >nul

==>for %G in (files\unASCII\ěščřžýáíé*.txt) do type "%G"

==>type "files\unASCII\ěščřžýáíé-852-OEM.txt"
diacritic--852:Řçźý§ě ˇ‚-852

==>type "files\unASCII\ěščřžýáíé1250ANSI.txt"
diacritic-1250:ěščřžýáíé1250

==>chcp 852 >nul

==>for %G in (files\unASCII\ěščřžýáíé*.txt) do type "%G"

==>type "files\unASCII\ěščřžýáíé-852-OEM.txt"
diacritic--852:ěščřžýáíé-852

==>type "files\unASCII\ěščřžýáíé1250ANSI.txt"
diacritic-1250:ýÜŔ°×řßÝÚ1250

Oops, why is there no output from findstr? Let's try:

chcp 1250 >nul
findstr /S /R "^" "files\unASCII\*.txt"
chcp 852 >nul
findstr /S /R "^" "files\unASCII\*.txt"

The output shows that findstr produces mojibake not only in file contents but in file names as well, so it cannot even open the files:

==>chcp 1250 >nul

==>findstr /S /R "^" "files\unASCII\*.txt"
FINDSTR: Cannot open files\unASCII\ŤsR›zr ˇ‚1250-OEM.txt
FINDSTR: Cannot open files\unASCII\escrzŤ˙­'-852-OEM.txt
FINDSTR: Cannot open files\unASCII\escrzŤ˙­'1250ANSI.txt
FINDSTR: Cannot open files\unASCII\RÎzŤäe?'-852ANSI.txt

==>chcp 852 >nul

==>findstr /S /R "^" "files\unASCII\*.txt"
FINDSTR: Cannot open files\unASCII\ŹsRŤzráíé1250-OEM.txt
FINDSTR: Cannot open files\unASCII\escrzŹ ş'-852-OEM.txt
FINDSTR: Cannot open files\unASCII\escrzŹ ş'1250ANSI.txt
FINDSTR: Cannot open files\unASCII\R╬zŹńeś?'-852ANSI.txt

FYI: CHCP 65001 (UTF-8) did not help either. And as per MSDN (Naming Files, Paths, and Namespaces), Windows NTFS object names seem to be UTF-16 encoded:

On newer file systems, such as NTFS, exFAT, UDFS, and FAT32, Windows stores the long file names on disk in Unicode ... the file system treats path and file names as an opaque sequence of WCHARs.

Moreover:

The shell and the file system have different requirements. It is possible to create a path with the Windows API that the shell user interface is not able to interpret properly.
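
Since the file system stores names in Unicode, one way around the console code page is to decode the list file explicitly and hand the resulting Unicode names to the file API. The following is a hedged Python sketch of that idea, not something the original answers propose. The function name create_folders and the demo data are assumptions made for illustration, and the encoding argument must be whatever your list file really uses.

```python
import tempfile
from pathlib import Path


def create_folders(list_file, encoding, base_dir):
    """Create one folder under base_dir for each non-empty line of list_file."""
    created = []
    # Decode the list with an explicit encoding instead of the console code page.
    text = Path(list_file).read_text(encoding=encoding)
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        target = Path(base_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


# Demo in a throwaway directory: a one-line list saved as Windows-1250.
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "folders.txt"
    list_file.write_bytes("ěščřžýáíé\n".encode("cp1250"))
    made = create_folders(list_file, "cp1250", tmp)
    print([p.name for p in made])
```

If the list is saved as UTF-8 with a byte order mark (as Notepad may do), passing "utf-8-sig" as the encoding keeps the mark out of the first folder name.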