I have a bat file that reads the lines of a file and then tries to create files or folders, depending on the given argument.
The problem is that when it gets to characters such as ăâțîș, it does not work.
This is my code:
IF "%1"=="" GOTO Final
IF "%1"=="file" GOTO File
IF "%1"=="folder" GOTO Folder
:File
for /f %%i in (files.txt) do echo. > %%i.rtf
GOTO Final
:Folder
for /f "tokens=*" %%a in (folders.txt) do (
mkdir "%%a"
)
GOTO Final
:Final
What I've tried so far, based on this link (Manage paths with accented characters):
- The bat script is ANSI
- CHCP 1250 > NUL
How can I solve this?
Put CHCP XXX into the batch file, where XXX is a code page that matches the encoding of your text files (files.txt and folders.txt). Note that you can use CHCP 65001, which selects UTF-8 and should handle most diacritics without problems.
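One way to decide which XXX to pick is to check which encodings can represent the names at all. The following is only a sketch in Python rather than batch, and it rests on two assumptions: that Python's codec tables are a fair stand-in for the Windows code pages (Windows' own conversions may behave differently), and that the ț and ș in the question are the comma-below letters U+021B and U+0219. The helper name `unencodable` is made up for this illustration.

```python
# Sample from the question, written with escapes so the code points are explicit.
# Assumption: the question uses the comma-below letters U+021B and U+0219.
SAMPLE = "\u0103\u00e2\u021b\u00ee\u0219"  # ăâțîș


def unencodable(text, codec):
    """Return the characters of `text` that `codec` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Compare two single-byte code pages with UTF-8.
for codec in ("cp1250", "cp852", "utf-8"):
    print(codec, unencodable(SAMPLE, codec))
```

If a code page reports missing characters for your list file, saving the file in that code page cannot preserve those names, so a CHCP for that code page is unlikely to help on its own.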
Avoid using accented characters in file and folder names. Otherwise, mojibake is practically guaranteed in the Windows command line.
md files 2>NUL
pushd files
md unASCII 2>NUL
chcp 852 >nul
echo ěščřžýáíé-852>diacritic--852.txt
chcp 1250 >nul
echo ěščřžýáíé1250>diacritic-1250.txt
chcp 1250 >nul
findstr /R "^" "diacritic-*.txt"
for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo %G:%g
for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo(%~nG:%g>"unASCII\%gANSI.txt"
chcp 852 >nul
findstr /R "^" "diacritic-*.txt"
for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo %G:%g
for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo(%~nG:%g>"unASCII\%g-OEM.txt"
popd
Note that the above list of CLI commands isn't a .bat code snippet (in a batch file, %G would have to be written as %%G). However, copying and pasting it into a command-line window gives roughly the following output, which shows that the code page active when a file is created must match the code page active when it is read. Otherwise, pure mojibake appears; see e.g. the output of findstr /R "^" "diacritic-*.txt":
==>md files 2>NUL
==>pushd files
==>md unASCII 2>NUL
==>chcp 852 >nul
==>echo ěščřžýáíé-852>diacritic--852.txt
==>chcp 1250 >nul
==>echo ěščřžýáíé1250>diacritic-1250.txt
==>
==>chcp 1250 >nul
==>findstr /R "^" "diacritic-*.txt"
diacritic--852.txt:Řçźý§ě ˇ‚-852
diacritic-1250.txt:ěščřžýáíé1250
==>for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo %G:%g
diacritic--852.txt:Řçźý§ě ˇ‚-852
diacritic-1250.txt:ěščřžýáíé1250
==>for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo(%~nG:%g>"unASCII\%gANSI.txt"
==>chcp 852 >nul
==>findstr /R "^" "diacritic-*.txt"
diacritic--852.txt:ěščřžýáíé-852
diacritic-1250.txt:ýÜŔ°×řßÝÚ1250
==>for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo %G:%g
diacritic--852.txt:ěščřžýáíé-852
diacritic-1250.txt:ýÜŔ°×řßÝÚ1250
==>for %G in (diacritic*.txt) do @for /F %g in (%G) do @echo(%~nG:%g>"unASCII\%g-OEM.txt"
==>popd
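The mismatch in the findstr output above can be reproduced outside cmd by encoding the text in one code page and decoding the same bytes in the other. This is a minimal Python sketch, assuming that Python's cp852 and cp1250 codecs match the Windows code pages of the same numbers; `reinterpret` is a helper name invented here.

```python
TEXT = "ěščřžýáíé"


def reinterpret(text, written_cp, read_cp):
    """Encode `text` as it was written, then decode the bytes as they are read."""
    return text.encode(written_cp).decode(read_cp)


# The same nine characters are stored as different bytes in each code page.
print(TEXT.encode("cp852").hex(" "))
print(TEXT.encode("cp1250").hex(" "))

# Written under 852, read under 1250 (note: one result character is a no-break space).
print(reinterpret(TEXT, "cp852", "cp1250"))
# Written under 1250, read under 852.
print(reinterpret(TEXT, "cp1250", "cp852"))
# Matching code pages round-trip cleanly.
print(reinterpret(TEXT, "cp852", "cp852") == TEXT)
```

The bytes on disk never change; only the table used to turn them back into characters does.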
We have written the string ěščřžýáíé (followed by the CHCP number) to the following files:
- the string ěščřžýáíé-852 to the file files\diacritic--852.txt, and
- the string ěščřžýáíé1250 to the file files\diacritic-1250.txt.
Then we used those strings to create files with the name pattern <String><Chcp><CPID>.txt, where
- <String> = the string ěščřžýáíé with diacritics, as read from the diacritic-<Chcp>.txt file;
- <Chcp> = -852 or 1250: the code page under which the diacritic-<Chcp>.txt file was written;
- <CPID> = -OEM or ANSI: a textual abbreviation of the code page under which this new file was written (852 and 1250, respectively).
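To make the pattern concrete, the four resulting names can be predicted by combining every "written under" code page with every "read under" code page. This is a hedged Python sketch, again assuming Python's cp852 and cp1250 codecs match the Windows code pages; `expected_name` and the two tag dictionaries are names introduced only for this illustration.

```python
from itertools import product

TEXT = "ěščřžýáíé"
# Tags taken from the name pattern described above.
CHCP_TAG = {"cp852": "-852", "cp1250": "1250"}  # <Chcp>
CPID_TAG = {"cp852": "-OEM", "cp1250": "ANSI"}  # <CPID>


def expected_name(written_cp, read_cp):
    """Predict <String><Chcp><CPID>.txt for one written/read code page pair."""
    raw = (TEXT + CHCP_TAG[written_cp]).encode(written_cp)
    return raw.decode(read_cp) + CPID_TAG[read_cp] + ".txt"


# Only the pairs with matching code pages keep the original diacritics.
for written_cp, read_cp in product(CHCP_TAG, repeat=2):
    print(written_cp, "->", read_cp, ":", expected_name(written_cp, read_cp))
```

Two of the four names keep ěščřžýáíé intact (same code page for writing and reading); the other two are mojibake already at the moment the file is created.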
Let's try to use the last four files. Copy and paste the following code snippet into a command-line window again:
chcp 437 >nul
dir /B /S "files\unASCII\*.txt"
for %G in (files\unASCII\ěščřžýáíé*.txt) do @echo %G
findstr /S /R "^" "files\unASCII\ěščřžýáíé*.txt"
chcp 1250 >nul
for %G in (files\unASCII\ěščřžýáíé*.txt) do type "%G"
chcp 852 >nul
for %G in (files\unASCII\ěščřžýáíé*.txt) do type "%G"
Output: we can see mojibake again and again:
==>chcp 437 >nul
==>dir /B /S "files\unASCII\*.txt"
d:\bat\files\unASCII\ýÜŔ°×řßÝÚ1250-OEM.txt
d:\bat\files\unASCII\ěščřžýáíé-852-OEM.txt
d:\bat\files\unASCII\ěščřžýáíé1250ANSI.txt
d:\bat\files\unASCII\Řçźý§ě ˇ‚-852ANSI.txt
==>for %G in (files\unASCII\ěščřžýáíé*.txt) do @echo %G
files\unASCII\ěščřžýáíé-852-OEM.txt
files\unASCII\ěščřžýáíé1250ANSI.txt
==>findstr /S /R "^" "files\unASCII\ěščřžýáíé*.txt"
==>
==>chcp 1250 >nul
==>for %G in (files\unASCII\ěščřžýáíé*.txt) do type "%G"
==>type "files\unASCII\ěščřžýáíé-852-OEM.txt"
diacritic--852:Řçźý§ě ˇ‚-852
==>type "files\unASCII\ěščřžýáíé1250ANSI.txt"
diacritic-1250:ěščřžýáíé1250
==>chcp 852 >nul
==>for %G in (files\unASCII\ěščřžýáíé*.txt) do type "%G"
==>type "files\unASCII\ěščřžýáíé-852-OEM.txt"
diacritic--852:ěščřžýáíé-852
==>type "files\unASCII\ěščřžýáíé1250ANSI.txt"
diacritic-1250:ýÜŔ°×řßÝÚ1250
Oops, why is there no output from findstr? Let's use
chcp 1250 >nul
findstr /S /R "^" "files\unASCII\*.txt"
chcp 852 >nul
findstr /S /R "^" "files\unASCII\*.txt"
The output shows that findstr causes mojibake not only in file contents but in file names as well:
==>chcp 1250 >nul
==>findstr /S /R "^" "files\unASCII\*.txt"
FINDSTR: Cannot open files\unASCII\ŤsR›zr ˇ‚1250-OEM.txt
FINDSTR: Cannot open files\unASCII\escrzŤ˙'-852-OEM.txt
FINDSTR: Cannot open files\unASCII\escrzŤ˙'1250ANSI.txt
FINDSTR: Cannot open files\unASCII\RÎzŤäe?'-852ANSI.txt
==>chcp 852 >nul
==>findstr /S /R "^" "files\unASCII\*.txt"
FINDSTR: Cannot open files\unASCII\ŹsRŤzráíé1250-OEM.txt
FINDSTR: Cannot open files\unASCII\escrzŹ ş'-852-OEM.txt
FINDSTR: Cannot open files\unASCII\escrzŹ ş'1250ANSI.txt
FINDSTR: Cannot open files\unASCII\R╬zŹńeś?'-852ANSI.txt
FYI: CHCP 65001 (UTF-8) did not help either. And as per MSDN: Naming Files, Paths, and Namespaces, Windows NTFS object names seem to be UTF-16 encoded:
On newer file systems, such as NTFS, exFAT, UDFS, and FAT32, Windows stores the long file names on disk in Unicode ... the file system treats path and file names as an opaque sequence of WCHARs.
Moreover:
The shell and the file system have different requirements. It is
possible to create a path with the Windows API that the shell user
interface is not able to interpret properly.
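Since the file system stores names in Unicode, one way around the console code page is to create the files from a program that passes Unicode strings to the file system directly. The following is only a sketch of that idea in Python, not a batch solution. It assumes the list file is saved as UTF-8 (with or without BOM), and `create_from_list` is a hypothetical helper; as far as I know, Python on Windows hands str paths to the wide-character Windows API, so CHCP should not matter here.

```python
import tempfile
from pathlib import Path


def create_from_list(list_file, target_dir, suffix=".rtf"):
    """Create one empty file per non-blank line of a UTF-8 encoded list file."""
    created = []
    # "utf-8-sig" also accepts files that start with a UTF-8 byte order mark.
    for line in list_file.read_text(encoding="utf-8-sig").splitlines():
        name = line.strip()
        if name:
            (target_dir / (name + suffix)).touch()
            created.append(name + suffix)
    return created


# Demo in a throwaway directory; files.txt here stands in for the question's list.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    list_file = tmp_path / "files.txt"
    list_file.write_text("ăâțîș\něščřžýáíé\n", encoding="utf-8")
    names = create_from_list(list_file, tmp_path)
    on_disk = sorted(p.name for p in tmp_path.glob("*.rtf"))
    print(on_disk == sorted(names))
```

Whether the console can then display those names correctly is a separate matter, in line with the quote above about the shell and the file system having different requirements.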