How to manipulate Unicode-named files with Subversion

Posted 2020-07-23 03:30

Question:

Say I use Windows 7 with code page 950 (Big5, Traditional Chinese), and I want to use svn to manipulate files whose Unicode names contain characters outside that code page, such as 简体中文文件.txt (GB2312, Simplified Chinese).

With chcp 950 active, when I run:

svn add .\简体中文文件.txt

I get an error:

svn: warning: W155010: 'D:\path\to\work-dir\?体中文文件.txt' not found
svn: E200009: Could not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation

If I use chcp 65001 (UTF-8), I get an even worse error:

svn: warning: W155010: 'D:\path\to\work-dir\?体svn: E200009: C
ould not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation

I'd like to try chcp 1200 (UTF-16LE), but it says:

Invalid code page

TortoiseSVN seems to handle those files correctly. However, I need to write scripts that call svn to run several automated jobs. Is there any solution available?

Answer 1:

Programs like svn that use the MS implementation of the C standard library's file IO functions cannot read command input or file names containing characters outside the current code page. That is why the first character of the file name turns into ? in the error message: it has no representation in code page 950. You would have to chcp to a suitable code page for each file separately (e.g. 936 for Simplified Chinese).
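To make the code page limitation concrete, here is a minimal Python sketch that checks which characters of a file name a given code page cannot represent, and which candidate code page (if any) covers the whole name. The function names, the CODECS table, and the candidate list are illustrative assumptions, not part of svn or Windows. It relies on Python's cp950 and cp936 codecs, which approximate the Windows code pages of the same numbers. It only diagnoses the name; whether switching the code page actually lets your svn build see the file is something you would still have to verify.

```python
from typing import List, Optional

# Hypothetical mapping from Windows code page numbers to Python codec names.
CODECS = {950: "cp950", 936: "cp936"}


def unencodable_chars(name: str, codec: str) -> List[str]:
    """Return the characters of `name` that `codec` cannot represent."""
    missing = []
    for ch in name:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def pick_code_page(name: str, candidates: List[int]) -> Optional[int]:
    """Return the first candidate code page that can encode all of `name`.

    Returns None when no candidate covers every character.
    """
    for cp in candidates:
        if not unencodable_chars(name, CODECS[cp]):
            return cp
    return None


if __name__ == "__main__":
    name = "简体中文文件.txt"
    # Characters that code page 950 (Big5) cannot represent.
    print(unencodable_chars(name, "cp950"))
    # First candidate code page that covers the whole name, or None.
    print(pick_code_page(name, [950, 936]))
```

A script could call pick_code_page for each file before invoking svn and skip or report names for which it returns None, since no single legacy code page in the candidate list covers them.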

In theory code page 65001 could cover every character, but unfortunately the MS C runtime has serious bugs that usually break applications when this code page is in use. Microsoft's ongoing failure to fix this long-standing problem leaves UTF-8 a second-class citizen under Windows.

In the future, it looks like the fix for http://subversion.tigris.org/issues/show_bug.cgi?id=1537 should solve the problem by using the Win32 APIs directly, instead of the C stdlib, to do console writes. However, I can't find the related code change to confirm whether console input and file access are addressed in the same way.