Get directory separator char on Windows?

Posted 2019-01-15 12:39

Question:

tl;dr: How do I ask Windows what the current directory separator character on the system is?


Different versions of Windows seem to behave differently: \ and / both work on the English versions, ¥ is apparently used on the Japanese version, ₩ is apparently used on the Korean version, and so on.

Is there any way to avoid hard-coding this, and instead ask Windows at run time?

Note:

Ideally, the solution should not depend on a high-level DLL like ShlWAPI.dll, because lower-level libraries also depend on this. So it should really depend only on kernel32.dll or ntdll.dll or the like... although I'm having trouble finding anything at all, whether at a high level or at a low level.

Edit:

A little experimentation told me that it's the Win32 subsystem (i.e. kernel32.dll... or is it perhaps RtlDosPathNameToNtPathName_U in ntdll.dll? not sure, didn't test...) which converts forward slashes to backslashes, not the kernel. (Prefixing \\?\ makes it impossible to use forward slashes later in the path -- and the NT native user-mode API also fails with forward slashes.)

So apparently it's not quite "built into" Windows, but rather just a compatibility feature -- which means you can't just blindly substitute slashes instead of backslashes, because any program which randomly prefixes \\?\ to paths will automatically break on forward slashes.

I have mixed feelings on what conclusions to make regarding this, but I just thought I'd mention it.

(I tagged this as "path separator" even though that's technically incorrect because the path separator is used for separating paths, not directories (; vs. \). Hopefully people get what I meant.)

Answer 1:

While the ₩ and ¥ characters are shown as directory separator symbols in the Korean and Japanese versions of Windows respectively, they are only how those versions render the same Unicode code point, U+005C, as a glyph. The underlying code point for the backslash is the same across the English, Japanese, and Korean versions of Windows.

Extra confirmation for this can be found on this page: http://msdn.microsoft.com/en-us/library/dd374047(v=vs.85).aspx

Security Considerations for Character Sets in File Names

Windows code page and OEM character sets used on Japanese-language systems contain the Yen symbol (¥) instead of a backslash (\). Thus, the Yen character is a prohibited character for NTFS and FAT file systems. When mapping Unicode to a Japanese-language code page, conversion functions map both backslash (U+005C) and the normal Unicode Yen symbol (U+00A5) to this same character. For security reasons, your applications should not typically allow the character U+00A5 in a Unicode string that might be converted for use as a FAT file name.
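As a rough check of the claim that only the glyph differs, the sketch below decodes the single byte 0x5C under the Japanese (932), Korean (949), and Western European (1252) code pages using Python's built-in codecs. The helper name code_point_of_0x5c is made up for this illustration, and Python's codecs are only an approximation of the Windows conversion tables (they do not reproduce the best-fit mapping of U+00A5 described in the quote).

```python
def code_point_of_0x5c(codec: str) -> int:
    # Decode the one-byte string 0x5C with the given codec and
    # return the Unicode code point of the resulting character.
    return ord(b"\x5c".decode(codec))


for codec in ("cp932", "cp949", "cp1252"):
    print(codec, f"U+{code_point_of_0x5c(codec):04X}")
```

Each line should report the same code point, which is why the choice of font or locale changes what the separator looks like but not what it is.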

Also, I don't know of any Windows API function that gets you the system's path separator, but you can rely on it being \ in all circumstances.

http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx#naming_conventions

The following fundamental rules enable applications to create and process valid names for files and directories, regardless of the file system:

...

Use a backslash (\) to separate the components of a path. The backslash divides the file name from the path to it, and one directory name from another directory name in a path. You cannot use a backslash in the name for the actual file or directory because it is a reserved character that separates the names into components.

...

About /

Windows should support the use of / as a directory separator in the API functions, though not necessarily in the command prompt (command.com).

Note: File I/O functions in the Windows API convert "/" to "\" as part of converting the name to an NT-style name, except when using the "\\?\" prefix as detailed in the following sections.

It's 'tough' to figure out the truth of all this, but this might be a really helpful link about / in Windows paths: http://bytes.com/topic/python/answers/23123-when-did-windows-start-accepting-forward-slash-path-separator
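Since the "/" to "\" conversion is skipped once the \\?\ prefix is present, code that adds the prefix has to do the conversion itself first. A minimal sketch, assuming the input is a fully qualified drive-letter path (the function name to_extended_path is hypothetical; UNC paths need the \\?\UNC\ form and relative paths are not handled here):

```python
EXTENDED_PREFIX = "\\\\?\\"  # four characters: backslash, backslash, ?, backslash


def to_extended_path(path: str) -> str:
    # Already prefixed: return it unchanged, since no later layer
    # will translate forward slashes for us.
    if path.startswith(EXTENDED_PREFIX):
        return path
    # Translate forward slashes ourselves, because the Win32 layer
    # skips that step once the prefix is present.
    return EXTENDED_PREFIX + path.replace("/", "\\")


print(to_extended_path("C:/Users/me/file.txt"))
```

The ordering matters: normalizing after the prefix has been added by some other component is too late if that component has already handed the path to the file system.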



Answer 2:

The original poster added the phrase "kernel-mode" in a comment to someone else's answer.

If the original question intended to ask about kernel mode, then it probably isn't a good idea to depend on / being a path separator. Different file systems allow different character sets on disk. Different file system drivers in Windows can also allow different character sets, which normally cannot include characters that the underlying file systems don't accept on disk, but sometimes they can behave strangely. For example, POSIX mode allows a component name in a path on an NTFS partition to contain some characters that NTFS ordinarily doesn't allow. (But obviously / isn't one of them, in POSIX.)

In kernel mode in Unicode, U+005C is always a backslash and it is always the path separator. Unicode code points for yen and won are not U+005C and are not path separators.

In kernel mode in ANSI, complications arise depending on the ANSI code page. In code pages that are sufficiently similar to ASCII, 0x5C is a backslash and it is the path separator. In ANSI code pages 932 and 949, 0x5C is not a backslash, but it might be a path separator depending on where it occurs. If 0x5C stands on its own as a single-byte character, then it's a yen sign or won sign and it is a path separator. If 0x5C is the second byte of a multibyte character, then it's not a character by itself, so it's not a yen sign or won sign and it's not a path separator. You have to start parsing from the beginning of the string to figure out whether a particular char is actually a whole character or not. Also, in some Chinese encodings and in UTF-8, multibyte characters can be longer than two chars.
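The "parse from the beginning" rule can be sketched for code page 932 as follows. This is only an illustration in Python, not kernel-mode code: the function names are hypothetical, and the lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC) are filled in here as an assumption about CP932 rather than taken from the answer. On Windows itself, IsDBCSLeadByteEx plays the role of the lead-byte test.

```python
from typing import List


def is_cp932_lead_byte(b: int) -> bool:
    # Assumed lead-byte ranges for code page 932 (Shift-JIS).
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def separator_offsets(path: bytes) -> List[int]:
    # Walk the byte string from the start so that a 0x5C trail byte
    # inside a double-byte character is never mistaken for a separator.
    offsets = []
    i = 0
    while i < len(path):
        if is_cp932_lead_byte(path[i]):
            i += 2  # skip the lead byte and its trail byte together
            continue
        if path[i] == 0x5C:
            offsets.append(i)
        i += 1
    return offsets


# The kanji in this path encodes as 0x95 0x5C, so its second byte is 0x5C.
raw = "C:\\表\\a.txt".encode("cp932")
print(raw.count(b"\x5c"), separator_offsets(raw))
```

A naive byte search finds three 0x5C bytes in this path, while the scan reports only the two that are real separators.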



Answer 3:

The standard forward slash (/) has always worked in all versions of DOS and Windows. If you use it, you don't have to worry about issues with how the backslash is displayed on Japanese and Korean versions of Windows, and you also don't have to special-case the path separator for Windows as opposed to POSIX (including Mac). Just use forward slash everywhere.
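For code that accepts forward slashes but still wants a canonical Windows form, one possible approach is to normalize at the boundary. The sketch below uses Python's ntpath module, which implements Windows path rules on any platform; the wrapper name normalize_windows_path is hypothetical, and this is string manipulation only, so it says nothing about what a given Windows API or the \\?\ prefix will accept.

```python
import ntpath


def normalize_windows_path(path: str) -> str:
    # ntpath.normpath rewrites "/" as "\" and collapses redundant
    # separators, following Windows path conventions.
    return ntpath.normpath(path)


# ntpath exposes the primary and alternate separators as constants.
print(repr(ntpath.sep), repr(ntpath.altsep))
print(normalize_windows_path("C:/Users/me/file.txt"))
```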