tl;dr: How do I ask Windows what the current directory separator character on the system is?
Different versions of Windows seem to behave differently (e.g. \ and / both work on the English versions, ¥ is apparently used on the Japanese version, ₩ on the Korean version, etc.).
Is there any way to avoid hard-coding this, and instead ask Windows at run time?
Note:
Ideally, the solution should not depend on a high-level DLL like ShlWAPI.dll, because lower-level libraries also depend on this. So it should really depend only on kernel32.dll, ntdll.dll, or the like... although I'm having trouble finding anything at all, whether at a high level or at a low level.
Edit:
A little experimentation told me that it's the Win32 subsystem (i.e. kernel32.dll... or is it perhaps RtlDosPathNameToNtPathName_U in ntdll.dll? Not sure, didn't test...) which converts forward slashes to backslashes, not the kernel. (Prefixing \\?\ makes it impossible to use forward slashes later in the path -- and the NT native user-mode API also fails with forward slashes.)
So apparently it's not quite "built into" Windows, but rather just a compatibility feature -- which means you can't just blindly substitute slashes for backslashes, because any program which randomly prefixes \\?\ to paths will automatically break on forward slashes.
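To illustrate the point about \\?\ above, here is a minimal sketch of a helper that converts forward slashes to backslashes before adding the prefix, since nothing will do that conversion afterwards. The name to_extended_path is hypothetical, and the sketch assumes the input is an absolute drive-letter path; UNC paths need a different prefix form and are not handled.

```python
EXTENDED_PREFIX = "\\\\?\\"  # the four characters \\?\


def to_extended_path(path: str) -> str:
    """Return `path` with the \\\\?\\ prefix, using only backslashes.

    Hypothetical helper: assumes an absolute drive-letter path such as
    C:/dir/file.txt. UNC paths are out of scope for this sketch.
    """
    if path.startswith(EXTENDED_PREFIX):
        # Already prefixed: leave it alone rather than guess at its contents.
        return path
    # Forward slashes are not translated once the prefix is present,
    # so translate them here first.
    return EXTENDED_PREFIX + path.replace("/", "\\")


print(to_extended_path("C:/Users/me/file.txt"))
```

The design choice is simply to do the slash conversion yourself at the one place where the prefix is added, rather than relying on the Win32 layer to do it.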
I have mixed feelings on what conclusions to make regarding this, but I just thought I'd mention it.
(I tagged this as "path separator" even though that's technically incorrect, because the path separator is used for separating paths, not directories (; vs. \). Hopefully people get what I meant.)
The standard forward slash (/) has always worked in all versions of DOS and Windows. If you use it, you don't have to worry about how the backslash is displayed on Japanese and Korean versions of Windows, and you also don't have to special-case the path separator for Windows as opposed to POSIX (including Mac). Just use forward slash everywhere.
While the ₩ and ¥ characters are shown as directory separator symbols in the Korean and Japanese versions of Windows respectively, they are only how those versions render the same Unicode code point, U+005C, as a glyph. The underlying code point for backslash is the same across the English, Japanese, and Korean versions of Windows.
Extra confirmation for this can be found on this page: http://msdn.microsoft.com/en-us/library/dd374047(v=vs.85).aspx
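The glyph-versus-code-point distinction can be checked with a short sketch. It assumes that Python's cp1252, cp932 and cp949 codecs mirror the corresponding Windows code pages for this byte; the helper name decode_5c is made up for illustration.

```python
BACKSLASH = "\u005c"  # REVERSE SOLIDUS
YEN_SIGN = "\u00a5"   # the real yen sign, a different code point
WON_SIGN = "\u20a9"   # the real won sign, a different code point


def decode_5c(code_page: str) -> str:
    """Decode the single byte 0x5C using the given code page."""
    return b"\x5c".decode(code_page)


for code_page in ("cp1252", "cp932", "cp949"):
    ch = decode_5c(code_page)
    # Report the code point each code page maps 0x5C to.
    print(code_page, "U+%04X" % ord(ch), ch == BACKSLASH)
```

The byte maps to U+005C in all three code pages; only the font used to draw it differs between locales.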
Also, I don't know of any Windows API function that gets you the system's path separator, but you can rely on it being \ in all circumstances.
http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx#naming_conventions
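As a small illustration of these conventions, Python's standard ntpath module hard-codes the same rule rather than asking the operating system. Note the assumption: this is Python's own model of Windows path syntax and runs on any platform, not a call into a Windows API, and the sample path is made up.

```python
import ntpath

# ntpath implements Windows path rules in pure Python.
primary = ntpath.sep        # the canonical directory separator
alternate = ntpath.altsep   # the accepted alternative

# normpath rewrites the alternate separator to the primary one.
normalized = ntpath.normpath("C:/Users/example/notes.txt")

print(repr(primary), repr(alternate), normalized)
```

A library doing this suggests that treating \ as a constant, with / accepted as input, is a common and workable approach.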
About /
Windows should support the use of / as a directory separator in the API functions, though not necessarily in the command prompt (command.com).
It's 'tough' to figure out the truth of all this, but this might be a really helpful link about / in Windows paths: http://bytes.com/topic/python/answers/23123-when-did-windows-start-accepting-forward-slash-path-separator
The original poster added the phrase "kernel-mode" in a comment to someone else's answer.
If the original question intended to ask about kernel mode, then it probably isn't a good idea to depend on / being a path separator. Different file systems allow different character sets on disk. Different file system drivers in Windows can also allow different character sets, which normally cannot include characters that the underlying file systems don't accept on disk, but sometimes they can behave strangely. For example, POSIX mode allows a component name in a path on an NTFS partition to contain some characters that NTFS ordinarily doesn't allow. (But obviously / isn't one of them, in POSIX.)
In kernel mode in Unicode, U+005C is always a backslash and it is always the path separator. Unicode code points for yen and won are not U+005C and are not path separators.
In kernel mode in ANSI, complications arise depending on which ANSI code page is in use. In code pages that are sufficiently similar to ASCII, 0x5C is a backslash and it is the path separator. In ANSI code pages 932 and 949, 0x5C is not displayed as a backslash, and whether it is a path separator depends on where it occurs. If 0x5C stands alone as a single-byte character, then it's a yen sign or won sign and it is a path separator. If 0x5C is the second byte of a multibyte character, then it's not a character by itself, so it's not a yen sign or won sign and it's not a path separator. You have to start parsing from the beginning of the string to figure out whether a particular char is actually a whole character or not. Also, in Chinese encodings and UTF-8, multibyte characters can be longer than two chars.
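The parsing rule above can be sketched as follows for code page 932 only. The function names are hypothetical, and the lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC) are the ones usually documented for CP932; on Windows itself you would more likely ask IsDBCSLeadByteEx than hard-code them. The sketch assumes every multibyte character is exactly two bytes, so it does not cover the longer encodings mentioned above.

```python
from typing import List


def is_cp932_lead_byte(b: int) -> bool:
    """True if `b` starts a two-byte character in CP932 (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def separator_offsets(path: bytes) -> List[int]:
    """Return offsets where 0x5C is a whole character, i.e. a separator."""
    offsets = []
    i = 0
    while i < len(path):
        if is_cp932_lead_byte(path[i]):
            i += 2  # skip the lead byte and its trail byte together
        else:
            if path[i] == 0x5C:
                offsets.append(i)
            i += 1
    return offsets


# "C:" + separator + the two-byte character 0x95 0x5C + separator + "a".
# The second byte of that character is 0x5C but is not a separator.
sample = bytes([0x43, 0x3A, 0x5C, 0x95, 0x5C, 0x5C, 0x61])
print(separator_offsets(sample), sample.count(b"\x5c"))
```

A naive byte search finds three 0x5C bytes in the sample, while the scan from the start of the string reports only the two that are real separators.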