I know that `/` is illegal in Linux, and that the following are illegal in Windows (I think):
`*` `.` `"` `/` `\` `[` `]` `:` `;` `|` `=` `,`
What else am I missing?
I need a comprehensive guide, however, and one that takes into account double-byte characters. Linking to outside resources is fine with me.
I need to first create a directory on the filesystem using a name that may contain forbidden characters, so I plan to replace those characters with underscores. I then need to write this directory and its contents to a zip file (using Java), so any additional advice concerning the names of zip directories would be appreciated.
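The replace-with-underscores plan could be sketched in Java roughly as follows. This is a minimal sketch, not a complete guide. It assumes the reserved set that Microsoft documents for Windows (`< > : " / \ | ? *`, ASCII control characters, trailing dots and spaces, and device names such as `CON` or `NUL`), which differs from the list guessed in the question. The names `FileNameSanitizer`, `sanitize`, and `zipEntryName` are made up for illustration. The name is walked by code point, so non-ASCII characters (including those stored as surrogate pairs) pass through untouched. Zip entry names are joined with `/`, and directories get a trailing `/`, which is how `java.util.zip.ZipEntry` recognises a directory.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class FileNameSanitizer {
    // Printable characters Windows rejects in a file name: < > : " / \ | ? *
    private static final String RESERVED = "<>:\"/\\|?*";

    // Windows device names, with or without an extension (e.g. "NUL", "con.txt").
    private static final Pattern DEVICE_NAME = Pattern.compile(
            "(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\\..*)?",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Replaces every forbidden character in a single name component with '_'. */
    static String sanitize(String name) {
        StringBuilder out = new StringBuilder(name.length());
        // Iterate by code point so surrogate pairs are never split.
        name.codePoints().forEach(cp -> {
            boolean forbidden = cp < 0x20 || cp == 0x7F || RESERVED.indexOf(cp) >= 0;
            out.appendCodePoint(forbidden ? '_' : cp);
        });
        // Windows drops trailing dots and spaces; this also neutralises "." and "..".
        String s = out.toString().replaceAll("[. ]+\\z", "_");
        if (s.isEmpty() || DEVICE_NAME.matcher(s).matches()) {
            s = "_" + s;
        }
        return s;
    }

    /** Builds a zip entry name: sanitized components joined by '/', directories end in '/'. */
    static String zipEntryName(List<String> components, boolean isDirectory) {
        String joined = components.stream()
                .map(FileNameSanitizer::sanitize)
                .collect(Collectors.joining("/"));
        return isDirectory ? joined + "/" : joined;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("report: draft?.txt"));
        System.out.println(zipEntryName(Arrays.asList("my:dir", "notes*.txt"), false));
    }
}
```

The string returned by `zipEntryName` could then be passed to `new ZipEntry(...)` before calling `ZipOutputStream.putNextEntry`.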
In Unix shells, you can quote almost every character by wrapping it in single quotes (`'`). The exceptions are the single quote itself and control characters, which you cannot express because `\` is not expanded inside single quotes. You can still reach a single quote by concatenating single-quoted and double-quoted strings: `'I'"'"'m'` names a file called `I'm` (writing `"I'm"` in double quotes also works here).
So you should avoid all control characters, because they are too difficult to enter in the shell. The rest can still be awkward, especially names starting with a dash, because most commands read those as options unless you put two dashes (`--`) before them or prefix them with `./`, which also hides the leading `-`.
If you want to be nice, don't use any of the characters that the shell and typical commands treat as syntax. Some of these are position-dependent: you can still use `-`, but not as the first character, and likewise `.` should come first only when you mean it ("hidden file"). If you want to be mean, name your files after VT100 escape sequences ;-), so that `ls` garbles the output.
Under Linux and other Unix-related systems, only two characters cannot appear in the name of a file or directory: NUL (`'\0'`) and slash (`'/'`). The slash, of course, can appear in a path name, where it separates directory components.
Rumour¹ has it that Stephen Bourne (of 'shell' fame) had a directory containing 254 files, one for every single letter (character code) that can appear in a file name (excluding `/` and `'\0'`; the name `.` was the current directory, of course). It was used to test the Bourne shell and routinely wrought havoc on unwary programs such as backup programs.
Other people have covered the Windows rules.
Note that macOS uses a case-insensitive (but case-preserving) file system by default.
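The difference between a name that is merely legal on Unix (no NUL, no slash) and one that is also pleasant to handle in a shell (no control characters, no leading dash) could be expressed in Java along these lines. This is only a sketch: the class and method names are hypothetical, the "friendly" rule is one possible reading of the advice above, and `.` and `..` are rejected because they already exist in every directory and cannot be created as new names.

```java
final class UnixNameCheck {
    /** True if the string is usable as a new single path component on Unix. */
    static boolean isLegal(String name) {
        return !name.isEmpty()
                && name.indexOf('\0') < 0   // NUL terminates C strings
                && name.indexOf('/') < 0    // slash separates components
                && !name.equals(".")
                && !name.equals("..");
    }

    /** Stricter check: legal, no control characters, and no leading dash. */
    static boolean isShellFriendly(String name) {
        return isLegal(name)
                && !name.startsWith("-")
                && name.chars().noneMatch(c -> Character.isISOControl(c));
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"notes.txt", "-rf", ".hidden", "a/b"}) {
            System.out.println(candidate + " legal=" + isLegal(candidate)
                    + " friendly=" + isShellFriendly(candidate));
        }
    }
}
```

A name such as `-rf` passes `isLegal` but fails `isShellFriendly`, which mirrors the point that such files can exist yet are easy to mistake for options.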
¹ It was Kernighan & Pike in The Practice of Programming who said as much, in Chapter 6 (Testing), §6.5 Stress Tests.
For Windows you can check it using PowerShell
To display UTF-8 codes you can convert
I had the same need, went looking for recommendations or standard references, and came across this thread. My current blacklist of characters that should be avoided in file and directory names is:
As of 18/04/2017, no simple black or white list of characters and filenames is evident among the answers to this topic - and there are many replies.
The best suggestion I could come up with was to let the user name the file however they like. When the application tries to save the file, use an error handler to catch any exceptions, assume the file name is to blame (after making sure the save path itself is valid), and prompt the user for a new file name. For best results, place this check inside a loop that continues until the user either gets it right or gives up. This worked best for me (at least in VBA).
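The same try-and-reprompt loop might look like this in Java. It is a sketch under assumptions rather than a port of the VBA code: `SaveWithRetry` and `save` are hypothetical names, the `Supplier` stands in for whatever prompts the user, and returning `null` from it is taken to mean the user gave up.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

final class SaveWithRetry {
    /**
     * Asks nameSource for names until one can be written under dir.
     * Returns the saved path, or null if nameSource returned null (user gave up).
     */
    static Path save(Path dir, Supplier<String> nameSource, byte[] data) {
        String name;
        while ((name = nameSource.get()) != null) {
            try {
                // resolve() rejects names the platform cannot represent at all.
                Path target = dir.resolve(name);
                return Files.write(target, data);
            } catch (InvalidPathException | IOException e) {
                // Assume the name is to blame and ask again.
                System.err.println("Could not save under that name: " + e.getMessage());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for interactive prompts: the first proposal contains a NUL character.
        Iterator<String> proposals = Arrays.asList("bad\0name", "report.txt").iterator();
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"));
        Path saved = save(dir, () -> proposals.hasNext() ? proposals.next() : null,
                "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(saved == null ? "gave up" : "saved to " + saved);
    }
}
```

One trade-off of this approach is that it reports what the current file system rejects, so a name accepted on Linux may still fail when the files are later copied to Windows.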
When Windows creates an internet shortcut, it builds the file name by skipping illegal characters, except for the forward slash, which it converts to a minus sign.