What characters are forbidden in Windows and Linux

Posted 2018-12-31 08:15

I know that / is illegal in Linux, and the following are illegal in Windows (I think) * . " / \ [ ] : ; | = ,

What else am I missing?

I need a comprehensive guide, however, and one that takes into account double-byte characters. Linking to outside resources is fine with me.

I need to first create a directory on the filesystem using a name that may contain forbidden characters, so I plan to replace those characters with underscores. I then need to write this directory and its contents to a zip file (using Java), so any additional advice concerning the names of zip directories would be appreciated.

12 answers
无与为乐者.
#2 · 2018-12-31 08:43

Well, if only for research purposes, then your best bet is to look at this Wikipedia entry on Filenames.

If you want to write a portable function to validate user input and create filenames based on that, the short answer is don't. Take a look at a portable module like Perl's File::Spec to get a glimpse of all the hoops you have to jump through to accomplish such a "simple" task.

何处买醉
#3 · 2018-12-31 08:46

A “comprehensive guide” to forbidden filename characters is not going to work on Windows because it reserves filenames as well as characters. Yes, characters like * " ? and others are forbidden, but there are an infinite number of names composed only of valid characters that are forbidden. For example, spaces and dots are valid filename characters, but names composed only of those characters are forbidden.

Windows does not distinguish between upper-case and lower-case characters, so you cannot create a folder named A if one named a already exists. Worse, seemingly allowed names like PRN and CON, and many others, are reserved and not allowed. Windows also has several length restrictions; a filename valid in one folder may become invalid if moved to another folder. The rules for naming files and folders are on MSDN.

You cannot, in general, use user-generated text to create Windows directory names. If you want to allow users to name anything they want, you have to create safe names like A, AB, A2, and so on, store user-generated names and their path equivalents in an application data file, and perform path mapping in your application.
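
The mapping idea can be sketched roughly as follows. This is only a minimal illustration under assumptions: SafeNameMapper, safeNameFor, and the d000001-style naming scheme are hypothetical names chosen for the example, and persisting the map to an application data file is left out.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps arbitrary user-supplied names to generated, filesystem-safe names. */
final class SafeNameMapper {
    // User-visible name -> generated on-disk name, kept in insertion order.
    private final Map<String, String> userToSafe = new LinkedHashMap<>();
    private int counter = 0;

    /** Returns the safe name for userName, generating a new one on first use. */
    String safeNameFor(String userName) {
        // Generated names contain only a letter and digits, so they are valid on Windows and Linux.
        return userToSafe.computeIfAbsent(
                userName, k -> String.format(Locale.ROOT, "d%06d", ++counter));
    }

    /** Read-only view of the mapping, e.g. for saving it to an application data file. */
    Map<String, String> mappings() {
        return Collections.unmodifiableMap(userToSafe);
    }
}
```

With this sketch, new SafeNameMapper().safeNameFor("CON") yields a harmless generated name, and asking for the same user name again returns the same directory name. The user-visible name never touches the filesystem, so none of the character or reserved-name rules apply to it.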

If you absolutely must allow user-generated folder names, the only way to tell if they are invalid is to catch exceptions and assume the name is invalid. Even that is fraught with peril, as the exceptions thrown for denied access, offline drives, and out of drive space overlap with those that can be thrown for invalid names. You are opening up one huge can of hurt.
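
To make the ambiguity concrete, here is a minimal Java sketch of the catch-and-assume approach. DirectoryCreator, Result, and tryCreate are hypothetical names for this example. It relies on Path.resolve throwing InvalidPathException when a string cannot be converted to a path, and on Files.createDirectory reporting most other failures as IOException; which names are rejected at which stage depends on the platform.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

/** Hypothetical helper showing how little the exceptions tell you. */
final class DirectoryCreator {
    enum Result { CREATED, INVALID_NAME, IO_FAILURE }

    static Result tryCreate(Path parent, String name) {
        final Path target;
        try {
            // Throws InvalidPathException if the platform's path parser rejects the string.
            target = parent.resolve(name);
        } catch (InvalidPathException e) {
            return Result.INVALID_NAME;
        }
        try {
            Files.createDirectory(target);
            return Result.CREATED;
        } catch (IOException e) {
            // Ambiguous: the name may have been rejected by the file system, but this
            // also covers denied access, a full disk, or an already existing directory.
            return Result.IO_FAILURE;
        }
    }
}
```

Only the INVALID_NAME result says anything definite about the name itself. An IO_FAILURE result cannot be attributed to the name without further checks, which is exactly the overlap described above.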

骚的不知所云
#4 · 2018-12-31 08:46

Let's keep it simple and answer the question first.

  1. The forbidden printable ASCII characters are:

    • Linux/Unix:

      / (forward slash)
      
    • Windows:

      < (less than)
      > (greater than)
      : (colon - sometimes works, but then actually addresses an NTFS Alternate Data Stream)
      " (double quote)
      / (forward slash)
      \ (backslash)
      | (vertical bar or pipe)
      ? (question mark)
      * (asterisk)
      
  2. Non-printable characters

    If your data comes from a source that would permit non-printable characters then there is more to check for.

    • Linux/Unix:

      0 (NUL byte)
      
    • Windows:

      0-31 (ASCII control characters)
      

    Note: While it is legal under Linux/Unix file systems to create files with control characters in the filename, it might be a nightmare for the users to deal with such files.

  3. Reserved file names

    The following filenames are reserved:

    • Windows:

      CON, PRN, AUX, NUL 
      COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
      LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9
      

      (both on their own and with arbitrary file extensions, e.g. LPT1.txt).

  4. Other rules

    • Windows:

      Filenames cannot end in a space or dot.
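
The four rule sets above can be combined into a replace-with-underscore sanitizer along these lines. This is a rough Java sketch, not a complete solution: FileNameSanitizer and sanitize are hypothetical names, the underscore prefix for reserved names is just one possible choice, and it only covers the rules listed in this answer (no length limits, no case-collision handling).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper applying the Windows rules listed above (a superset of the Linux ones). */
final class FileNameSanitizer {
    private static final String FORBIDDEN = "<>:\"/\\|?*";
    private static final Set<String> RESERVED = new HashSet<>();
    static {
        for (String s : new String[] {"CON", "PRN", "AUX", "NUL"}) RESERVED.add(s);
        for (int i = 1; i <= 9; i++) {
            RESERVED.add("COM" + i);
            RESERVED.add("LPT" + i);
        }
    }

    static String sanitize(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // Control characters 0-31 and the forbidden printable characters become '_'.
            // Everything else, including non-ASCII characters, passes through unchanged.
            sb.append(c < 32 || FORBIDDEN.indexOf(c) >= 0 ? '_' : c);
        }
        // Names must not end in a space or dot: replace any trailing run of them.
        int end = sb.length();
        while (end > 0 && (sb.charAt(end - 1) == ' ' || sb.charAt(end - 1) == '.')) {
            sb.setCharAt(--end, '_');
        }
        if (sb.length() == 0) {
            return "_";
        }
        // Reserved device names are reserved with or without an extension (e.g. LPT1.txt).
        String result = sb.toString();
        int dot = result.indexOf('.');
        String base = dot < 0 ? result : result.substring(0, dot);
        if (RESERVED.contains(base.toUpperCase(Locale.ROOT))) {
            result = "_" + result;
        }
        return result;
    }
}
```

For example, sanitize("a/b:c") gives a_b_c and sanitize("LPT1.txt") gives _LPT1.txt. Two different inputs can map to the same output, so the caller still has to handle collisions.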

无与为乐者.
#5 · 2018-12-31 08:47

The easy way to get Windows to tell you the answer is to attempt to rename a file via Explorer and type in / for the new name. Windows will pop up a message box telling you the list of illegal characters.

A filename cannot contain any of the following characters:
    \ / : * ? " < > | 

https://support.microsoft.com/en-us/kb/177506

低头抚发
#6 · 2018-12-31 08:48

Instead of creating a blacklist of characters, you could use a whitelist. All things considered, the range of characters that make sense in a file or directory name is quite small, and unless you have some very specific naming requirements your users will not hold it against your application if they cannot use the whole ASCII table.

It does not solve the problem of reserved names in the target file system, but with a whitelist it is easier to mitigate the risks at the source.

In that spirit, this is a range of characters that can be considered safe:

  • Letters (a-z A-Z) - Unicode characters as well, if needed
  • Digits (0-9)
  • Underscore (_)
  • Hyphen (-)
  • Space
  • Dot (.)

Add any other safe characters you wish to allow. Beyond this, you just have to enforce some additional rules regarding spaces and dots. This is usually sufficient:

  • Name must contain at least one letter or number (to avoid only dots/spaces)
  • Name must start with a letter or number (to avoid leading dots/spaces)

This already allows quite complex and nonsensical names. For example, these names would be possible with these rules, and be valid file names in Windows/Linux:

  • A...........ext
  • B -.- .ext

In essence, even with so few whitelisted characters you should still decide what actually makes sense, and validate/adjust the name accordingly. In one of my applications, I used the same rules as above but stripped any duplicate dots and spaces.
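
As a rough Java sketch of these rules: WhitelistNames, isAllowed, and normalize are hypothetical names, and the pattern assumes Unicode letters are wanted (\p{L}) while digits stay limited to 0-9. Requiring the first character to be a letter or digit also guarantees that the name contains at least one.

```java
import java.util.regex.Pattern;

/** Hypothetical helper implementing the whitelist and the two extra rules above. */
final class WhitelistNames {
    // First character: a letter or digit. Remaining characters: letters, digits,
    // underscore, dot, space, or hyphen (the hyphen is last in the class, so it is literal).
    private static final Pattern ALLOWED =
            Pattern.compile("[\\p{L}0-9][\\p{L}0-9_. -]*");

    static boolean isAllowed(String name) {
        return ALLOWED.matcher(name).matches();
    }

    /** Collapses runs of dots and runs of spaces into a single dot or space. */
    static String normalize(String name) {
        return name.replaceAll("\\.{2,}", ".").replaceAll(" {2,}", " ");
    }
}
```

Both example names above pass isAllowed, and normalize("A...........ext") returns A.ext. Note that this sketch, like the rules it implements, still accepts a trailing dot or space, so an extra check may be needed if that matters for the target file system.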

弹指情弦暗扣
#7 · 2018-12-31 08:53

Although the only illegal Unix characters are / and NUL, some consideration should also be given to how names are interpreted on the command line.

For example, while it is legal to name a file 1>&2 or 2>&1 in Unix, file names such as these might be misinterpreted when used on a command line.

Similarly it might be possible to name a file $PATH, but when trying to access it from the command line, the shell will translate $PATH to its variable value.
