Validation of files based on their file extensions

Posted 2019-02-20 07:50

Question:

I receive files from queues in Java. They may be in any of the following formats:

  1. docx
  2. pdf
  3. doc
  4. xls
  5. xlsx
  6. txt
  7. rtf

After reading their extensions, I want to validate whether they are actually files of these types.

For example, suppose I receive a file and see that it has the extension .xls. I then want to check whether it really is an .xls file, or whether someone uploaded a file in another format after changing its extension.

EDIT: I'd like to determine the file's MIME type by actually checking its content, not its extension. How can this be done?

Answer 1:

I don't think this is a problem you should be solving. Any solution to this problem would be brittle and based upon your current understanding of what constitutes a valid file of a particular type.

For example, take an XLS file. Do you know for sure what Excel accepts when opening such a file? Can you be sure you'll keep abreast of any changes in future releases that might support a different encoding style?

Ask yourself: what's the worst that could happen if the user uploads a file of the wrong type? Perhaps you'll pass the file to the application that handles that file extension and you'll get an error? Not a problem, just pass that error on to the user!



Answer 2:

Without using external libraries:

You can get the file's MIME type using MimetypesFileTypeMap:

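    // Needs java.io.File and javax.activation.MimetypesFileTypeMap (not bundled with the JDK since Java 11).
    File f = new File(...);
    System.out.println(new MimetypesFileTypeMap().getContentType(f));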

You can get a similar result with URLConnection.guessContentTypeFromName. Both of these solutions, according to the documentation, look only at the extension.
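To make the "extension only" point concrete, here is a minimal sketch of the name-based lookup. The class name `NameBasedGuess` is just a placeholder of mine; the only real API used is `URLConnection.guessContentTypeFromName`. Note that the method takes a plain string, so the file is never opened.

```java
import java.net.URLConnection;

class NameBasedGuess {
    // Looks the extension up in the JDK's built-in file name map.
    // The argument is only a name: no file is read, and none has to exist.
    static String guess(String fileName) {
        return URLConnection.guessContentTypeFromName(fileName);
    }
}
```

Since only the name is consulted, a renamed file gets whatever type its new extension maps to, which is exactly the situation the question wants to catch.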

A better option: URLConnection.guessContentTypeFromStream

    File f = new File(...);
    try (InputStream in = new BufferedInputStream(new FileInputStream(f))) { // buffered: the guess needs mark/reset
        System.out.println(URLConnection.guessContentTypeFromStream(in));
    }

This tries to guess the type from the first bytes of the file. Be warned that this is only a guess: I found it works in most cases, but it fails to detect some obvious types.
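Where the built-in guess comes back empty, one option that still needs no external library is to compare the leading bytes against known signatures yourself. This is a different technique from `guessContentTypeFromStream`, not a description of how it works internally. The sketch below is only a rough illustration: the class `SignatureSniffer` and the family names it returns are my own invention, while the byte signatures (PDF, RTF, ZIP, OLE2 compound file) are the well-known ones.

```java
import java.nio.charset.StandardCharsets;

class SignatureSniffer {
    // Leading byte sequences ("magic numbers") for the container formats in the question.
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] RTF = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
    // ZIP container: used by docx and xlsx, but also by every other ZIP archive.
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    // OLE2 compound file: used by legacy doc and xls.
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    // Returns a coarse format family for the first bytes of a file,
    // or "unknown" when no signature matches (plain text has no signature).
    static String sniff(byte[] head) {
        if (startsWith(head, PDF)) return "pdf";
        if (startsWith(head, RTF)) return "rtf";
        if (startsWith(head, ZIP)) return "zip";
        if (startsWith(head, OLE2)) return "ole2";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Passing the first eight bytes of the file is enough for these four signatures. This only narrows things down to a family: telling docx from xlsx means looking inside the ZIP, telling doc from xls means reading the OLE2 directory, and txt cannot be confirmed this way at all.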

I recommend combining both approaches: the extension-based lookup and the content-based guess.
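One way such a combination might look is sketched below. The class `CombinedGuess` and the order of the checks (content first, extension as a fallback) are my assumptions rather than anything the JDK prescribes; you could just as well run both and reject the file when they disagree.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

class CombinedGuess {
    // Tries the content-based guess first and falls back to the file name
    // when the leading bytes are not recognised. May return null if both fail.
    static String guess(InputStream content, String fileName) {
        // guessContentTypeFromStream gives up on streams without mark/reset support.
        InputStream in = content.markSupported() ? content : new BufferedInputStream(content);
        try {
            String fromContent = URLConnection.guessContentTypeFromStream(in);
            if (fromContent != null) {
                return fromContent;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return URLConnection.guessContentTypeFromName(fileName);
    }
}
```

For a file on disk you would open it in a try-with-resources block and call something like `CombinedGuess.guess(new FileInputStream(f), f.getName())`. The caller stays responsible for closing the stream.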