How to know file type without extension

Posted 2019-05-18 15:55

While trying to come up with a servlet-based application that reads files and manipulates them (image type conversion), the following questions came up:

  1. Is it possible to inspect a file's content and determine its type?
  2. Is there a standard that requires every file to carry some kind of marker in its content, so that an application does not have to rely on the file extension?

Consider an application scenario:

I am creating an application that will convert different file formats to a set of output formats. Say a user uploads a PDF; my application can suggest that the possible conversion formats are Microsoft Word, TIFF, JPEG, etc.

As my application will gradually support more file formats over time, I want it to inspect the input file instead of having the user specify the format, and then suggest the possible output formats to the user.

I understand this is an open-ended, broad question. Please let me know if it needs to be modified.

Thanks, Ayusman

1 answer
别忘想泡老子
#2 · 2019-05-18 16:41

Yes, you can figure out the type without an extension by using the file's magic number. The file command actually determines the type through a three-step check:

  1. Check filesystem properties to identify empty files, folders, etc.
  2. Check the magic number mentioned above
  3. For text files, check which language the text is written in

Here's a library that will help you with magic numbers: jmimemagic
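
To make the magic-number step concrete, here is a minimal Java sketch that compares the first few bytes of a file against a handful of well-known signatures (PDF, JPEG, PNG, GIF, TIFF, ZIP). The class name MagicSniffer and its methods are hypothetical names made up for this illustration; this is not the jmimemagic API, and the signature list is nowhere near as complete as what the file command or a real library ships with. It assumes Java 9 or later for InputStream.readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of jmimemagic.
final class MagicSniffer {

    private MagicSniffer() {}

    // Returns true if data begins with the given signature bytes.
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    // Guesses a MIME type from the leading bytes of a file.
    static String detect(byte[] header) {
        if (startsWith(header, 0x25, 0x50, 0x44, 0x46)) return "application/pdf"; // "%PDF"
        if (startsWith(header, 0xFF, 0xD8, 0xFF)) return "image/jpeg";
        if (startsWith(header, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "image/png";
        if (startsWith(header, 0x47, 0x49, 0x46, 0x38)) return "image/gif";       // "GIF8"
        if (startsWith(header, 0x49, 0x49, 0x2A, 0x00)) return "image/tiff";      // little-endian TIFF
        if (startsWith(header, 0x4D, 0x4D, 0x00, 0x2A)) return "image/tiff";      // big-endian TIFF
        // ZIP containers; .docx and other Office Open XML files also start this way.
        if (startsWith(header, 0x50, 0x4B, 0x03, 0x04)) return "application/zip";
        return "application/octet-stream"; // unknown: fall back to generic binary
    }

    // Reads at most the first 8 bytes of the file and classifies them.
    static String detectFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8];
            int read = in.readNBytes(buffer, 0, buffer.length);
            return detect(Arrays.copyOf(buffer, read));
        }
    }
}
```

In a servlet, the same detect method could be fed the first bytes of the uploaded stream instead of a file on disk. Note that a signature match only says what the file claims to be; container formats such as ZIP need a further look inside to tell a .docx from a plain archive.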
