I know we can do this by extension or by MIME type. Is there any other way to determine whether a file is a .docx or a .doc?
If it is just a matter of deciding between files that are known to be either .doc or .docx but are not marked accordingly with an extension, you can use the fact that a .docx file is a zipped collection of files. Something along these lines might help: test whether fileStream, which is whatever file or other input stream you wish to evaluate, opens as a ZIP archive. You could further evaluate a zipped file by looking for key .docx entries.
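As a rough illustration of the ZIP idea, here is a minimal plain-JDK sketch. The class and method names (DocxSniffer, looksLikeZip, hasWordDocumentEntry) are made up for this example, and treating word/document.xml as the "key entry" is an assumption based on what Word typically writes; strictly speaking, the main document part is located through the package relationships, so take this as a heuristic rather than a definitive test.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of any library.
final class DocxSniffer {

    // Returns true if the stream starts with the ZIP local-file signature "PK\3\4".
    // The stream must support mark/reset; its position is restored afterwards.
    static boolean looksLikeZip(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(4);
        byte[] head = new byte[4];
        int n = in.readNBytes(head, 0, 4); // Java 9+
        in.reset();
        return n == 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }

    // Scans the ZIP entries for word/document.xml (assumed key entry of a .docx).
    // Consumes and closes the stream.
    static boolean hasWordDocumentEntry(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if ("word/document.xml".equals(e.getName())) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

You would typically wrap a FileInputStream in a BufferedInputStream so that mark/reset is available, call looksLikeZip first, and only scan the entries if the signature matches.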
A good starting reference is Word Document (DOCX). Likewise, if you know it is just a binary file, you can test for Word's File Information Block (see Word (.doc) Binary File Format).

Why don't you use Apache Tika?

There is a way, though it is not straightforward. With Apache POI, you can work it out.
Try to read a .docx file using the HWPFDocument class. It will fail with an error, whereas a .docx file can be read using the XWPFDocument class.
You could use Apache Tika for content detection. But you should be aware that this is a huge framework (many required dependencies) for such a small task.
Assuming you're using Apache POI, you have a few options.
One is to grab the first few bytes of the file and ask POIFSFileSystem with the hasPOIFSHeader(byte[]) method. If you have a stream that supports mark/reset, you can instead use POIFSFileSystem.hasPOIFSHeader(InputStream). If those return true, then try to open it as a .doc with HWPF; otherwise try it as a .docx with XWPF.
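If you want to see what such a header test boils down to without pulling in POI, here is a small plain-JDK sketch. It is not POI's implementation: the class and method names (Ole2Sniffer, hasOle2Header, guessExtension) are hypothetical, and it simply compares the first 8 bytes against the OLE2 compound-file signature D0 CF 11 E0 A1 B1 1A E1. The guess only makes sense under the same assumption as above, namely that the file is already known to be either a .doc or a .docx.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper class, a stand-in for the POI header check described above.
final class Ole2Sniffer {

    // Signature found at the start of OLE2 compound files such as binary .doc.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Peeks at the first 8 bytes and restores the stream position afterwards.
    static boolean hasOle2Header(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(8);
        byte[] head = new byte[8];
        int n = in.readNBytes(head, 0, 8); // Java 9+
        in.reset();
        return n == 8 && Arrays.equals(head, OLE2_MAGIC);
    }

    // Assumes the input is known to be one of the two Word formats.
    static String guessExtension(InputStream in) throws IOException {
        return hasOle2Header(in) ? ".doc" : ".docx";
    }
}
```

Because the stream is reset after the peek, you can hand the same stream to HWPF or XWPF once you know which one to try.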
Otherwise, if you prefer a try/catch approach, try to open it with POIFSFileSystem and catch OfficeXmlFileException: if it opens fine it's a .doc, and if you get the exception it's a .docx.
If you look at the source code for WorkbookFactory, you'll see the first pattern in use; you can copy a similar set of logic from that.