I am developing a standalone Java batch process. I am trying to determine the MIME type of file attachments using the Tika JARs (version 1.4).
My code looks like this:
Parser parser = new AutoDetectParser();
InputStream stream = new FileInputStream(fileAttachment);
// -1 disables BodyContentHandler's default write limit on extracted text
int writeLimit = -1;
ContentHandler contentHandler = new BodyContentHandler(writeLimit);
Metadata metadata = new Metadata();
try {
    // Full parse; the detected type is written into the metadata as a side effect
    parser.parse(stream, contentHandler, metadata, new ParseContext());
} finally {
    stream.close();
}
String mimeType = metadata.get(Metadata.CONTENT_TYPE);
logger.debug("File Attachment: " + fileAttachment.getName() + " MimeType is: " + mimeType);
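If only the MIME type is needed, a full parse may not be necessary. The sketch below is not the code I am running; it assumes the Tika 1.x `DefaultDetector`, `TikaInputStream.get(File)` and `Metadata.RESOURCE_NAME_KEY` APIs, and the class name `MimeTypeSniffer` is hypothetical. It only runs detection, so no parser (for example the PDF parser) is invoked. Note that `DefaultDetector` still relies on the same classpath as `AutoDetectParser`, so it would not by itself fix a JAR that is missing Tika's container detectors.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.detect.Detector;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

// Hypothetical helper: detection only, no content parsing.
public class MimeTypeSniffer {

    // DefaultDetector combines tika-core's MIME-magic detection with any
    // additional detectors registered on the classpath.
    private static final Detector DETECTOR = new DefaultDetector();

    public static String detectMimeType(File file) throws IOException {
        Metadata metadata = new Metadata();
        // The file name is passed as a hint for cases where magic bytes are ambiguous
        metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
        InputStream stream = TikaInputStream.get(file);
        try {
            MediaType type = DETECTOR.detect(stream, metadata);
            return type.toString();
        } finally {
            stream.close();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            File file = new File(path);
            System.out.println("File Attachment: " + file.getName()
                    + " MimeType is: " + detectMimeType(file));
        }
    }
}
```

Running it against the same attachments from Eclipse and from the built JAR would show whether the difference lies in detection itself rather than in parsing.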
This code does not work correctly for Office 2003 and 2007 documents.
When I run it from Eclipse I get the correct MIME types.
When I build a JAR file and run it from the command line, it gives the wrong MIME types.
Output from the command line:
------------
File Attachment: Testpdf.pdf MimeType is: application/pdf
File Attachment: Testpdf.tif MimeType is: image/tiff
File Attachment: Testpdf.xlsx MimeType is: application/x-tika-ooxml
File Attachment: Testpdf.xltx MimeType is: application/x-tika-ooxml
File Attachment: Testpdf.pptx MimeType is: application/x-tika-ooxml
File Attachment: Testpdf.docx MimeType is: application/x-tika-ooxml
File Attachment: Testpdf.xls MimeType is: application/zip
File Attachment: Testpdf.doc MimeType is: application/x-tika-msoffice
File Attachment: Testpdf.dot MimeType is: application/x-tika-msoffice
File Attachment: Testpdf.ppt MimeType is: application/x-tika-msoffice
File Attachment: Testpdf.xlt MimeType is: application/vnd.ms-excel
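Since the same code behaves differently in Eclipse and in the built JAR, one plausible explanation (an assumption to verify, not a confirmed diagnosis) is a classpath difference. As far as I understand Tika 1.x, the detectors that tell OOXML and OLE2 containers apart (such as `ZipContainerDetector` and `POIFSContainerDetector` in tika-parsers) are registered through `META-INF/services` files; when several JARs are merged into one, files with the same path can overwrite each other, leaving only the generic `application/x-tika-ooxml` / `application/x-tika-msoffice` types. The stdlib-only sketch below (class name `ServiceFileLister` is hypothetical) lists every copy of those service files visible at runtime, so the output from Eclipse and from the JAR can be compared.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic: prints the provider entries of the Tika service files.
public class ServiceFileLister {

    // Returns one "<service file URL> -> <class name>" entry for every
    // non-comment line in every copy of META-INF/services/<serviceName>.
    public static List<String> listServiceEntries(String serviceName) throws IOException {
        List<String> entries = new ArrayList<String>();
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ServiceFileLister.class.getClassLoader();
        }
        Enumeration<URL> files = loader.getResources("META-INF/services/" + serviceName);
        while (files.hasMoreElements()) {
            URL url = files.nextElement();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Service files allow '#' comments; strip them and surrounding whitespace
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (line.length() > 0) {
                        entries.add(url + " -> " + line);
                    }
                }
            } finally {
                reader.close();
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String[] services = { "org.apache.tika.detect.Detector", "org.apache.tika.parser.Parser" };
        for (String service : services) {
            System.out.println("== " + service);
            for (String entry : listServiceEntries(service)) {
                System.out.println(entry);
            }
        }
    }
}
```

If the container detectors appear when run from Eclipse but not from the JAR, the packaging step is the likely culprit rather than the detection code.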
I tried OfficeParser and OOXMLParser, but they did not work either. I also tried the Tika 0.9 JAR files: with those the MIME types come out correctly, but if any one of my file attachments is an "editable PDF", my batch process dies (as if "exit(0);" had been called in the code). With the newer Tika JARs I get the wrong MIME types.
Please help me with this. Thanks in advance.
CVSR Sarma