Get a thumbnail of a Word document in Java using Apache POI

Posted 2019-09-02 19:34

Question:

I am working on a document-sharing web project in JSF. Users can upload documents such as .doc, .pdf, .ppt, etc., and I want to show the first page of each document as a thumbnail. After some googling I found Apache POI. Does anybody have a suggestion? How can I get a thumbnail image of the first page of a Word document? I tried the code below, but it only extracts the first picture that the Word document contains:

        // Open the legacy .doc file and look up its embedded pictures
        POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("d:\\test.doc"));
        HWPFDocument doc = new HWPFDocument(fs);
        PicturesTable pt = doc.getPicturesTable();
        List<Picture> p = pt.getAllPictures();
        // Decode the first embedded picture and save it as a JPEG
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(p.get(0).getContent()));
        ImageIO.write(image, "JPG", new File("d:\\test.jpg"));

Answer 1:

What you are doing will not produce a thumbnail of the page. HWPFDocument can only extract a thumbnail that is already embedded in the document (when saving the file, check the 'add preview' option), so it works only for documents that were saved with one.

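Even then, to extract it you have to do something like this:

    import java.io.File;
    import org.apache.poi.hpsf.SummaryInformation;
    import org.apache.poi.hpsf.Thumbnail;
    import org.apache.poi.hwpf.HWPFDocumentCore;
    import org.apache.poi.hwpf.converter.AbstractWordUtils;

    static byte[] process(File docFile) throws Exception {
        // Load the .doc file and read its summary metadata
        final HWPFDocumentCore wordDocument = AbstractWordUtils.loadDoc(docFile);
        SummaryInformation summaryInformation = wordDocument.getSummaryInformation();
        System.out.println(summaryInformation.getAuthor());
        System.out.println(summaryInformation.getApplicationName() + ":" + summaryInformation.getTitle());
        // Wrap the embedded preview bytes (present only if the document was saved with a preview)
        Thumbnail thumbnail = new Thumbnail(summaryInformation.getThumbnail());
        System.out.println(thumbnail.getClipboardFormat());
        System.out.println(thumbnail.getClipboardFormatTag());
        // Return the raw WMF bytes of the thumbnail
        return thumbnail.getThumbnailAsWMF();
    }

After that, you will probably have to convert the WMF data to a more common format (JPEG, PNG, ...). ImageMagick can help.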
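
The ImageMagick conversion step could be driven from Java by calling its command-line tool. The following is only a rough sketch under several assumptions: the class name `WmfConverter` and its methods are made up for illustration, ImageMagick 7 is assumed to be installed with its `magick` binary on the PATH (ImageMagick 6 usually ships `convert` instead), and the ImageMagick build is assumed to include WMF support. The output format is chosen by ImageMagick from the output file's extension.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Apache POI or ImageMagick.
final class WmfConverter {

    // Assumption: ImageMagick 7 is installed and "magick" is on the PATH.
    static final String IMAGEMAGICK_BINARY = "magick";

    /** Builds the command line "magick <input.wmf> <output file>". */
    static List<String> buildCommand(Path wmfFile, Path outputFile) {
        return Arrays.asList(IMAGEMAGICK_BINARY, wmfFile.toString(), outputFile.toString());
    }

    /** Writes the WMF bytes to a temporary file and asks ImageMagick to convert it. */
    static void convert(byte[] wmfBytes, Path outputFile) throws IOException, InterruptedException {
        Path wmfFile = Files.createTempFile("thumbnail", ".wmf");
        try {
            Files.write(wmfFile, wmfBytes);
            Process process = new ProcessBuilder(buildCommand(wmfFile, outputFile))
                    .inheritIO() // show ImageMagick's messages on this process's console
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("ImageMagick exited with code " + exitCode);
            }
        } finally {
            // Remove the temporary WMF file whether or not the conversion succeeded
            Files.deleteIfExists(wmfFile);
        }
    }
}
```

With the bytes returned by `process(docFile)` above, a call might look like `WmfConverter.convert(wmfBytes, Paths.get("d:\\test.png"))`.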