How to read a .doc file using POI?

Posted 2019-09-21 04:39

Question:

I am trying to display a Word file in my editor pane. I tried this code:

import java.awt.Dimension;
import java.awt.GridLayout;
import java.io.File;
import java.io.FileInputStream;
import javax.swing.JEditorPane;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public class editorpane extends JEditorPane
{
public editorpane(File file)
{

    try
    {
        FileInputStream fis = new FileInputStream(file.getAbsolutePath());
        HWPFDocument hwpfd = new HWPFDocument(fis);
        WordExtractor we = new WordExtractor(hwpfd);
        String[] array = we.getParagraphText();
        for (int i = 0; i < array.length; i++)
        {
            this.setPage(array[i]);
        }

    } catch (Exception e)
    {
        e.printStackTrace();
    }

but it gives me this exception:

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
at frame1.editorpane.<init>(editorpane.java:24)

at this line:

HWPFDocument hwpfd = new HWPFDocument(fis);

How can I solve that?

Besides, I am not sure about these lines:

for (int i = 0; i < array.length; i++)
        {
            this.setPage(array[i]);
        }

Can someone confirm whether they are correct?

Answer 1:

You are trying to open a .docx file (XWPF) with code for .doc (HWPF) files. You can use XWPFWordExtractor for .docx files.
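
If you are not sure which of the two formats a file really is (the extension can be misleading), one option is to look at its first bytes. The sketch below uses only the JDK; the class WordFormatSniffer and its Container enum are made up for this illustration and are not part of POI. It assumes the usual signatures: binary .doc files are OLE2 containers, and .docx files are ZIP packages. Other Office formats share the same containers, so this identifies the container type only, not that the file is a Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper (not a POI class): guesses the container format from the file header. */
public class WordFormatSniffer {

    public enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 compound document signature, used by binary .doc files (HWPF side of POI).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4"; .docx files are ZIP packages (XWPF side of POI).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a header that has already been read into memory. */
    public static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    /** Reads up to the first 8 bytes of the file and classifies them. */
    public static Container sniff(Path file) throws IOException {
        byte[] buf = new byte[8];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (read < buf.length) {
                int n = in.read(buf, read, buf.length - read);
                if (n < 0) {
                    break; // file is shorter than 8 bytes
                }
                read += n;
            }
        }
        return sniff(Arrays.copyOf(buf, read));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A result of Container.ZIP for the file in the question would be consistent with the OfficeXmlFileException in the stack trace: the file is an Office 2007+ package, so the HWPF classes cannot open it.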

There is also an ExtractorFactory, which lets POI decide which of these applies and use the correct class to open the file. However, you then cannot iterate paragraph by paragraph, because only a generic getText() method is available.

Use it like this:

POITextExtractor extractor = ExtractorFactory.createExtractor(file);
String text = extractor.getText();
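
As for the loop in the question: JEditorPane.setPage(String) expects a URL, not document text, so passing paragraph strings to it will not display them. Even with valid URLs, each call would replace the previous page. A minimal sketch of an alternative, assuming the goal is just to show the extracted text as plain text, is below. The class ParagraphPaneFiller and its method names are hypothetical. The sketch also assumes paragraph strings may or may not end with their own line breaks, so it trims them before joining.

```java
import javax.swing.JEditorPane;

/** Hypothetical helper: puts already-extracted paragraph text into a JEditorPane. */
public class ParagraphPaneFiller {

    /** Joins the paragraphs into one string, one paragraph per line. */
    public static String joinParagraphs(String[] paragraphs) {
        StringBuilder sb = new StringBuilder();
        for (String p : paragraphs) {
            // Drop any trailing CR/LF so each paragraph ends with exactly one '\n'.
            int end = p.length();
            while (end > 0 && (p.charAt(end - 1) == '\n' || p.charAt(end - 1) == '\r')) {
                end--;
            }
            sb.append(p, 0, end).append('\n');
        }
        return sb.toString();
    }

    /** Shows the paragraphs as plain text; setText is used instead of setPage. */
    public static void show(JEditorPane pane, String[] paragraphs) {
        pane.setContentType("text/plain");
        pane.setText(joinParagraphs(paragraphs));
    }
}
```

When the text comes from a generic extractor as a single String, the joining step is unnecessary and the string can be passed to setText directly.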