Using WordToHtmlConverter in Apache POI

Posted 2019-09-08 06:37

I am trying to use the WordToHtmlConverter class to convert a Word document to HTML, but the documentation is not clear.

WordToHtmlConverter has a constructor that takes an org.w3c.dom.Document, but I don't think that is the Word document itself.

Does anyone have a sample program showing how to load a Word document and convert it to HTML?

1 answer
欢心
#2 · 2019-09-08 07:18

Your best bet for now is probably to look at the unit tests, e.g. TestWordToHtmlConverter. They will show you how to do it.

In general, though, you pass in an empty XML document (an org.w3c.dom.Document) to be populated, have WordToHtmlConverter generate the HTML into it from the Word document, and then transform that XML document into the output you want (indentation, new lines, etc.).

Your code would want to look something like:

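    import java.io.FileInputStream;
    import java.io.StringWriter;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.w3c.dom.Document;

    // Load the binary .doc file; "input.doc" is a placeholder path.
    HWPFDocument hwpfDocument = new HWPFDocument( new FileInputStream( "input.doc" ) );

    // Empty DOM document that the converter will populate with HTML.
    Document newDocument = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            newDocument );

    wordToHtmlConverter.processDocument( hwpfDocument );

    // Serialize the populated DOM to an HTML string.
    StringWriter stringWriter = new StringWriter();
    Transformer transformer = TransformerFactory.newInstance()
            .newTransformer();
    transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
    transformer.setOutputProperty( OutputKeys.ENCODING, "utf-8" );
    transformer.setOutputProperty( OutputKeys.METHOD, "html" );
    transformer.transform(
            new DOMSource( wordToHtmlConverter.getDocument() ),
            new StreamResult( stringWriter ) );

    String html = stringWriter.toString();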
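If you want to try the serialization step on its own, without POI on the classpath, something like the sketch below should work with just the JDK. The class name DomToHtmlSketch and both helper methods are made up for illustration, and the hand-built DOM is only a stand-in for what WordToHtmlConverter would populate; the Transformer settings are the same ones used above.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of Apache POI.
class DomToHtmlSketch {

    // Builds a tiny html/body/p tree by hand, standing in for the DOM
    // that WordToHtmlConverter would normally fill in.
    public static Document buildSampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent("Hello");
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a DOM document to an HTML string using an identity Transformer.
    public static String toHtml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(buildSampleDocument()));
    }
}
```

In the real conversion you would pass wordToHtmlConverter.getDocument() to toHtml instead of the sample document.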