I'm trying to convert HTML text to generate a Word table. It works pretty well, and the created Word file is correct, except for the character styles.
This is my first try with Apache POI.
So far, I have been able to detect new line (<br>) tags in the paragraph text (see code below). But I'd also like to handle a few other tags such as <b>, <li> and <font>, and set the right run values for each part.
For example:
This is my text <i> which now is in italic<b> but also in bold</b> depending on its importance</i>
I guess I should parse the text and apply a different run to each part, but I don't know how to do it.
    private static XWPFParagraph getTableParagraph(XWPFTableCell cell, String text)
    {
        int fontsize = 11;
        // Add a fresh paragraph and drop the empty one the cell starts with.
        XWPFParagraph paragraph = cell.addParagraph();
        cell.removeParagraph(0);
        paragraph.setSpacingAfterLines(0);
        paragraph.setSpacingAfter(0);
        XWPFRun myRun1 = paragraph.createRun();
        if (text == null) text = "";
        else
        {
            // Split on every <br> and add a line break to the run after each piece.
            while (true)
            {
                int x = text.indexOf("<br>");
                if (x < 0) break;
                String work = text.substring(0, x);
                text = text.substring(x + 4);
                myRun1.setText(work);
                myRun1.addBreak();
            }
        }
        // Whatever is left after the last <br> (or the whole text if there was none).
        myRun1.setText(text);
        myRun1.setFontSize(fontsize);
        return paragraph;
    }
When converting HTML text, one should never process the HTML using string methods only. XML as well as HTML are markup languages. Their content is markup, not just plain text. The markup needs to be traversed to get all the single nodes, together with their meaning, out of it. This traversal is never trivial, which is why dedicated libraries exist for it. Deep inside, those libraries also use string methods, but these are wrapped into useful methods for traversing the markup.
For traversing HTML, jsoup may be used, for example. NodeTraversor together with a NodeVisitor is especially useful for traversing HTML.
My example creates a ParagraphNodeVisitor which implements NodeVisitor. This interface requires the method public void head(Node node, int depth), which is called every time the NodeTraversor is at the head of a node, and public void tail(Node node, int depth), which is called every time the NodeTraversor is at the tail of a node. In those methods the handling of the single nodes can be implemented. In our case the main part of the process is deciding whether we need a new XWPFRun and what settings this run needs.
Example:
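The head/tail principle can be illustrated without any third-party library. The following is a minimal sketch, not the ParagraphNodeVisitor described above: it swaps jsoup for the JDK's built-in DOM parser, so it assumes a well-formed fragment (for instance <br/> rather than <br>), and it only collects styled pieces of text instead of writing to a Word document. All names in it (HtmlRunSketch, StyledRun, Visitor, RunCollector, toRuns) are hypothetical and chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative sketch only: every class and method name here is hypothetical.
final class HtmlRunSketch {

    /** One piece of text with the style that was active when it was visited. */
    static final class StyledRun {
        final String text;
        final boolean bold, italic, lineBreak;
        StyledRun(String text, boolean bold, boolean italic, boolean lineBreak) {
            this.text = text; this.bold = bold; this.italic = italic; this.lineBreak = lineBreak;
        }
        @Override public String toString() {
            return lineBreak ? "[break]"
                    : "[" + (bold ? "B" : "-") + (italic ? "I" : "-") + "] \"" + text + "\"";
        }
    }

    /** A head/tail visitor: head on entering a node, tail on leaving it. */
    interface Visitor {
        void head(Node node, int depth);
        void tail(Node node, int depth);
    }

    /** Depth-first traversal calling head before the children and tail after them. */
    static void traverse(Node node, int depth, Visitor visitor) {
        visitor.head(node, depth);
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            traverse(child, depth + 1, visitor);
        }
        visitor.tail(node, depth);
    }

    /** Tracks open b/i elements and starts a new run for every text node. */
    static final class RunCollector implements Visitor {
        final List<StyledRun> runs = new ArrayList<>();
        private int boldDepth = 0, italicDepth = 0;

        @Override public void head(Node node, int depth) {
            if (node.getNodeType() == Node.TEXT_NODE) {
                runs.add(new StyledRun(node.getNodeValue(), boldDepth > 0, italicDepth > 0, false));
            } else if (node.getNodeType() == Node.ELEMENT_NODE) {
                String tag = node.getNodeName();
                if (tag.equalsIgnoreCase("b")) boldDepth++;
                else if (tag.equalsIgnoreCase("i")) italicDepth++;
                else if (tag.equalsIgnoreCase("br")) runs.add(new StyledRun("", false, false, true));
            }
        }

        @Override public void tail(Node node, int depth) {
            if (node.getNodeType() != Node.ELEMENT_NODE) return;
            String tag = node.getNodeName();
            if (tag.equalsIgnoreCase("b")) boldDepth--;
            else if (tag.equalsIgnoreCase("i")) italicDepth--;
        }
    }

    /** Parses a well-formed fragment and returns its runs in document order. */
    static List<StyledRun> toRuns(String fragment) throws Exception {
        // A wrapper element gives the fragment the single root that XML requires.
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")))
                .getDocumentElement();
        RunCollector collector = new RunCollector();
        traverse(root, 0, collector);
        return collector.runs;
    }

    public static void main(String[] args) throws Exception {
        String html = "This is my text <i> which now is in italic<b> but also in bold</b>"
                + " depending on its importance</i>";
        for (StyledRun run : toRuns(html)) {
            System.out.println(run);
        }
    }
}
```

With Apache POI, each collected piece would presumably become one run: paragraph.createRun(), then setText, setBold and setItalic on that run, and addBreak for a line break. Counting open elements rather than toggling a flag keeps nested tags of the same kind consistent. Real-world HTML is rarely well-formed XML, which is the reason to prefer an HTML parser such as jsoup over the DOM parser used in this sketch.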
Disclaimer: This is a working draft showing the principle. It is neither complete nor ready for use in production environments.