I'm working on my first project using docx4j... My goal is to export xhtml from a webapp (ckeditor created html) into a docx, edit it in Word, then import it back into the ckeditor wysiwyg.
(*crosspost from http://www.docx4java.org/forums/xhtml-import-f28/html-docx-html-inserts-a-lot-of-space-t1966.html#p6791?sid=78b64a02482926c4dbdbafbf50d0a914 will update when answered)
I have created an html test document with the following contents:
<html><ul><li>TEST LINE 1</li><li>TEST LINE 2</li></ul></html>
My code then creates a docx from this html like so: WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage .createPackage();
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
xHTMLImporter.setHyperlinkStyle("Hyperlink");
wordMLPackage.getMainDocumentPart().getContent()
.addAll(xHTMLImporter.convert(new File("test.html"), null));
System.out.println(XmlUtils.marshaltoString(wordMLPackage
.getMainDocumentPart().getJaxbElement(), true, true));
wordMLPackage.save(new java.io.File("test.docx"));
My code then attempts to convert the docx BACK to html like so: WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage .createPackage();
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
xHTMLImporter.setHyperlinkStyle("Hyperlink");
WordprocessingMLPackage docx = WordprocessingMLPackage.load(new File("test.docx"));
AbstractHtmlExporter exporter = new HtmlExporterNG2();
OutputStream os = new java.io.FileOutputStream("test.html");
HTMLSettings htmlSettings = new HTMLSettings();
javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(
os);
exporter.html(docx, result, htmlSettings);
The html returned is:
<?xml version="1.0" encoding="UTF-8"?><html xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<style>
<!--/*paged media */ div.header {display: none }div.footer {display: none } /*@media print { */@page { size: A4; margin: 10%; @top-center {content: element(header) } @bottom-center {content: element(footer) } }/*element styles*/ .del {text-decoration:line-through;color:red;} .ins {text-decoration:none;background:#c0ffc0;padding:1px;}
/* TABLE STYLES */
/* PARAGRAPH STYLES */
.DocDefaults {display:block;margin-bottom: 4mm;line-height: 115%;font-size: 11.0pt;}
.Normal {display:block;}
/* CHARACTER STYLES */ span.DefaultParagraphFont {display:inline;}
-->
</style>
<script type="text/javascript">
<!--function toggleDiv(divid){if(document.getElementById(divid).style.display == 'none'){document.getElementById(divid).style.display = 'block';}else{document.getElementById(divid).style.display = 'none';}}
--></script>
</head>
<body>
<!-- userBodyTop goes here -->
<div class="document">
<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">• <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 1</span>
</p>
<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">• <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 2</span>
</p>
</div>
<!-- userBodyTail goes here -->
</body>
</html>
There is a lot of extra space created after each line now. Not sure why this is happening, the conversion appears to add a lot of extra white space/carriage returns.