I'm trying to get the style information from an MS Word docx file. I have no problem writing file content with added styles such as bold, italic, font size, etc., but reading the file content and getting the style information is not so clear. I've tried using XWPFDocument, but this API does not seem to have the ability to read the styles. I'm now trying XWPFWordExtractor, which seems a bit more promising, but I'm still stuck getting the style information for the text.
The type of content I am reading looks similar to the following:
"Hello, this is bold text and this is italic text and this is bold-italic text"
Any pointers to an example would be great.
Okay, so based on the comments from Gagravarr, the solution is below, and it does exactly what I wanted. Gagravarr effectively answered the question, but I'm not sure how to give him credit apart from saying it here.
    for (XWPFParagraph paragraph : docx.getParagraphs()) {
        int pos = 0;
        for (XWPFRun run : paragraph.getRuns()) {
            System.out.println("Current run IsBold : " + run.isBold());
            System.out.println("Current run IsItalic : " + run.isItalic());
            for (char c : run.text().toCharArray()) {
                System.out.print(c);
                pos++;
            }
            System.out.println();
        }
    }
`
Output below
Current run IsBold : false
Current run IsItalic : false
"Hello, this is
Current run IsBold : true
Current run IsItalic : false
bold text
Current run IsBold : false
Current run IsItalic : false
and this is
Current run IsBold : false
Current run IsItalic : true
italic text
Current run IsBold : false
Current run IsItalic : false
a
Current run IsBold : false
Current run IsItalic : false
n
Current run IsBold : false
Current run IsItalic : false
d this is
Current run IsBold : true
Current run IsItalic : true
bold-italic text
Current run IsBold : false
Current run IsItalic : false
"
I gave up trying to use Apache POI and found another library called docx4j, which seems to do what I need: the properties I want to look at are now available. Once the docx file is loaded, you can view the content of the file in XML format like below.
`
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:ns27="http://schemas.openxmlformats.org/schemaLibrary/2006/main" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" mc:Ignorable="w14 wp14">
<w:body>
<w:p w:rsidR="009A66AB" w:rsidRDefault="000F4AD1">
<w:r>
<w:rPr>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t>"Hello, this is</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rStyle w:val="apple-converted-space"/>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t> </w:t>
</w:r>
<w:r>
<w:rPr>
<w:rStyle w:val="Strong"/>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:bdr w:val="none" w:color="auto" w:sz="0" w:space="0" w:frame="true"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t>bold text</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rStyle w:val="apple-converted-space"/>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t> </w:t>
</w:r>
<w:r>
<w:rPr>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t>and this is</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rStyle w:val="apple-converted-space"/>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t> </w:t>
</w:r>
<w:r>
<w:rPr>
<w:rStyle w:val="Emphasis"/>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:bdr w:val="none" w:color="auto" w:sz="0" w:space="0" w:frame="true"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t>italic text</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rStyle w:val="apple-converted-space"/>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t> </w:t>
</w:r>
<w:r>
<w:rPr>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t>an</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t>d this is</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rStyle w:val="apple-converted-space"/>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t> </w:t>
</w:r>
<w:r>
<w:rPr>
<w:rStyle w:val="Emphasis"/>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:b/>
<w:bCs/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:bdr w:val="none" w:color="auto" w:sz="0" w:space="0" w:frame="true"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t>bold-italic text</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
<w:color w:val="222222"/>
<w:sz w:val="23"/>
<w:szCs w:val="23"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
</w:rPr>
<w:t>"</w:t>
</w:r>
</w:p>
<w:sectPr w:rsidR="009A66AB">
<w:pgSz w:w="11906" w:h="16838"/>
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
<w:cols w:space="708"/>
<w:docGrid w:linePitch="360"/>
</w:sectPr>
</w:body>
</w:document>
`
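Looking at that XML, note that the "bold text" run has no `<w:b/>` of its own: it only references the character style "Strong" via `<w:rStyle>`, so the bold most likely comes from the style definition in styles.xml. As a rough sketch (JDK only, no docx4j or POI; the class and method names are made up for illustration), something like this walks the runs and reports the style ID plus any bold/italic set directly on the run. It does not resolve the style itself, you would still have to look the ID up in styles.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j or Apache POI.
final class RunStyleDump {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one line per w:r element: text, character style ID, direct bold/italic flags. */
    static List<String> describeRuns(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
            List<String> lines = new ArrayList<>();
            NodeList runs = doc.getElementsByTagNameNS(W, "r");
            for (int i = 0; i < runs.getLength(); i++) {
                Element run = (Element) runs.item(i);
                // Character style referenced by the run, empty if none.
                String style = "";
                NodeList rStyle = run.getElementsByTagNameNS(W, "rStyle");
                if (rStyle.getLength() > 0) {
                    style = ((Element) rStyle.item(0)).getAttributeNS(W, "val");
                }
                // Concatenate all w:t children of the run.
                StringBuilder text = new StringBuilder();
                NodeList t = run.getElementsByTagNameNS(W, "t");
                for (int j = 0; j < t.getLength(); j++) {
                    text.append(t.item(j).getTextContent());
                }
                lines.add("[" + text + "] style=" + style
                        + " b=" + isOn(run, "b") + " i=" + isOn(run, "i"));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }

    // A toggle such as w:b is on when present, unless w:val switches it off.
    private static boolean isOn(Element run, String localName) {
        NodeList hits = run.getElementsByTagNameNS(W, localName);
        if (hits.getLength() == 0) {
            return false;
        }
        String val = ((Element) hits.item(0)).getAttributeNS(W, "val");
        return !(val.equals("0") || val.equals("false") || val.equals("off"));
    }
}
```

Calling `RunStyleDump.describeRuns(xml)` with the document XML as a string gives one entry per run, so you can see which runs rely on a style and which carry direct formatting.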
You can use:
paragraph.getCTP().getPPr().getRPr().isSetB()
I found a very nice way to copy styles from one document to another. It is not as direct as I would have hoped but it works.
- Rename the source Word document so that it has a .zip extension
- Extract the contents
- Copy styles.xml into a string constant or read the file
Copy the styles into your output document with the following code:
    public void copyStylesXml(String stylesXmlString) {
        try {
            CTStyles ctStyle = CTStyles.Factory.parse(stylesXmlString);
            XWPFStyles styles = getDoc().createStyles();
            styles.setStyles(ctStyle);
        } catch (Exception e) {
            log.warn(e, e);
        }
    }
The same approach works for copying list formats.
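If you want to check what is inside the document before extracting anything, you don't actually need to rename it: a .docx is already a zip archive, so java.util.zip can open it as is. A small sketch, assuming only the JDK (the class and method names are invented for this example), that lists the part names so you can confirm word/styles.xml or word/numbering.xml is there:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only.
final class DocxParts {
    /** Returns the names of all entries in the docx (zip) file, sorted alphabetically. */
    static List<String> listParts(Path docx) {
        // try-with-resources closes the archive once the names are collected
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            List<String> names = new ArrayList<>();
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
            Collections.sort(names);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + docx, e);
        }
    }
}
```

For a typical Word file the result should include entries such as word/document.xml and word/styles.xml, though the exact set depends on the document.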
Here is a very good way to copy styles from another document. A little background: a docx file is really a zip file containing a number of XML files, including styles.xml. In the following code sample I read styles.xml, parse it into a CTStyles object, then set it on the current document. Here is most of the code. You can use the same approach to copy numbering.xml for your Word numbering.
// copy an existing styles.xml document into this document to get styles
public void copyStylesFromDocument(String documentFileName) {
    log.debug("fileName " + documentFileName);
    try {
        InputStream is = CertificationReportHelper.getInputStreamFromZipFile(documentFileName, FILE_NAME_STYLES);
        CTStyles ctStyle = CTStyles.Factory.parse(is);
        XWPFStyles styles = getDoc().createStyles();
        styles.setStyles(ctStyle);
        log.info("Styles copied from file " + FILE_NAME_STYLES + " in document " + documentFileName);
    } catch (Exception e) {
        String msg = "Error copying styles from file " + FILE_NAME_STYLES + " in document " + documentFileName;
        addErrorMessage(msg, e);
        log.debug(e, e);
    }
}
@SuppressWarnings("resource") // closing the zip file also closes the input stream, and the operation fails
public static InputStream getInputStreamFromZipFile(String zipFileName, String containedFile) {
    InputStream is = null;
    ZipFile zfile = null;
    try {
        zfile = new ZipFile(zipFileName);
        ZipEntry entry = zfile.getEntry(containedFile);
        log.trace(entry);
        if (entry != null) {
            is = zfile.getInputStream(entry);
            log.trace("created input stream for file " + containedFile + " from zip file " + zipFileName);
        } else {
            String msg = "Error getting input stream for file " + containedFile + " from zip file " + zipFileName;
            // closing stream causes input stream to close and operation fails
            throw new ApplicationRuntimeException(msg);
        }
    } catch (Exception e) {
        String msg = "Error getting input stream for file " + containedFile + " from zip file " + zipFileName + " Message:"
                + e.getMessage();
        log.warn("*** Throwing exception " + msg);
        throw new ApplicationRuntimeException(msg, e);
    } finally {
        // closing stream causes input stream to close and operation fails
        // try {
        //     zfile.close();
        // } catch (IOException e) {
        //     log.warn("Catching exception "+e+" closing zip file "+zipFileName);
        // }
    }
    return is;
}
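One caveat with the helper above is that the ZipFile is never closed, because closing it also closes every stream it handed out. A possible way around that, sketched here with plain JDK classes and made-up names (this is not the original helper), is to copy the entry into memory first and return a stream over the copy. It assumes the part is small enough to hold in memory, which should normally be the case for styles.xml or numbering.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical alternative to getInputStreamFromZipFile, for illustration only.
final class ZipPartStreams {
    /** Returns an in-memory stream over one zip entry; the zip file itself is closed on return. */
    static InputStream openPart(String zipFileName, String containedFile) {
        try (ZipFile zip = new ZipFile(zipFileName)) {
            ZipEntry entry = zip.getEntry(containedFile);
            if (entry == null) {
                throw new IllegalArgumentException(
                        "No entry " + containedFile + " in zip file " + zipFileName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // readAllBytes (Java 9+) buffers the whole entry, so the copy
                // stays readable after the archive is closed.
                return new ByteArrayInputStream(in.readAllBytes());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(
                    "Error reading " + containedFile + " from zip file " + zipFileName, e);
        }
    }
}
```

The returned stream could then be passed to `CTStyles.Factory.parse(is)` the same way as before, without needing the `@SuppressWarnings("resource")` workaround.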