In MS-Word 2010 there is an Option under File -> Information to check the document for problems before sharing it. This makes it possible to handle track changes (to new newest version) and remove all comments and annotations from the document at once.
Is this possibility available in docx4j as well or do I need to investiagte the corresponding JAXB-Objects and write a traverse finder?
Doing that manually could be a lot of work since I would have to add the RunIns
(w:ins
) to the R
(w:r
) and remove the RunDel
(w:del
). I also saw a w:del
once inside a w:ins
. In this case I don't know if this also appears vice versa or in deeper nestings.
Further research brought this XSLT up: https://github.com/plutext/docx4all/blob/master/docx4all/src/main/java/org/docx4all/util/ApplyRemoteChanges.xslt I was not able to run this within docx4j but by manually unzipping the docx and extracting the document.xml. After applying the xslt on the plain document.xml I wrapped it in the docx container again to open it with MS-Word. The result was not the same as it would be by accepting the revision with MS-Word itself. More concrete: The XSLT removed the deleted marked text (in a Table), but not a listing dot before the text. This appears quite often in my document.
If this request is not posible to solve in an easy manner, I will change the constraints. It is sufficent for me to have a method for getting all text of a ContentAccessor, as a String
. The ContentAccessor could be a P
or Tc
. The String shall be inside a R
there or inside a RunIns
(with R
inside of that) For this I have a half solution below. The intersting part starts in the line of else if (child instanceof RunIns) {
. But as mentioned above I'm not sure how nested del/ins Statements might appear and if this will handle them well. And the results are still not the same as if I would prepare the document with MS-Word before.
//Similar to:
//http://www.docx4java.org/forums/docx-java-f6/how-to-get-all-text-element-of-a-paragraph-with-docx4j-t2028.html
private String getAllTextfromParagraph(ContentAccessor ca) {
String result = "";
List<Object> children = ca.getContent();
for (Object child : children) {
child = XmlUtils.unwrap(child);
if (child instanceof Text) {
Text text = (Text) child;
result += text.getValue();
} else if (child instanceof R) {
R run = (R) child;
result += getTextFromRun(run);
}
else if (child instanceof RunIns) {
RunIns ins = (RunIns) child;
for (Object obj : ins.getCustomXmlOrSmartTagOrSdt()) {
if (obj instanceof R) {
result += getTextFromRun((R) obj);
}
}
}
}
return result.trim();
}
private String getTextFromRun(R run) {
String result = "";
for (Object o : run.getContent()) {
o = XmlUtils.unwrap(o);
if (o instanceof R.Tab) {
Text text = new Text();
text.setValue("\t");
result += text.getValue();
}
if (o instanceof R.SoftHyphen) {
Text text = new Text();
text.setValue("\u00AD");
result += text.getValue();
}
if (o instanceof Br) {
Text text = new Text();
text.setValue(" ");
result += text.getValue();
}
if (o instanceof Text) {
result += ((Text) o).getValue();
}
}
return result;
}