I'm trying to fill in the tags of a template DOCX document with Apache POI by using the XWPFDocument class. I have tags in the document and a JSON file to read the replacement data from. My problem is that a line of text seems to be split up inside the DOCX: when I change its extension to ZIP and open document.xml, the text [MEMBER_CONTACT_INFO], for example, appears as [MEMBER_CONTACT_INFO and ] separately. POI reads it the same way, since that is how the original DOCX stores it. This creates 2 XWPFRun objects in the paragraph, which hold the text [MEMBER_CONTACT_INFO and ] separately.
My question is: is there a way to make POI behave like Word, by merging related runs or something like that? Or how else can I solve this problem? I'm matching run texts while replacing, and I can't find my tag because it is split across 2 different run objects.
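To make the failure concrete, here is a stdlib-only Java sketch that models each run as a plain String (no POI involved; the class and method names are hypothetical). Matching run by run misses the tag, while searching the paragraph's joined text finds it, which is why any fix has to look across run boundaries.

```java
import java.util.List;

public class RunSplitDemo {
    // True if a single run contains the whole tag: the per-run
    // matching strategy described in the question.
    static boolean foundInSomeRun(List<String> runs, String tag) {
        for (String run : runs) {
            if (run.contains(tag)) {
                return true;
            }
        }
        return false;
    }

    // True if the tag appears once the run texts are joined back
    // together, which is roughly what Word shows on screen.
    static boolean foundInParagraph(List<String> runs, String tag) {
        return String.join("", runs).contains(tag);
    }

    public static void main(String[] args) {
        // Hypothetical run texts mirroring the split from the question.
        List<String> runs = List.of("Contact: ", "[MEMBER_CONTACT_INFO", "]");
        String tag = "[MEMBER_CONTACT_INFO]";
        System.out.println(foundInSomeRun(runs, tag));   // false
        System.out.println(foundInParagraph(runs, tag)); // true
    }
}
```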
Best
This wasted so much of my time once...
Basically, an XWPFParagraph is composed of multiple XWPFRuns, and an XWPFRun is a contiguous piece of text that has one fixed style.
So when you type something like "[PLACEHOLDER_NAME]" in MS Word, it will create a single XWPFRun. But if you later add a few more things and then go back and change "[PLACEHOLDER_NAME]" to something else, it is never guaranteed that it will remain a single XWPFRun; it is quite possible that it will be split into two runs. AFAIK this is how MS Word works.
How do you avoid splitting runs in such cases?
Solution: There are two solutions that I know of:
1. Copy the text "[PLACEHOLDER_NAME]" to Notepad or a similar plain-text editor, make your changes there, then copy it back and paste it over "[PLACEHOLDER_NAME]" in your Word file. This way the whole "[PLACEHOLDER_NAME]" is replaced with the new text, which avoids splitting it across several XWPFRuns.
2. Select "[PLACEHOLDER_NAME]", open MS Word's "Replace" option and replace it with "[Your-new-edited-placeholder]". This guarantees that your new placeholder occupies a single XWPFRun.
If you have to change your new placeholder again, follow step 1 or 2.
I also had this issue a few days ago and couldn't find any solution. I chose to use PLACEHOLDER_NAME instead of [PLACEHOLDER_NAME]. This works fine for me, and it is read as a single XWPFRun object.
Here is the Java code to fix that split text issue. It also handles placeholders whose pieces sit in separate runs with different formatting (multi-format strings).
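import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class PlaceholderReplacer {

    public static void replaceString(XWPFDocument doc, String search, String replace) {
        // Paragraphs in the document body
        for (XWPFParagraph p : doc.getParagraphs()) {
            replaceInParagraph(p, search, replace);
        }
        // Paragraphs inside table cells get the same split-run handling
        for (XWPFTable tbl : doc.getTables()) {
            for (XWPFTableRow row : tbl.getRows()) {
                for (XWPFTableCell cell : row.getTableCells()) {
                    for (XWPFParagraph p : cell.getParagraphs()) {
                        replaceInParagraph(p, search, replace);
                    }
                }
            }
        }
    }

    private static void replaceInParagraph(XWPFParagraph p, String search, String replace) {
        List<XWPFRun> runs = p.getRuns();        // read-only view; shrinks when runs are removed
        List<Integer> group = new ArrayList<>(); // indices of runs that together spell `search`
        String groupText = search;               // part of `search` not matched yet
        for (int i = 0; i < runs.size(); i++) {
            XWPFRun r = runs.get(i);
            String text = r.getText(0);
            if (text == null || text.isEmpty()) {
                continue; // runs without text do not interrupt a group
            }
            if (text.contains(search)) {
                // Whole placeholder inside one run. String.replace is literal,
                // so '$' or '\' in `replace` are not treated specially.
                r.setText(text.replace(search, replace), 0);
                group.clear();
                groupText = search;
            } else if (groupText.startsWith(text)) {
                group.add(i);
                groupText = groupText.substring(text.length());
                if (groupText.isEmpty()) {
                    // Put the replacement in the first run (keeping its formatting)
                    // and remove the rest from last to first so indices stay valid.
                    int first = group.get(0);
                    runs.get(first).setText(replace, 0);
                    for (int j = group.size() - 1; j >= 1; j--) {
                        p.removeRun(group.get(j));
                    }
                    i = first; // continue with the run after the merged one
                    group.clear();
                    groupText = search;
                }
            } else {
                group.clear();
                groupText = search;
                // This run may be the first piece of a new occurrence.
                if (search.startsWith(text)) {
                    group.add(i);
                    groupText = search.substring(text.length());
                }
            }
        }
    }
}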
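The run-grouping approach above only merges a placeholder whose pieces are whole runs (like "[MEMBER_CONTACT_INFO" + "]"); it misses a split such as "Hello [MEMBER" + "_CONTACT_INFO]", where text surrounds the placeholder inside the same runs. A more general technique is to search the paragraph's joined text and map the match offsets back to the runs. Below is a minimal sketch of that idea, assuming each run is modeled as a plain String (stdlib only; `CrossRunReplace` and `replaceAcrossRuns` are hypothetical names, not POI API).

```java
import java.util.ArrayList;
import java.util.List;

public class CrossRunReplace {
    // Replaces every occurrence of `search` in the paragraph formed by joining
    // `runs`, even when an occurrence spans several runs. The replacement goes
    // into the run where the match starts (so it would keep that run's style);
    // matched characters in later runs are cut out. The run count never changes.
    static List<String> replaceAcrossRuns(List<String> runs, String search, String replace) {
        if (search.isEmpty()) {
            throw new IllegalArgumentException("search must not be empty");
        }
        List<StringBuilder> out = new ArrayList<>();
        for (String r : runs) {
            out.add(new StringBuilder(r));
        }
        int from = 0; // resume point, so text inside a replacement is not re-matched
        while (true) {
            StringBuilder joined = new StringBuilder();
            for (StringBuilder sb : out) {
                joined.append(sb);
            }
            int start = joined.indexOf(search, from);
            if (start < 0) {
                break;
            }
            int remaining = search.length(); // matched chars still to delete
            int pos = 0;                     // paragraph offset where run i begins
            boolean inserted = false;
            for (int i = 0; i < out.size() && remaining > 0; i++) {
                StringBuilder run = out.get(i);
                int runEnd = pos + run.length();
                if (runEnd > start) { // this run overlaps the match
                    int localStart = Math.max(0, start - pos);
                    int localEnd = Math.min(run.length(), localStart + remaining);
                    run.delete(localStart, localEnd);
                    remaining -= localEnd - localStart;
                    if (!inserted) {
                        run.insert(localStart, replace);
                        inserted = true;
                    }
                }
                pos = runEnd; // offsets stay in the pre-edit coordinates of `start`
            }
            from = start + replace.length();
        }
        List<String> result = new ArrayList<>();
        for (StringBuilder sb : out) {
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> runs = List.of("Hello [MEMBER", "_CONTACT_INFO] and [MEMBER_CONTACT_INFO]");
        System.out.println(replaceAcrossRuns(runs, "[MEMBER_CONTACT_INFO]", "John"));
        // prints "[Hello John,  and John]"
    }
}
```

With POI, the same idea would mean collecting `run.getText(0)` for each run in `paragraph.getRuns()`, computing the new texts, and writing each one back with `run.setText(newText, 0)`. Runs whose text ends up empty can be left in place or removed with `paragraph.removeRun(index)`.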