Seperated text line in Apache POI XWPFRun object

2019-03-01 05:58发布

问题:

I 'm trying to replace a template DOCX document with Apache POI by using the XWPFDocument class. I have tags in the doc and a JSON file to read the replacement data. My problem is that a text line seems separated in a certain way in DOCX when I change its extension to ZIP file and open document.xml. For example [MEMBER_CONTACT_INFO] text becomes [MEMBER_CONTACT_INFO and ] separately. POI reads this in the same way since the DOCX original is like this. This creates 2 XWPFRun objects in the paragraph which show the text as [MEMBER_CONTACT_INFO and ] separately.

My question is, is there a way to force POI to run like Word via merging related runs or something like that? Or how can I solve this problem? I 'm matching run texts while replacing and I can't find my tag because it is split into 2 different run object.

Best

回答1:

This wasted so much of my time once...

Basically, an XWPFParagraph is composed of multiple XWPFRuns, and XWPFRun is a contagious text that has a fixed same style.

So when you try writing something like "[PLACEHOLDER_NAME]" in MS-Word it will create a single XWPFRun. But if you somehow add a few things more, and then you go back and change "[PLACEHOLDER_NAME]" to something else it is never guaranteed that it will remain a single XWPFRun it is quite possible that it will split to two Runs. AFAIK this is how MS-Word works.

How to avoid splitting of Runs in such cases?

Solution: There are two solutions that I know of:

  1. Copy text "[PLACEHOLDER_NAME]" to Notepad or something. Make your necessary modification and copy it back and paste it instead of "[PLACEHOLDER_NAME]" in your word file, this way your whole "[PLACEHOLDER_NAME]" will be replaced with new text avoiding splitting of XWPFRuns.

  2. Select "[PLACEHOLDER_NAME]" and then click of MS-Word "Replace" option and Replace with "[Your-new-edited-placeholder]" and this will guarantee that your new placeholder will consume a single XWPFRun.

If you have to change your new placeholder again, follow step 1 or 2.



回答2:

I also had this issue few days ago and I couldn't find any solution. I chose to use PLACEHOLDER_NAME instead of [PLACEHOLDER_NAME]. This is working fine for me and it's seen like a single XWPFRun object.



回答3:

Here is the java code to fix that separate text line issue. It will also handle the mult-format string replacement.

public static void replaceString(XWPFDocument doc, String search, String replace) throws Exception{
  for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    List<Integer> group = new ArrayList<Integer>();
    if (runs != null) {
      String groupText = search;
      for (int i=0 ; i<runs.size(); i++) {
        XWPFRun r = runs.get(i);
        String text = r.getText(0);
        if (text != null)
            if(text.contains(search)) {
              String safeToUseInReplaceAllString = Pattern.quote(search);
              text = text.replaceAll(safeToUseInReplaceAllString, replace);
              r.setText(text, 0);
            }
            else if(groupText.startsWith(text)){
              group.add(i);
              groupText = groupText.substring(text.length());
              if(groupText.isEmpty()){
                runs.get(group.get(0)).setText(replace, 0);
                for(int j = 1; j<group.size(); j++){
                  p.removeRun(group.get(j));
                }
                group.clear();
                groupText = search;
              }
            }else{
              group.clear();
              groupText = search;
            }
        }
    }
}
for (XWPFTable tbl : doc.getTables()) {
   for (XWPFTableRow row : tbl.getRows()) {
      for (XWPFTableCell cell : row.getTableCells()) {
         for (XWPFParagraph p : cell.getParagraphs()) {
            for (XWPFRun r : p.getRuns()) {
              String text = r.getText(0);
              if (text.contains(search)) {
                String safeToUseInReplaceAllString = Pattern.quote(search);
                text = text.replaceAll(safeToUseInReplaceAllString, replace);
                r.setText(text);
              }
            }
         }
      }
   }
}

}



标签: apache-poi