docx4j find and replace

Posted 2019-05-21 12:44

Question:

I have a docx document with some placeholders. I need to replace them with other content and save the result as a new docx document. I started with docx4j and found this method:

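import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBElement;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.P;
import org.docx4j.wml.Text;

// Recursively collects every element of class toSearch below obj
public static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
    List<Object> result = new ArrayList<Object>();
    // Many elements are wrapped in JAXBElement; unwrap before checking the class
    if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();

    if (obj.getClass().equals(toSearch))
        result.add(obj);
    else if (obj instanceof ContentAccessor) {
        List<?> children = ((ContentAccessor) obj).getContent();
        for (Object child : children) {
            result.addAll(getAllElementFromObject(child, toSearch));
        }
    }
    return result;
}

// Replaces toFind inside each Text (w:t) node of every paragraph; a placeholder
// only matches if it lies entirely within a single Text node
public static void findAndReplace(WordprocessingMLPackage doc, String toFind, String replacer){
    List<Object> paragraphs = getAllElementFromObject(doc.getMainDocumentPart(), P.class);
    for(Object par : paragraphs){
        P p = (P) par;
        List<Object> texts = getAllElementFromObject(p, Text.class);
        for(Object text : texts){
            Text t = (Text)text;
            if(t.getValue().contains(toFind)){
                t.setValue(t.getValue().replace(toFind, replacer));
            }
        }
    }
}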

But that only works occasionally, because the placeholders are usually split across multiple text runs.
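To make the failure concrete, here is a minimal sketch in plain Java (no docx4j) that models a paragraph as a list of run strings; in docx4j each string would roughly correspond to the value of one org.docx4j.wml.Text node. The class SplitRunReplace and its methods are hypothetical names for illustration. perRun mirrors the per-Text replacement in findAndReplace above, while acrossRuns searches the concatenated paragraph text, writes the replacement into the run where the match starts, and trims the matched characters out of the following runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: each String models the text of one run (one w:t)
final class SplitRunReplace
{
  // Mirrors the question's approach: each run is searched on its own
  static List<String> perRun(List<String> runs, String find, String repl)
  {
    List<String> out = new ArrayList<>();
    for (String r : runs)
      out.add(r.replace(find, repl));
    return out;
  }

  // Searches the joined paragraph text; the replacement goes into the run
  // where a match starts and the matched characters are cut from later runs
  static List<String> acrossRuns(List<String> runs, String find, String repl)
  {
    List<StringBuilder> rs = new ArrayList<>();
    for (String r : runs)
      rs.add(new StringBuilder(r));

    int from = 0; // joined-text index where the next search starts
    while (!find.isEmpty())
    {
      StringBuilder joined = new StringBuilder();
      for (StringBuilder r : rs)
        joined.append(r);
      int start = joined.indexOf(find, from);
      if (start < 0)
        break;
      int end = start + find.length();

      int offset = 0; // joined-text index of the current run's first char
      boolean inserted = false;
      for (StringBuilder r : rs)
      {
        int runStart = offset;
        int runEnd = offset + r.length();
        offset = runEnd;
        // part of [start, end) that falls inside this run, in run coordinates
        int a = Math.max(start, runStart) - runStart;
        int b = Math.min(end, runEnd) - runStart;
        if (a >= b)
          continue;
        r.delete(a, b);
        if (!inserted)
        {
          r.insert(a, repl);
          inserted = true;
        }
      }
      // skip past the inserted text so a replacement containing find is not re-matched
      from = start + repl.length();
    }

    List<String> out = new ArrayList<>();
    for (StringBuilder r : rs)
      out.add(r.toString());
    return out;
  }
}
```

With the runs "Hello ${na", "me}", "!" and the placeholder ${name}, perRun leaves everything unchanged, while acrossRuns yields "Hello World", "", "!". The replacement inherits the formatting of the run where the placeholder starts. Mapping this back onto docx4j would mean collecting the Text nodes of each P (for example with getAllElementFromObject) and writing the new strings back with setValue; treat that as an untested outline.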

I also tried UnmarshallFromTemplate, but it rarely works either.

How can this problem be solved?

Answer 1:

You can use VariableReplace to achieve this; it may not have existed at the time of the other answers. It does not do a find/replace per se, but works on placeholders of the form ${myField}.

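import java.util.HashMap;
import org.docx4j.model.datastorage.migration.VariablePrepare;

VariablePrepare.prepare(wordMLPackage); // see notes
HashMap<String, String> mappings = new HashMap<>();
mappings.put("myField", "foo"); // fills the placeholder ${myField}
wordMLPackage.getMainDocumentPart().variableReplace(mappings);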

Note that you do not pass ${myField} as the field name; pass the bare field name myField instead. This is rather inflexible: as it currently stands, your placeholders must have the format ${xyz}, whereas if you could pass in any string you could use it for arbitrary find/replace. The same feature is also available to C# users in docx4j.NET.
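Why call VariablePrepare.prepare first? My understanding (check the docx4j source before relying on it) is that it cleans up markup such as proofing-error markers and rsid attributes, then joins adjacent runs whose formatting is identical, so that a placeholder typed in one style ends up in a single run that VariableReplace can match. The merge step alone can be modeled in plain Java as below; RunMerge and Run are hypothetical names, and the format string stands in for a run's w:rPr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical model of run merging; not docx4j's implementation
final class RunMerge
{
  // A run reduced to its formatting key and its text
  static final class Run
  {
    final String format;
    final String text;

    Run(String format, String text)
    {
      this.format = format;
      this.text = text;
    }
  }

  // Joins consecutive runs whose formatting is identical
  static List<Run> mergeAdjacent(List<Run> runs)
  {
    List<Run> out = new ArrayList<>();
    for (Run r : runs)
    {
      Run last = out.isEmpty() ? null : out.get(out.size() - 1);
      if (last != null && Objects.equals(last.format, r.format))
        out.set(out.size() - 1, new Run(last.format, last.text + r.text));
      else
        out.add(r);
    }
    return out;
  }
}
```

Merging ("plain", "Hello ${na"), ("plain", "me}"), ("bold", "!") gives two runs, "Hello ${name}" and "!". A placeholder whose characters are themselves formatted differently (say, half of it bold) still ends up split, which is one reason this approach cannot work every time.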

See here for more info on VariableReplace or here for VariablePrepare



Answer 2:

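Good day. I made an example of how to quickly replace text with whatever you need using a regexp. It finds placeholders like ${param.sumname} and replaces them in the document. Note that you have to insert the placeholder text as 'text only' (for example with Word's "Keep Text Only" paste option); otherwise it tends to end up split across several runs. Have fun!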

  import java.io.File;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;
  import javax.xml.bind.JAXBElement;
  import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
  import org.docx4j.wml.ContentAccessor;
  import org.docx4j.wml.Text;

  public class ReplaceParams
  {
    public static void main(String[] args) throws Exception
    {
      WordprocessingMLPackage mlp = WordprocessingMLPackage.load(new File("filepath"));
      replaceText(mlp.getMainDocumentPart());
      mlp.save(new File("output.docx")); // output path is a placeholder
    }

    // Walks the content tree recursively and rewrites every Text (w:t) node
    static void replaceText(ContentAccessor c)
    {
      for (Object p : c.getContent())
      {
        if (p instanceof ContentAccessor)
          replaceText((ContentAccessor) p);

        else if (p instanceof JAXBElement)
        {
          // Some children (e.g. run text, tables, cells) are wrapped in JAXBElement
          Object v = ((JAXBElement<?>) p).getValue();

          if (v instanceof ContentAccessor)
            replaceText((ContentAccessor) v);

          else if (v instanceof Text)
          {
            Text t = (Text) v;
            String text = t.getValue();

            if (text != null)
            {
              t.setSpace("preserve"); // keep leading/trailing spaces of the new value
              t.setValue(replaceParams(text));
            }
          }
        }
      }
    }

    // Matches ${name}, where name consists of word characters and dots
    static final Pattern PARAM_PATTERN = Pattern.compile("\\$\\{([\\w.]+)\\}");

    static String replaceParams(String text)
    {
      Matcher m = PARAM_PATTERN.matcher(text);
      StringBuffer sb = new StringBuffer();

      while (m.find())
      {
        String replacement = getParamValue(m.group(1));
        // quoteReplacement stops '$' and '\' in the value being read as group references
        m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
      }

      m.appendTail(sb);
      return sb.toString();
    }

    static String getParamValue(String name)
    {
      // replace from map or something else
      return name;
    }
  }
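getParamValue above is only a stub. A hedged sketch of a map-backed variant follows; MapParams is a hypothetical name, and leaving placeholders that have no value in the map unchanged is a choice of this sketch, not something docx4j prescribes. Like the answer's code, it only sees placeholders that sit inside a single Text node.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical map-backed replacement for ${name} placeholders
final class MapParams
{
  // Same placeholder syntax as the answer: ${name}, name = word chars and dots
  static final Pattern PARAM = Pattern.compile("\\$\\{([\\w.]+)\\}");

  // Replaces each ${name} found in values; unknown names are left untouched
  static String replaceParams(String text, Map<String, String> values)
  {
    Matcher m = PARAM.matcher(text);
    StringBuffer sb = new StringBuffer();
    while (m.find())
    {
      String value = values.get(m.group(1));
      // group(0) is the whole placeholder, kept when there is no value for it
      String replacement = value != null ? value : m.group(0);
      m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(sb);
    return sb.toString();
  }
}
```

For example, replaceParams("Total: ${param.sum}", Map.of("param.sum", "$5")) returns "Total: $5"; without Matcher.quoteReplacement, the "$5" would be read as a reference to group 5 and throw an exception.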


回答3:

Split text runs can be a problem. I cover how to mitigate them in this answer: https://stackoverflow.com/a/17066582/125750

... but you might want to consider content controls instead. The docx4j source repository has various content control samples here:

https://github.com/plutext/docx4j/tree/master/src/samples/docx4j/org/docx4j/samples



Tags: java docx docx4j