I just found the Apache POI library very useful for editing Word files using Java. Specifically, I want to edit a DOCX file using Apache POI's XWPF classes. I couldn't find a proper method or documentation explaining how to do this. Can somebody please explain, step by step, how to replace some text in a DOCX file?
** The text may be in a line/paragraph or in a table row/column.
Thanks in advance :)
The accepted answer here needs one more update in addition to Justin Skiles' update: r.setText(text, 0);. Reason: if you call setText without the position argument, the output will be the old string followed by the replacement string.
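To see why the position argument matters, here is a simplified model of how, as I understand the POI source, an XWPFRun stores its text: a list of text elements (one per <w:t>), where setText(String) appends a new element and setText(String, int) overwrites the element at that index. RunTextModel is a hypothetical class that only mimics this behaviour; it is not part of POI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for XWPFRun: the run is modelled as a list of text
// segments, one per <w:t> element. This is NOT a POI class.
class RunTextModel {
    private final List<String> segments = new ArrayList<>();

    RunTextModel(String initialText) {
        segments.add(initialText);
    }

    // Mirrors XWPFRun.getText(int): null when the run has no text at all.
    String getText(int pos) {
        return segments.isEmpty() ? null : segments.get(pos);
    }

    // Mirrors XWPFRun.setText(String): appends a new segment at the end.
    void setText(String value) {
        setText(value, segments.size());
    }

    // Mirrors XWPFRun.setText(String, int): overwrites the segment at pos,
    // or appends when pos equals the current number of segments.
    void setText(String value, int pos) {
        if (pos > segments.size()) {
            throw new IndexOutOfBoundsException("pos " + pos);
        }
        if (pos == segments.size()) {
            segments.add(value);
        } else {
            segments.set(pos, value);
        }
    }

    // What Word would display: all segments concatenated.
    String fullText() {
        return String.join("", segments);
    }
}
```

With a run containing "Hello needle", writing the replaced text back with setText(text) leaves the run reading "Hello needleHello haystack", while setText(text, 0) leaves "Hello haystack".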
I suggest my solution for replacing text between # characters, for example: This #bookmark# should be replaced. It is replaced in:
It also handles the case where the # symbols and the bookmark name are in separate runs (i.e., it replaces variables that span different runs).
Here is a link to the code: https://gist.github.com/aerobium/bf02e443c079c5caec7568e167849dda
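The gist is not reproduced here. As a rough sketch of the general idea only (not the gist's code), the runs of one paragraph can be modelled as a list of strings: concatenate them, locate each #name# in the combined text, write the replacement into the run where the opening # sits, and cut the remaining placeholder characters out of that run and the following runs. Keeping the replacement in the first run means it inherits that run's formatting. In POI you would read each run with getText(0) and write it back with setText(text, 0); the class HashPlaceholderReplacer and its values map are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: runs are modelled as plain strings, one per XWPFRun.
class HashPlaceholderReplacer {

    // Replaces every #key# whose key is in `values`, even when the
    // placeholder is split across several runs. Mutates `runs` in place.
    static void replace(List<String> runs, Map<String, String> values) {
        int from = 0;
        while (true) {
            String combined = String.join("", runs);
            int start = combined.indexOf('#', from);
            if (start < 0) return;
            int end = combined.indexOf('#', start + 1);
            if (end < 0) return;
            String value = values.get(combined.substring(start + 1, end));
            if (value == null) {
                from = end; // the closing # may open the next placeholder
                continue;
            }
            replaceRange(runs, start, end + 1, value);
            from = start + value.length(); // continue after the inserted value
        }
    }

    // Replaces characters [s, e) of the concatenated run text with `value`.
    // The value goes into the run holding position s, so it keeps that run's
    // formatting; the rest of the placeholder is cut from later runs.
    private static void replaceRange(List<String> runs, int s, int e, String value) {
        int pos = 0;
        boolean inserted = false;
        for (int i = 0; i < runs.size(); i++) {
            String t = runs.get(i);
            int runStart = pos;
            int runEnd = pos + t.length();
            pos = runEnd;
            int cutFrom = Math.max(s, runStart) - runStart;
            int cutTo = Math.min(e, runEnd) - runStart;
            if (cutFrom >= cutTo) continue; // run does not overlap the placeholder
            String middle = inserted ? "" : value;
            inserted = true;
            runs.set(i, t.substring(0, cutFrom) + middle + t.substring(cutTo));
        }
    }
}
```

For runs "This #book", "mark", "# should be replaced." and the mapping bookmark to chapter, the runs become "This chapter", "" and " should be replaced.". A # that is not part of a known key (as in "Issue #5") is left alone.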
The method you need is XWPFRun.setText(String). Simply work your way through the file until you find the XWPFRun of interest, work out what you want the new text to be, and replace it. (A run is a sequence of text with the same formatting.)
You should be able to do something like:
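A minimal sketch of the run-by-run approach, with each run modelled as a plain string holding what XWPFRun.getText(0) would return (possibly null). In real POI code the same loop would be applied to the runs of every paragraph from doc.getParagraphs() and, for tables, to the paragraphs of every cell reached through getTables(), getRows(), getTableCells() and getParagraphs(), writing back with setText(text, 0). The class RunLevelReplacer is hypothetical. The test also shows the main limitation: a search string split across two runs is not found.

```java
import java.util.List;

// Hypothetical helper: each element stands for one XWPFRun's getText(0).
class RunLevelReplacer {

    // Replaces `find` with `repl` inside each run independently and
    // returns how many runs were changed. Mutates `runs` in place.
    static int replaceInRuns(List<String> runs, String find, String repl) {
        int changed = 0;
        for (int i = 0; i < runs.size(); i++) {
            String text = runs.get(i);
            // getText(0) returns null for runs with no text element.
            if (text != null && text.contains(find)) {
                // Write back at the same index, like r.setText(text, 0).
                runs.set(i, text.replace(find, repl));
                changed++;
            }
        }
        return changed;
    }
}
```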
There is the replaceParagraph implementation that replaces ${key} with a value (from the fieldsForReport parameter) and preserves formatting by merging the contents of the runs that ${key} is split across.
Implementation: replaceParagraph
Unit test
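The linked implementation is not reproduced here. As a rough sketch of the idea described (merge the runs a ${key} is split across, then substitute values from a map like fieldsForReport), with runs modelled as strings: when a run contains an unfinished "${" (or ends with a "$" that may start one), the text of the following runs is moved into it until the closing "}" appears, and the emptied runs are left blank. The merged run keeps its own formatting. DollarKeyReplacer is a hypothetical name, null runs are assumed to be runs without text, and unknown keys are left untouched.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: runs are modelled as strings, one per XWPFRun.
class DollarKeyReplacer {
    private static final Pattern KEY = Pattern.compile("\\$\\{([^}]*)\\}");

    // Merges split ${key} placeholders into one run, then substitutes values.
    static void replace(List<String> runs, Map<String, String> fields) {
        for (int i = 0; i < runs.size(); i++) {
            if (runs.get(i) == null) continue; // run without text
            StringBuilder text = new StringBuilder(runs.get(i));
            int j = i + 1;
            // Pull text from the following runs while a placeholder is still open.
            while (hasOpenKey(text.toString()) && j < runs.size()) {
                String next = runs.get(j);
                if (next != null) {
                    text.append(next);
                    runs.set(j, "");
                }
                j++;
            }
            runs.set(i, substitute(text.toString(), fields));
        }
    }

    // True if the last "${" has no "}" after it, or if the text ends with "$"
    // (which may be the first half of a "${" split across runs).
    private static boolean hasOpenKey(String s) {
        int open = s.lastIndexOf("${");
        return s.endsWith("$") || (open >= 0 && s.indexOf('}', open) < 0);
    }

    private static String substitute(String s, Map<String, String> fields) {
        Matcher m = KEY.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String value = fields.get(m.group(1));
            out.append(s, last, m.start());
            out.append(value != null ? value : m.group()); // keep unknown keys as-is
            last = m.end();
        }
        out.append(s.substring(last));
        return out.toString();
    }
}
```

For runs "Dear $", "{first", "Name}, hi ${x}" and the mapping firstName to Ann, the result is "Dear Ann, hi ${x}" in the first run and two empty runs.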
run.getText(int position), from the documentation: "Returns: the text of this text run or null if not set".
So just check that it is not null before calling contains() on it.
And by the way, if you want to replace the text, you need to set it at the same position you read it from, in this case r.setText(text, 0);. Otherwise the text will be appended rather than replaced.
Here is what we did for text replacement using Apache POI. We found that working at the run level was not worth the hassle, and that it was simpler to replace the text of an entire XWPFParagraph instead of individual runs. A run can be split arbitrarily, even in the middle of a word, because Microsoft Word decides where runs begin and end within a paragraph. Therefore the text you are searching for could be half in one run and half in another. Taking the full text of a paragraph, removing its existing runs, and adding a new run with the adjusted text seems to solve the problem.
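A minimal sketch of the paragraph-level approach, again modelling a paragraph's runs as strings (null meaning a run with no text): join all run texts, apply the replacement to the joined text, and collapse everything into a single run. In POI terms this roughly corresponds to removing the runs with XWPFParagraph.removeRun(int) from the last index down to 0 and writing the new text into a run obtained from createRun(). The class ParagraphLevelReplacer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a paragraph is modelled as the list of its run texts.
class ParagraphLevelReplacer {

    // Returns the new run list. If `find` occurs in the paragraph, the result
    // is a single run with the whole text replaced (original run formatting
    // is lost); otherwise the runs are returned unchanged.
    static List<String> replace(List<String> runs, String find, String repl) {
        StringBuilder joined = new StringBuilder();
        for (String r : runs) {
            if (r != null) joined.append(r); // runs without text contribute nothing
        }
        String text = joined.toString();
        List<String> result = new ArrayList<>();
        if (!text.contains(find)) {
            result.addAll(runs); // leave untouched paragraphs and their formatting alone
            return result;
        }
        result.add(text.replace(find, repl));
        return result;
    }
}
```

Runs "Some bi" and "ts of text" with "bits" replaced by "bytes" become the single run "Some bytes of text". Only rewriting paragraphs that actually contain the search text is a design choice of this sketch, so that formatting is lost only where something changes.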
However, there is a cost to doing the replacement at the paragraph level: you lose the formatting of the runs in that paragraph. For example, if you had bolded the word "bits" in the middle of your paragraph and then replaced "bits" with "bytes" while parsing the file, the word "bytes" would no longer be bold, because the bolding was stored with a run that was removed when the paragraph's entire text was replaced. The attached code has a commented-out section that worked for replacing text at the run level if you need it.
Note also that the code below works if the text you are inserting contains \n newline characters. We could not find a way to insert line breaks without creating a run for each section before a newline and calling addCarriageReturn() on that run. Cheers
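A minimal sketch of the newline handling described above, modelled without POI: the replacement text is split on \n and each piece becomes one run, with every run except the last flagged for a carriage return. In POI, each piece would roughly map to paragraph.createRun(), then setText(piece), then addCarriageReturn() when flagged. RunSpec and NewlineSplitter are hypothetical types, and Windows-style \r\n is not handled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one run: its text and whether addCarriageReturn()
// would be called on it.
class RunSpec {
    final String text;
    final boolean carriageReturn;

    RunSpec(String text, boolean carriageReturn) {
        this.text = text;
        this.carriageReturn = carriageReturn;
    }
}

class NewlineSplitter {

    // Splits `text` on '\n'; every piece except the last ends with a carriage return.
    static List<RunSpec> toRuns(String text) {
        // limit -1 keeps trailing empty pieces, so "a\n" still yields a break.
        String[] pieces = text.split("\n", -1);
        List<RunSpec> runs = new ArrayList<>();
        for (int i = 0; i < pieces.length; i++) {
            runs.add(new RunSpec(pieces[i], i < pieces.length - 1));
        }
        return runs;
    }
}
```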