I am trying to remove a set of contiguous paragraphs from a Microsoft Word document using Apache POI.
From what I have understood, deleting a paragraph is possible by removing all of its runs, this way:
/*
 * Deletes the given paragraph.
 */
public static void deleteParagraph(XWPFParagraph p) {
    if (p != null) {
        List<XWPFRun> runs = p.getRuns();
        // Delete all the runs
        for (int i = runs.size() - 1; i >= 0; i--) {
            p.removeRun(i);
        }
        p.setPageBreak(false); // Remove the eventual page break
    }
}
It does work, but with a strange side effect. The block of removed paragraphs does not disappear from the document; instead it is converted into a set of empty lines, as if every paragraph had been replaced by a line break.
When I print the paragraphs' content from code, I see a space for each removed paragraph. Looking at the document itself, with formatting marks displayed, I can see this:
The vertical column of ¶ corresponds to the block of deleted elements.
Do you have any idea why this happens? I'd like my paragraphs to be completely removed.
I also tried replacing the text (with setText()) and removing any spacing that might be added automatically, this way:
p.setSpacingAfter(0);
p.setSpacingAfterLines(0);
p.setSpacingBefore(0);
p.setSpacingBeforeLines(0);
p.setIndentFromLeft(0);
p.setIndentFromRight(0);
p.setIndentationFirstLine(0);
p.setIndentationLeft(0);
p.setIndentationRight(0);
But with no luck.
I would delete paragraphs by removing the paragraphs themselves, not by deleting only the runs in those paragraphs. Deleting paragraphs is not part of the Apache POI high-level API. But using XWPFDocument.getDocument().getBody() we can get the low-level CTBody, and there is a removeP(int i) method.
Example:
This deletes all paragraphs from the document source.docx where the text contains "delete" and saves the result in result.docx.
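A minimal sketch of such a program could look like the following. The class name WordRemoveParagraph and the helper deleteParagraphsContaining are hypothetical names chosen for illustration; the POI calls used are getPosOfParagraph, removeBodyElement, getParagraphs and getText. It removes through removeBodyElement and shows the low-level removeP call only as a commented-out alternative.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class WordRemoveParagraph {

    /** Removes the paragraph itself (not just its runs) from the document body. */
    static void deleteParagraph(XWPFParagraph p) {
        XWPFDocument doc = p.getDocument();
        int pos = doc.getPosOfParagraph(p); // position among the body elements
        // Low-level alternative: doc.getDocument().getBody().removeP(pos);
        doc.removeBodyElement(pos);
    }

    /** Hypothetical helper: deletes every body paragraph whose text contains marker. */
    static int deleteParagraphsContaining(XWPFDocument doc, String marker) {
        int removed = 0;
        // Walk backwards so a removal never shifts an index that is still to be visited.
        for (int i = doc.getParagraphs().size() - 1; i >= 0; i--) {
            XWPFParagraph p = doc.getParagraphs().get(i);
            if (p.getText().contains(marker)) {
                deleteParagraph(p);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        try (FileInputStream in = new FileInputStream("source.docx");
             XWPFDocument doc = new XWPFDocument(in);
             FileOutputStream out = new FileOutputStream("result.docx")) {
            deleteParagraphsContaining(doc, "delete");
            doc.write(out);
        }
    }
}
```

Because whole paragraphs are removed, a contiguous block of matches disappears without leaving empty ¶ lines behind.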
Edited:
Although
doc.getDocument().getBody().removeP(pPos);
works, it will not update the XWPFDocument's paragraphs list. So it will break paragraph iterators and other accesses to that list, since the list is only rebuilt when the document is read again.
So the better approach is using
doc.removeBodyElement(pPos);
instead. removeBodyElement(int pos) does exactly the same as doc.getDocument().getBody().removeP(pos); if pos points to a paragraph in the document body, since that paragraph is a BodyElement too. But in addition, it will update the XWPFDocument's paragraphs list.
When you are inside a table, you need to use the functions of the XWPFTableCell instead of the XWPFDocument:
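One possible way to handle both cases is sketched below. The class name RemoveAnyParagraph and its deleteParagraph helper are hypothetical. The sketch assumes the paragraph was obtained from cell.getParagraphs() or doc.getParagraphs(), so that getBody() returns the owning cell or document.

```java
import org.apache.poi.xwpf.usermodel.IBody;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;

public class RemoveAnyParagraph {

    /** Removes p from whichever container owns it: a table cell or the document body. */
    static void deleteParagraph(XWPFParagraph p) {
        IBody body = p.getBody();
        if (body instanceof XWPFTableCell) {
            XWPFTableCell cell = (XWPFTableCell) body;
            int idx = cell.getParagraphs().indexOf(p); // index within this cell only
            if (idx >= 0) {
                cell.removeParagraph(idx);
            }
        } else if (body instanceof XWPFDocument) {
            XWPFDocument doc = (XWPFDocument) body;
            doc.removeBodyElement(doc.getPosOfParagraph(p));
        }
        // Other containers (headers, footers, footnotes) are not handled in this sketch.
    }
}
```

Note that a table cell needs at least one paragraph to remain valid, so avoid removing the last paragraph of a cell; clear its runs instead.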