I need to remove some content from an existing pdf created with Jasper Reports in iText 5.5.11 but after running PdfCleanUpProcessor all bold text is blurry.
This is the code I'm using:
PdfReader reader = new PdfReader("input.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("output.pdf"));
List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
cleanUpLocations.add(new PdfCleanUpLocation(1, new Rectangle(0f, 0f, 595f, 680f)));
PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
cleaner.cleanUp();
stamper.close();
reader.close();
As already discussed here downgrading to itext-5.5.4 solves the problem, but in my case itext-5.5.11 is already in use for other reasons and so downgrading is not an option.
Is there another solution or workaround?
This are the pdf files before and after cleaning: BEFORE - AFTER
By comparing the before and after files it becomes clear that for some reason the PdfCleanUpProcessor
falsely drops general graphics state operations (at least w, J, and d).
In your before document in particular the w operation is important for the text because a poor man's bold variant is used, i.e. instead of using an actual bold font the normal font is used and the text rendering mode is set to not only fill the glyph contours but also draw a line along it giving it a bold'ish appearance.
The width of that line is set to 0.23333 using a w operation. As that operation is missing in the after document, the default width value of 1 is used. Thus, the line along the contour now is 4 times as big as before resulting in a very fat appearance.
This issue has been introduced in commit d5abd23 (dated May 4th, 2015) which (among other things) added this block to PdfCleanUpContentOperator.invoke
:
} else if (lineStyleOperators.contains(operatorStr)) {
if ("w" == operatorStr) {
cleanUpStrategy.getContext().setLineWidth(((PdfNumber) operands.get(0)).floatValue());
} else if ("J" == operatorStr) {
cleanUpStrategy.getContext().setLineCapStyle(((PdfNumber) operands.get(0)).intValue());
} else if ("j" == operatorStr) {
cleanUpStrategy.getContext().setLineJoinStyle(((PdfNumber) operands.get(0)).intValue());
} else if ("M" == operatorStr) {
cleanUpStrategy.getContext().setMiterLimit(((PdfNumber) operands.get(0)).floatValue());
} else if ("d" == operatorStr) {
cleanUpStrategy.getContext().setLineDashPattern(new LineDashPattern(((PdfArray) operands.get(0)),
((PdfNumber) operands.get(1)).floatValue()));
}
disableOutput = true;
This causes all lineStyleOperators
to be dropped while at the same time an attempt was made to store the changed values in the cleanup strategy context. But of course using ==
for String
comparisons in Java usually is a very bad idea, so since this version the line style operators were dropped for good in iText.
Actually this code had been ported from iTextSharp, and in C# ==
for the string
type works entirely different; nonetheless, even in the iTextSharp version these stored values at first glance only seem to have been taken into account if paths were stroked, not if text rendering included stroking along the contour.
Later on in commit 9967627 (on the same day as the commit above) the inner if..else if..else..
has been removed with the comment Replaced PdfCleanUpGraphicsState
with existing GraphicsState
from itext.pdf.parser
package, added missing parameters into the latter, only the disableOutput = true
remained. This (also at first glance) appears to have fixed the difference between iText/Java and iTextSharp/.Net, but the line style values still are not considered if text rendering included stroking along the contour.
As a work-around consider removing the lines
} else if (lineStyleOperators.contains(operatorStr)) {
disableOutput = true;
from PdfCleanUpContentOperator.invoke
. Now the line style operators are not dropped anymore and the text in your PDF after redaction looks like before. I have not checked for any side effects, though, so please test with a number of documents before even considering using that work-around in production.