Traverse whole PDF and change blue color to black

Posted 2020-03-30 07:32

Question:

I am using the code below to remove blue colors from PDF text. It changes the text color correctly, but it does not change the color of the underlines.

Comparing a part of the original file with the same part of the manipulated file shows that the underline color did not change.

I have been looking for a fix for two weeks. Can anyone help with this? Below is my color-changing code:

public void testChangeBlackTextToGreenDocument(String source, String filename) throws IOException {
    try (InputStream resource = getClass().getResourceAsStream(source);
            PdfReader pdfReader = new PdfReader(source);
            OutputStream result = new FileOutputStream(filename);
            PdfWriter pdfWriter = new PdfWriter(result);
            PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter);) {
        PdfCanvasEditor editor = new PdfCanvasEditor() {

            @Override
            protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands) {

                String operatorString = operator.toString();

                if (TEXT_SHOWING_OPERATORS.contains(operatorString)) {
                    List<PdfObject> listobj = new ArrayList<>();
                    listobj.add(new PdfNumber(0));
                    listobj.add(new PdfNumber(0));
                    listobj.add(new PdfNumber(0));
                    listobj.add(new PdfLiteral("rg"));
                    if (currentlyReplacedBlack == null) {
                        Color currentFillColor = getGraphicsState().getFillColor();
                        if (ColorConstants.GREEN.equals(currentFillColor) || ColorConstants.CYAN.equals(currentFillColor) || ColorConstants.BLUE.equals(currentFillColor)) {
                            currentlyReplacedBlack = currentFillColor;
                            super.write(processor, new PdfLiteral("rg"), listobj);
                        }
                    }
                } else if (currentlyReplacedBlack != null) {
                    if (currentlyReplacedBlack instanceof DeviceCmyk) {
                        List<PdfObject> listobj = new ArrayList<>();
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfLiteral("k"));
                        super.write(processor, new PdfLiteral("k"), listobj);
                    } else if (currentlyReplacedBlack instanceof DeviceGray) {
                        List<PdfObject> listobj = new ArrayList<>();
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfLiteral("g"));
                        super.write(processor, new PdfLiteral("g"), listobj);
                    } else {
                        List<PdfObject> listobj = new ArrayList<>();
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfLiteral("rg"));
                        super.write(processor, new PdfLiteral("rg"), listobj);
                    }
                    currentlyReplacedBlack = null;
                }

                super.write(processor, operator, operands);
            }

            Color currentlyReplacedBlack = null;

            final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
        };
        for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
            editor.editPage(pdfDocument, i);
        }
    }
    File file = new File(source);
    file.delete();
}

Here is the original file: https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/originalFile.pdf

Related Links:

Traverse whole PDF and change some attribute with some object in it using iText

Removing Watermark from PDF iTextSharp

Maven dependency details:

    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itext7-core</artifactId>
        <version>7.1.5</version>
        <type>pom</type>
    </dependency>

    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itextpdf</artifactId>
        <version>5.0.6</version>
    </dependency>

Edit:

The accepted answer does not work for the files below:

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/021549Orig1s025_aprepitant_clinpharm_prea_Mac.pdf (Page 41)

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/400_206494S5_avibactam_and_ceftazidine_unireview_prea_Mac.pdf (Page 60).

Please help.

Answer 1:

(The example code here uses iText 7 for Java. You mentioned neither the iText version nor your programming environment in tags or question text but your example code appears to indicate that this is your combination of choice.)

Replacing blue fill colors

The test you based your original code on attempts explicitly only to change text color. The "underline" in your document, though, is (as far as PDF drawing is concerned) not part of the text but instead drawn as a simple path. Thus, the underline explicitly is not touched by the original code and it has to be adapted for your task.
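
To make the distinction concrete, here is a minimal stand-alone sketch in plain Java (no iText involved). The class name `OperatorKinds` and the method `kindOf` are made up for illustration; the operator lists follow the PDF specification. It shows that a filled rectangle such as `re f` never passes through a text-showing operator, so code that reacts only to `Tj`, `TJ`, `'` and `"` cannot touch it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of iText or of the code in this answer.
final class OperatorKinds {
    // Operators that draw text glyphs.
    static final Set<String> TEXT_SHOWING =
            new HashSet<>(Arrays.asList("Tj", "TJ", "'", "\""));

    // Operators that paint the current path by stroking and/or filling it.
    // ("n" ends a path without painting it and is therefore not listed.)
    static final Set<String> PATH_PAINTING =
            new HashSet<>(Arrays.asList("S", "s", "f", "F", "f*", "B", "B*", "b", "b*"));

    // Returns "text", "path" or "other" for a content stream operator.
    static String kindOf(String operator) {
        if (TEXT_SHOWING.contains(operator)) {
            return "text";
        }
        if (PATH_PAINTING.contains(operator)) {
            return "path";
        }
        return "other";
    }
}
```

For an underline drawn as `72 700 100 0.5 re f`, `kindOf("f")` reports a path operation, while the underlined words themselves arrive via `Tj` or `TJ`.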

Your actual task, changing everything blue to black, is in fact easier to implement than changing only the blue text, for example:

try (   PdfReader pdfReader = new PdfReader(SOURCE_PDF);
        PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            // The operands list ends with the operator itself, so "r g b rg" has size 4.
            if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    // Pure RGB blue fill: write DeviceGray black ("0 g") instead.
                    super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                    return;
                }
            }

            super.write(processor, operator, operands);
        }

        boolean isApproximatelyEqual(PdfObject number, float reference) {
            return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
        }

        final String SET_FILL_RGB = "rg";
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(ChangeColor test testChangeFillRgbBlueToBlack)
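
The tolerance check can be tried in isolation. The following sketch uses plain floats instead of `PdfNumber` objects; the class name `PureBlueCheck` and the method `isPureRgbBlue` are hypothetical, while the 0.01 tolerance is the one from the code above.

```java
// Hypothetical stand-alone version of the check, without iText types.
final class PureBlueCheck {
    // True if value differs from reference by less than 0.01.
    static boolean isApproximatelyEqual(float value, float reference) {
        return Math.abs(reference - value) < 0.01f;
    }

    // True only for colors very close to pure RGB blue (0 0 1).
    static boolean isPureRgbBlue(float r, float g, float b) {
        return isApproximatelyEqual(r, 0f)
                && isApproximatelyEqual(g, 0f)
                && isApproximatelyEqual(b, 1f);
    }
}
```

With this check `0 0 1` and slightly rounded variants like `0.005 0 0.995` count as blue, whereas a darker blue such as `.17255 .3098 .63529` does not.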

Beware, this is merely a proof-of-concept, not a final and complete solution. In particular:

  • It merely looks at the fill (non-stroking) colors. In your case that suffices as both your text (as usual) and your underline use fill colors only - the underline actually is not drawn as a stroked line but instead as a slim, filled rectangle.
  • Only RGB blue (and only such blue set using the rg instruction, not set using sc or scn, let alone blues combined out of other colors using funky blend modes) is considered. This might be an issue particularly in case of documents explicitly designed for printing (likely using CMYK colors).
  • PdfCanvasEditor only inspects and edits the content stream of the page itself, not the content streams of displayed form XObjects or patterns; thus, some content may not be found. It can be generalized fairly easily.
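
As a rough overview of what the first two points leave out, the following plain Java sketch tabulates the color-setting operators of the PDF specification. The class `ColorOperators` and its methods are hypothetical names for illustration only. Lowercase operators set the fill (non-stroking) color, uppercase ones the stroke color; for `sc` / `scn` the number of components depends on the current color space, which the sketch signals with -1.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical lookup helper, not part of iText.
final class ColorOperators {
    static final Set<String> COLOR_SETTERS = new HashSet<>(
            Arrays.asList("g", "G", "rg", "RG", "k", "K", "sc", "SC", "scn", "SCN"));

    // Uppercase color operators set the stroking color, lowercase ones the fill color.
    static boolean isStroking(String operator) {
        requireColorSetter(operator);
        return Character.isUpperCase(operator.charAt(0));
    }

    // Number of color components, or -1 if it depends on the current color space.
    static int componentCount(String operator) {
        requireColorSetter(operator);
        switch (operator) {
            case "g": case "G":   return 1; // DeviceGray
            case "rg": case "RG": return 3; // DeviceRGB
            case "k": case "K":   return 4; // DeviceCMYK
            default:              return -1; // sc, SC, scn, SCN
        }
    }

    private static void requireColorSetter(String operator) {
        if (!COLOR_SETTERS.contains(operator)) {
            throw new IllegalArgumentException("Not a color-setting operator: " + operator);
        }
    }
}
```

A blue set as CMYK, roughly `1 1 0 0 k`, would therefore need its own branch with four numeric operands.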

Replacing blue fill and stroke colors

Testing the code above, you soon found documents in which the underlines were not changed. As it turned out, these underlines are actually drawn as stroked lines, not as filled rectangles as above.

To also properly edit such documents, therefore, you must not only edit the fill colors but also the stroke colors, e.g. like this:

try (   PdfReader pdfReader = new PdfReader(SOURCE_PDF);
        PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                    return;
                }
            }

            // Same check for the stroke color: "r g b RG", again with the operator as last list entry.
            if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    // Pure RGB blue stroke: write DeviceGray black ("0 G") instead.
                    super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
                    return;
                }
            }

            super.write(processor, operator, operands);
        }

        boolean isApproximatelyEqual(PdfObject number, float reference) {
            return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
        }

        final String SET_FILL_RGB = "rg";
        final String SET_STROKE_RGB = "RG";
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(ChangeColor tests testChangeRgbBlueToBlackControlOfNitrosamineImpuritiesInSartansRev and testChangeRgbBlueToBlackEdqmReportsIssuesOfNonComplianceWithToothMac)
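
The effect of these two replacements on the instruction level can be sketched without iText. The class `NaiveBlueRewriter` below is a hypothetical illustration that splits a content stream fragment at whitespace, which is a strong simplification: real content streams contain strings, arrays and inline images, so this is not a substitute for the `PdfCanvasEditor` approach.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, deliberately naive illustration; not suitable for real PDFs.
final class NaiveBlueRewriter {
    // Replaces "0 0 1 rg" by "0 g" and "0 0 1 RG" by "0 G" in a whitespace-separated fragment.
    static String rewrite(String content) {
        List<String> out = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            out.add(token);
            boolean fill = token.equals("rg");
            boolean stroke = token.equals("RG");
            int n = out.size();
            if ((fill || stroke) && n >= 4
                    && isNear(out.get(n - 4), 0f)
                    && isNear(out.get(n - 3), 0f)
                    && isNear(out.get(n - 2), 1f)) {
                // Drop the three operands and the operator, then emit gray black.
                out.subList(n - 4, n).clear();
                out.add("0");
                out.add(fill ? "g" : "G");
            }
        }
        return String.join(" ", out);
    }

    // True if the token is a number within 0.01 of the reference value.
    static boolean isNear(String token, float reference) {
        try {
            return Math.abs(reference - Float.parseFloat(token)) < 0.01f;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

A filled-rectangle underline `0 0 1 rg 10 10 100 1 re f` and a stroked-line underline `0 0 1 RG 0 0 m 100 0 l S` are both turned black, while other colors such as `1 0 0 rg` pass through unchanged.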

Replacing different shades of blue from other RGB'ish color spaces

Testing the code above, you again found documents in which the blue colors were not changed. As it turned out, these blue colors were not from the standard DeviceRGB color space but instead from ICCBased color spaces, profiled RGB color spaces to be more exact. In particular, other color setting operators were used than before: sc / scn instead of rg. Furthermore, in one document the blue was not a pure 0 0 1 but instead .17255 .3098 .63529.

If we assume that sc and scn instructions with three numeric arguments set some flavor of RGB color, as they do here, and if we are less strict in recognizing the blue color, we can generalize the code above as follows. (In general this assumption is an oversimplification, since Lab and other color spaces also come with 3 components, but your documents seem RGB oriented.)

class AllRgbBlueToBlackConverter extends PdfCanvasEditor {
    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();

        if (RGB_SETTER_CANDIDATES.contains(operatorString) && operands.size() == 4) {
            if (isBlue(operands.get(0), operands.get(1), operands.get(2))) {
                PdfNumber number0 = new PdfNumber(0);
                operands.set(0, number0);
                operands.set(1, number0);
                operands.set(2, number0);
            }
        }

        super.write(processor, operator, operands);
    }

    boolean isBlue(PdfObject red, PdfObject green, PdfObject blue) {
        if (red instanceof PdfNumber && green instanceof PdfNumber && blue instanceof PdfNumber) {
            float r = ((PdfNumber)red).floatValue();
            float g = ((PdfNumber)green).floatValue();
            float b = ((PdfNumber)blue).floatValue();
            return b > .5f && r < .9f*b && g < .9f*b;
        }
        return false;
    }

    final Set<String> RGB_SETTER_CANDIDATES = new HashSet<>(Arrays.asList("rg", "RG", "sc", "SC", "scn", "SCN"));
}

(ChangeColor helper class)
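
To get a feeling for what the relaxed test accepts, the same condition can be evaluated on plain floats. The class `BlueHeuristic` is a hypothetical stand-alone copy of the `isBlue` condition above, without iText types.

```java
// Hypothetical stand-alone copy of the isBlue condition, using plain floats.
final class BlueHeuristic {
    // Blue must be above 0.5, and red and green must each stay below 90% of blue.
    static boolean isBlue(float r, float g, float b) {
        return b > .5f && r < .9f * b && g < .9f * b;
    }
}
```

Both `0 0 1` and `.17255 .3098 .63529` are accepted, while black, white and a dark navy such as `0 0 .4` are not. The condition is fairly generous, though: a violet like `.8 0 1` also passes, so the thresholds may need tuning for other documents.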

The converter is used like this:

try (   PdfReader pdfReader = new PdfReader(INPUT);
        PdfWriter pdfWriter = new PdfWriter(OUTPUT);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) ) {
    PdfCanvasEditor editor = new AllRgbBlueToBlackConverter();
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}


Tags: pdf itext itext7