How to extract the color of a rectangle in a PDF

Posted 2019-02-25 15:22

Question:

I'm trying to extract the color of a rectangle in a PDF with iText. The following is everything the PDF page contains:

And this is the page content extracted with iText:

q
BT
36 806 Td
0 -18 Td
/F1 12 Tf
(Option 1:)Tj
0 0 Td
0 -94.31 Td
ET
Q
q
Q
q
2 J
0 G
0.5 w
88.3 693.69 139.47 94.31 re
S
0.5 w
227.77 693.69 139.47 94.31 re
S
0.5 w
367.23 693.69 139.47 94.31 re
S
Q
BT
1 0 0 1 90.3 774 Tm
/F1 12 Tf
(A rectangle:)Tj
ET
q 1.13 0 0 1.13 229.77 695.69 cm /Xf1 Do Q
BT
1 0 0 1 369.23 774 Tm
/F1 12 Tf
(The rectangle is scaled)Tj
1 0 0 1 369.23 762 Tm
(to fit inside the cell, you)Tj
1 0 0 1 369.23 750 Tm
(see a padding.)Tj
ET
228 810 m
338 810 l
S

However, there is something I cannot extract from that content: the red color. If I generate the same PDF with another color instead of red, nothing changes in the page content shown above.

So my question is: how can I extract that color using methods or properties of the iText library for Java?

I'm using iText 5.5.9; the code I use to generate the sample PDF is shown further down.

Thanks for any help you can provide!


This is the code I'm using to generate the PDF:

String dest = "C:\\TestCreation.pdf";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();

document.add(new Paragraph("Option 1:"));
PdfPTable table = new PdfPTable(3);
table.addCell("A rectangle:");
PdfTemplate template = writer.getDirectContent().createTemplate(120, 80);
template.setColorFill(BaseColor.RED);
template.rectangle(0, 0, 120, 80);
template.fill();
writer.releaseTemplate(template);
table.addCell(Image.getInstance(template));
table.addCell("The rectangle is scaled to fit inside the cell, you see a padding.");
document.add(table);

PdfContentByte cb = writer.getDirectContent();
cb.moveTo(228, 810);
cb.lineTo(338, 810);
cb.stroke();
document.close();

You can find the PDF file here: PDF example

This is the line of code I'm using to get the page content: String pageContent = new String(reader.getPageContent(1));

I've inspected the whole reader object and was able to locate the rectangle, but not its color.

Answer 1:

Your code already shows it. This is how you create the rectangle and add it:

PdfTemplate template = writer.getDirectContent().createTemplate(120, 80);
template.setColorFill(BaseColor.RED);
template.rectangle(0, 0, 120, 80);
template.fill();
writer.releaseTemplate(template);
table.addCell(Image.getInstance(template));

An iText PdfTemplate generates a PDF form XObject. A form XObject, in turn, is "a PDF content stream that is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images)" (section 8.10.1 of ISO 32000-1), i.e. a separate stream of drawing instructions whose content can be referenced from any other content stream.

In the case of your page content stream, this is the line where the form XObject is included:

q 1.13 0 0 1.13 229.77 695.69 cm /Xf1 Do Q

(The graphics state is saved, the current transformation matrix is changed to scale by 1.13 and translate by (229.77, 695.69), then the XObject Xf1 is drawn, and finally the saved graphics state is restored.)
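
To see what that cm operation does to coordinates, here is a minimal stand-alone Java sketch. It does not use iText; the class name CmSketch and the method apply are made up for illustration. It assumes the usual PDF convention that a matrix a b c d e f maps (x, y) to (a*x + c*y + e, b*x + d*y + f), and applies the matrix to the corners of the 120 x 80 template rectangle from your code.

```java
import java.util.Locale;

class CmSketch {
    // Applies a PDF matrix [a b c d e f] to the point (x, y):
    // x' = a*x + c*y + e, y' = b*x + d*y + f
    static double[] apply(double[] m, double x, double y) {
        return new double[] { m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5] };
    }

    public static void main(String[] args) {
        // Operands of the cm operation in the page content stream
        double[] cm = { 1.13, 0, 0, 1.13, 229.77, 695.69 };
        // Corners of the 120 x 80 rectangle in the template's own coordinate system
        double[][] corners = { { 0, 0 }, { 120, 0 }, { 120, 80 }, { 0, 80 } };
        for (double[] corner : corners) {
            double[] p = apply(cm, corner[0], corner[1]);
            System.out.printf(Locale.ROOT, "(%.2f, %.2f)%n", p[0], p[1]);
        }
    }
}
```

Rounded to two decimals, this prints (229.77, 695.69), (365.37, 695.69), (365.37, 786.09) and (229.77, 786.09), which are the corners reported for the FILL path in the parser output further down.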

The content stream of that XObject Xf1 is this:

1 0 0 rg
0 0 120 80 re
f

I.e. it sets the non-stroking color to RGB red, defines a 120x80 rectangle at the origin, and fills it.
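
If the postfix notation is unfamiliar (operands first, operator last), the following toy Java sketch may help. It is only an illustration under strong assumptions: it splits on whitespace, understands plain numbers and the three operators rg, re and f, and nothing else. Real content streams also contain strings, names, arrays, dictionaries and inline images, so this is no replacement for a real parser. The names OperatorSketch and describe are made up.

```java
import java.util.ArrayList;
import java.util.List;

class OperatorSketch {
    // Walks the whitespace-separated tokens; numbers are collected as operands
    // until an operator token consumes them (PDF uses postfix notation).
    static List<String> describe(String content) {
        List<String> steps = new ArrayList<>();
        List<Double> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                operands.add(Double.parseDouble(token));
                continue;
            }
            switch (token) {
                case "rg": steps.add("set fill color (RGB) " + operands); break;
                case "re": steps.add("append rectangle " + operands); break;
                case "f":  steps.add("fill path"); break;
                default:   steps.add("unhandled operator " + token); break;
            }
            operands.clear();
        }
        return steps;
    }

    public static void main(String[] args) {
        // The content stream of the form XObject Xf1
        for (String step : describe("1 0 0 rg\n0 0 120 80 re\nf")) {
            System.out.println(step);
        }
    }
}
```

For the three lines of Xf1 this reports a fill color of [1.0, 0.0, 0.0], a rectangle with the operands [0.0, 0.0, 120.0, 80.0], and a fill operation.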


This is the line of code I'm using to get the page content:

String pageContent = new String(reader.getPageContent(1));

That line is not adequate for getting all the content details:

  1. It only returns the immediate page content, not the detailed instructions from the form XObjects and patterns used in the immediate content. Quite often one finds PDFs whose immediate page contents only reference one or more form XObjects.

  2. In spite of appearances, the page content is binary in nature, not textual. As soon as fonts with non-standard encodings are used, PDF string contents are meaningless in your Java String or (depending on your default encoding) even broken.
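
The second point can be illustrated without iText. The sketch below is a stand-alone example with made-up bytes and made-up names (BinaryContentSketch, survivesRoundTrip): it decodes a byte sequence into a String and encodes it again, and checks whether the original bytes come back. new String(bytes) without a charset argument uses the platform default charset, so which of the two cases you hit depends on your environment.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryContentSketch {
    // True if decoding the bytes to a String and encoding them again yields the same bytes.
    static boolean survivesRoundTrip(byte[] data, Charset charset) {
        return Arrays.equals(data, new String(data, charset).getBytes(charset));
    }

    public static void main(String[] args) {
        // "(" 0x80 0xFF ") Tj": a made-up string operand containing two non-ASCII bytes
        byte[] data = { '(', (byte) 0x80, (byte) 0xFF, ')', ' ', 'T', 'j' };
        System.out.println("UTF-8:      " + survivesRoundTrip(data, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + survivesRoundTrip(data, StandardCharsets.ISO_8859_1));
    }
}
```

With UTF-8 the two non-ASCII bytes are not valid input and get replaced during decoding, so the round trip fails; with ISO-8859-1 every byte maps to exactly one character and the bytes survive, although the resulting characters still say nothing about which glyphs the font would draw.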

Instead one should use the iText parser framework, e.g. like this:

// Imports used by the snippets that follow (iText 5.5.x package names)
import java.io.InputStream;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ExtRenderListener;
import com.itextpdf.text.pdf.parser.GraphicsState;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.Matrix;
import com.itextpdf.text.pdf.parser.Path;
import com.itextpdf.text.pdf.parser.PathConstructionRenderInfo;
import com.itextpdf.text.pdf.parser.PathPaintingRenderInfo;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.TextRenderInfo;
import com.itextpdf.text.pdf.parser.Vector;

ExtRenderListener extRenderListener = new ExtRenderListener()
{
    @Override
    public void beginTextBlock()                        {   }
    @Override
    public void renderText(TextRenderInfo renderInfo)   {   }
    @Override
    public void endTextBlock()                          {   }
    @Override
    public void renderImage(ImageRenderInfo renderInfo) {   }

    @Override
    public void modifyPath(PathConstructionRenderInfo renderInfo)
    {
        pathInfos.add(renderInfo);
    }

    @Override
    public Path renderPath(PathPaintingRenderInfo renderInfo)
    {
        GraphicsState graphicsState;
        try
        {
            graphicsState = getGraphicsState(renderInfo);
        }
        catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e)
        {
            e.printStackTrace();
            return null;
        }

        Matrix ctm = graphicsState.getCtm();

        if ((renderInfo.getOperation() & PathPaintingRenderInfo.FILL) != 0)
        {
            System.out.printf("FILL (%s) ", toString(graphicsState.getFillColor()));
            if ((renderInfo.getOperation() & PathPaintingRenderInfo.STROKE) != 0)
                System.out.print("and ");
        }
        if ((renderInfo.getOperation() & PathPaintingRenderInfo.STROKE) != 0)
        {
            System.out.printf("STROKE (%s) ", toString(graphicsState.getStrokeColor()));
        }

        System.out.print("the path ");

        for (PathConstructionRenderInfo pathConstructionRenderInfo : pathInfos)
        {
            switch (pathConstructionRenderInfo.getOperation())
            {
            case PathConstructionRenderInfo.MOVETO:
                System.out.printf("move to %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CLOSE:
                System.out.printf("close %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CURVE_123:
                System.out.printf("curve123 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CURVE_13:
                System.out.printf("curve13 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CURVE_23:
                System.out.printf("curve23 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.LINETO:
                System.out.printf("line to %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.RECT:
                System.out.printf("rectangle %s ", transform(ctm, expandRectangleCoordinates(pathConstructionRenderInfo.getSegmentData())));
                break;
            }
        }
        System.out.println();

        pathInfos.clear();
        return null;
    }

    @Override
    public void clipPath(int rule)
    {
    }

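    List<Float> transform(Matrix ctm, List<Float> coordinates)
    {
        // Some operations (e.g. CLOSE) may carry no segment data; return an empty list instead of failing.
        if (coordinates == null)
            return Collections.emptyList();
        List<Float> result = new ArrayList<>();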
        for (int i = 0; i + 1 < coordinates.size(); i += 2)
        {
            Vector vector = new Vector(coordinates.get(i), coordinates.get(i + 1), 1);
            vector = vector.cross(ctm);
            result.add(vector.get(Vector.I1));
            result.add(vector.get(Vector.I2));
        }
        return result;
    }

    List<Float> expandRectangleCoordinates(List<Float> rectangle)
    {
        if (rectangle.size() < 4)
            return Collections.emptyList();
        return Arrays.asList(
                rectangle.get(0), rectangle.get(1),
                rectangle.get(0) + rectangle.get(2), rectangle.get(1),
                rectangle.get(0) + rectangle.get(2), rectangle.get(1) + rectangle.get(3),
                rectangle.get(0), rectangle.get(1) + rectangle.get(3)
                );
    }

    String toString(BaseColor baseColor)
    {
        if (baseColor == null)
            return "DEFAULT";
        return String.format("%s,%s,%s", baseColor.getRed(), baseColor.getGreen(), baseColor.getBlue());
    }

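    // Reads the private field "gs" of PathPaintingRenderInfo via reflection to get at the graphics state.
    GraphicsState getGraphicsState(PathPaintingRenderInfo renderInfo) throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException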
    {
        Field gsField = PathPaintingRenderInfo.class.getDeclaredField("gs");
        gsField.setAccessible(true);
        return (GraphicsState) gsField.get(renderInfo);
    }

    final List<PathConstructionRenderInfo> pathInfos = new ArrayList<>();
};

try (   InputStream resource = [RETRIEVE FILE TO PARSE AS INPUT STREAM])
{
    PdfReader pdfReader = new PdfReader(resource);

    for (int page = 1; page <= pdfReader.getNumberOfPages(); page++)
    {
        System.out.printf("\nPage %s\n====\n", page);

        PdfReaderContentParser parser = new PdfReaderContentParser(pdfReader);
        parser.processContent(page, extRenderListener);

    }
}

(ExtractPaths test method testExtractFromTestCreation)

For your sample file this results in the output:

Page 1
====
STROKE (0,0,0) the path rectangle [88.3, 693.69, 227.77, 693.69, 227.77, 788.0, 88.3, 788.0] 
STROKE (0,0,0) the path rectangle [227.77, 693.69, 367.24, 693.69, 367.24, 788.0, 227.77, 788.0] 
STROKE (0,0,0) the path rectangle [367.23, 693.69, 506.7, 693.69, 506.7, 788.0, 367.23, 788.0] 
FILL (255,0,0) the path rectangle [229.77, 695.69, 365.37, 695.69, 365.37, 786.09, 229.77, 786.09] 
STROKE (DEFAULT) the path move to [228.0, 810.0] line to [338.0, 810.0] 

iText represents color components as integers in the range 0-255 instead of the unit range (0.0 - 1.0) the PDF uses. Thus, you see '(255,0,0)' where the PDF selected '1 0 0 rg'.
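
As a rough illustration of that mapping, here is a small stand-alone Java sketch that scales unit-range components to 0-255. The clamping and the round-to-nearest behaviour are assumptions made for this example, not a statement about iText internals, and the names ColorScaleSketch, toByte and describe are made up.

```java
class ColorScaleSketch {
    // Scales a unit-range component (0.0 - 1.0) to 0-255, clamping out-of-range input.
    static int toByte(float component) {
        float clamped = Math.max(0f, Math.min(1f, component));
        return Math.round(clamped * 255f);
    }

    // Formats three unit-range components the way the answer's output shows colors.
    static String describe(float r, float g, float b) {
        return toByte(r) + "," + toByte(g) + "," + toByte(b);
    }

    public static void main(String[] args) {
        // Operands of "1 0 0 rg"
        System.out.println(describe(1f, 0f, 0f));
    }
}
```

For the operands of '1 0 0 rg' this prints 255,0,0.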



Answer 2:

To find the color of your rectangle, you may need to browse through the /Annots section of the PDF stream. Here, you are only exploring the /Contents, which doesn't include information such as color for the Rect entities.

I hope it will help :)



Tags: java pdf itext