Extract paths and shapes with iTextSharp

Posted 2019-02-19 11:54

Question:

iTextSharp supports creating shapes and paths with the PdfContentByte class, where you can set colors and paint curves and basic elements. Is there a mechanism that works the other way around? I can get the page content by calling PdfReader.GetPageContent(...), but I haven't found a "parser" that reads those operations and applies them to a graphics context, for example to paint them on a panel.

Example:

    1 1 1 RG
    1 1 1 rg
    0.12 0 0 0.12 16 31 cm

    q
    480 421 m
    4318 421 l
    4318 5459 l
    480 5459 l
    480 421 l W n
    0.074509806931 0.074509806931 0.074509806931 RG
    0.074509806931 0.074509806931 0.074509806931 rg /OC /oc1 BDC
    ....

Thanks for any replies!

Answer 1:

Here is a starting point for extracting the individual commands of a page:

    using System;
    using System.Collections.Generic;
    using iTextSharp.text.pdf;

    var file = "test.pdf";
    var reader = new PdfReader(file);

    // Content stream bytes of page 2
    var streamBytes = reader.GetPageContent(2);
    var tokenizer = new PRTokeniser(new RandomAccessFileOrArray(streamBytes));
    var ps = new PdfContentParser(tokenizer);

    // Each Parse() call fills the list with one command:
    // its operands first, then the operator as the last element
    List<PdfObject> operands = new List<PdfObject>();
    while (ps.Parse(operands).Count > 0)
    {
        PdfLiteral oper = (PdfLiteral)operands[operands.Count - 1];
        var cmd = oper.ToString();

        switch (cmd)
        {
            case "q":
                Console.WriteLine("SaveGraphicsState(); //q");
                break;

            case "Q":
                Console.WriteLine("RestoreGraphicsState(); //Q");
                break;

            // good luck with the rest!
        }
    }

    reader.Close();
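
To give an idea of what "the rest" might look like, here is a minimal sketch that runs without iTextSharp. It assumes the content stream is already available as text and splits it on whitespace, which is enough for the snippet in the question but not for real PDFs (strings, arrays and inline images need a proper tokenizer such as the one above). The class ContentStreamSketch and the call names it prints (MoveTo, LineTo, Clip, ...) are hypothetical labels for illustration, not iTextSharp API.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of iTextSharp.
public static class ContentStreamSketch
{
    // Groups tokens into (operator, operands) pairs.
    // Assumption: tokens are separated by whitespace only.
    public static List<KeyValuePair<string, List<string>>> Split(string content)
    {
        var result = new List<KeyValuePair<string, List<string>>>();
        var operands = new List<string>();
        var separators = new[] { ' ', '\t', '\r', '\n' };

        foreach (var token in content.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            double ignored;
            bool isNumber = double.TryParse(token, NumberStyles.Float,
                                            CultureInfo.InvariantCulture, out ignored);

            if (isNumber || token.StartsWith("/"))
            {
                // Numbers and names (/OC, /oc1) are operands of the next operator.
                operands.Add(token);
            }
            else
            {
                // Anything else is treated as an operator that consumes the operands.
                result.Add(new KeyValuePair<string, List<string>>(token, operands));
                operands = new List<string>();
            }
        }
        return result;
    }

    // Maps a few graphics state, path and color operators to descriptive call strings.
    public static List<string> Describe(string content)
    {
        var calls = new List<string>();
        foreach (var op in Split(content))
        {
            var args = string.Join(", ", op.Value);
            switch (op.Key)
            {
                case "q":  calls.Add("SaveGraphicsState()"); break;
                case "Q":  calls.Add("RestoreGraphicsState()"); break;
                case "cm": calls.Add("ConcatMatrix(" + args + ")"); break;
                case "m":  calls.Add("MoveTo(" + args + ")"); break;
                case "l":  calls.Add("LineTo(" + args + ")"); break;
                case "c":  calls.Add("CurveTo(" + args + ")"); break;
                case "re": calls.Add("Rectangle(" + args + ")"); break;
                case "h":  calls.Add("ClosePath()"); break;
                case "W":  calls.Add("Clip()"); break;
                case "n":  calls.Add("NewPath()"); break;
                case "S":  calls.Add("Stroke()"); break;
                case "f":  calls.Add("Fill()"); break;
                case "RG": calls.Add("SetStrokeColorRgb(" + args + ")"); break;
                case "rg": calls.Add("SetFillColorRgb(" + args + ")"); break;
                default:   calls.Add("Unhandled(" + op.Key + ")"); break;
            }
        }
        return calls;
    }
}
```

Calling ContentStreamSketch.Describe on the lines from the question gives one entry per operator, so "480 421 m" becomes "MoveTo(480, 421)" and the marked-content operator BDC ends up as "Unhandled(BDC)". To actually paint, you would replace the strings with calls on your graphics context and keep track of the current transformation matrix set by cm.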


Answer 2:

That's not supported in iTextSharp. The reason: parsing for text returns TextRenderInfo objects, parsing for images returns ImageRenderInfo objects, but in which form should we return GraphicsRenderInfo? It's hard to find something generic, and painting to a graphics context is too specific.

The idea is that you write your own parser, as I did for instance for removing OCG layers: OCGParser. This part of iText hasn't been ported to iTextSharp yet, but maybe you can use it for inspiration.

Note that you're actually building PDF-to-image functionality. Aren't there other products that already support this out of the box?