I would like to know whether we can highlight text (with colors) in an already created PDF using iTextSharp.
I see examples of creating a new PDF, where colors can be applied while the PDF is being created. I am looking for a way to get chunks of text from an existing PDF, apply colors to them, and save the result.
Here is what I am trying to accomplish: read a PDF file, parse its text, and highlight text based on business rules.
Suggestions for any third-party DLL are also welcome. As a first step, I am looking into the open-source iTextSharp library.
Yes, you can highlight text, but unfortunately you will have to work for it. As far as the spec is concerned, what looks like a highlight is a PDF Text Markup Annotation. That part is pretty easy. The hard part is figuring out the coordinates to apply the annotation to.
Here's the simple code for creating a highlight using an existing `PdfStamper` called `stamper`.

Once you have the highlight, you can set the color using:
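Here is a minimal sketch of these steps as a self-contained console program: create the highlight, set its color, and add it to page 1. It assumes the iTextSharp 5.x API (`PdfAnnotation.CreateMarkup`, `PdfAnnotation.MARKUP_HIGHLIGHT`, `BaseColor`, `PdfStamper.AddAnnotation`). Several details are hypothetical placeholders: the file names `input.pdf` and `output.pdf`, the helper `QuadFromRect`, and the hard-coded rectangle. The quad order follows what this answer reports observing in practice (bottom left, bottom right, top left, top right), not the spec's wording.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

class HighlightExistingPdf
{
    // Hypothetical helper: builds QuadPoints from a rectangle in the order
    // bottom-left, bottom-right, top-left, top-right (the order observed in practice).
    internal static float[] QuadFromRect(Rectangle rect)
    {
        return new float[] {
            rect.Left,  rect.Bottom,
            rect.Right, rect.Bottom,
            rect.Left,  rect.Top,
            rect.Right, rect.Top
        };
    }

    static void Main()
    {
        // Hypothetical file names; input.pdf must already exist.
        var reader = new PdfReader("input.pdf");
        using (var fs = new FileStream("output.pdf", FileMode.Create, FileAccess.Write))
        {
            var stamper = new PdfStamper(reader, fs);

            // Hypothetical hard-coded area in PDF user space (points, origin at bottom left):
            // llx, lly, urx, ury.
            var rect = new Rectangle(36f, 770f, 140f, 784f);

            // Create the highlight (a Text Markup Annotation).
            PdfAnnotation highlight = PdfAnnotation.CreateMarkup(
                stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, QuadFromRect(rect));

            // Set the color.
            highlight.Color = BaseColor.YELLOW;

            // Add it to page 1.
            stamper.AddAnnotation(highlight, 1);

            // Writes the modified PDF; this also closes the output stream.
            stamper.Close();
        }
        reader.Close();
    }
}
```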
And then add it to your `stamper` on page 1.

Technically the `rect` parameter doesn't actually get used (as far as I can tell); it is instead overridden by the `quad` parameter. The `quad` parameter is an array of x,y coordinates that essentially represents the corners of a rectangle (technically a quadrilateral). The spec says they start in the bottom left and go counter-clockwise, but in practice they appear to go bottom left, bottom right, top left, top right. Calculating the quad by hand is a pain, so it's easier to create a rectangle and build the quad from it.

So how do you get the rectangle of existing text in the first place? For that you need to look at `TextExtractionStrategy` and `PdfTextExtractor`. There's a lot to go into, so I'm going to start by pointing you at this post, which has some further posts linked.

Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.2 that shows off the creation of a simple PDF and the highlighting of part of the text using hard-coded coordinates. If you need help calculating these coordinates, start with the link above and then ask any questions!
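The following is a separate, minimal console sketch, not the WinForms app mentioned above. It shows one way to get coordinates for existing text with a custom text extraction strategy, assuming the iTextSharp 5.x parser API (`ITextExtractionStrategy`, `TextRenderInfo.GetAscentLine`/`GetDescentLine`, `PdfTextExtractor.GetTextFromPage`). The class `TextLocationStrategy`, the search term and the file names are hypothetical.

Some limitations to keep in mind:
- It highlights the whole text chunk that contains the term, not just the term itself.
- Chunk granularity depends on how the PDF was written: a chunk can be a word, a line, or a single glyph.
- A term split across chunks will be missed, so a real business-rule matcher would need to merge or split chunks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical strategy: records the bounding rectangle of every text chunk
// whose text contains the search term (case-insensitive).
class TextLocationStrategy : ITextExtractionStrategy
{
    private readonly string searchTerm;
    public readonly List<Rectangle> Matches = new List<Rectangle>();

    public TextLocationStrategy(string searchTerm)
    {
        this.searchTerm = searchTerm;
    }

    public void RenderText(TextRenderInfo renderInfo)
    {
        string text = renderInfo.GetText();
        if (string.IsNullOrEmpty(text) ||
            text.IndexOf(searchTerm, StringComparison.OrdinalIgnoreCase) < 0)
            return;

        // Bottom-left corner from the descent line, top-right from the ascent line
        // (assumes unrotated, left-to-right text).
        Vector bottomLeft = renderInfo.GetDescentLine().GetStartPoint();
        Vector topRight = renderInfo.GetAscentLine().GetEndPoint();
        Matches.Add(new Rectangle(bottomLeft[Vector.I1], bottomLeft[Vector.I2],
                                  topRight[Vector.I1], topRight[Vector.I2]));
    }

    // The remaining callbacks are not needed for locating text.
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderImage(ImageRenderInfo renderInfo) { }
    public string GetResultantText() { return string.Empty; }
}

class HighlightMatches
{
    static void Main()
    {
        const string term = "invoice"; // hypothetical business-rule term
        var reader = new PdfReader("input.pdf"); // hypothetical input file
        using (var fs = new FileStream("highlighted.pdf", FileMode.Create, FileAccess.Write))
        {
            var stamper = new PdfStamper(reader, fs);
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Run the strategy over the page; only its collected rectangles are used.
                var strategy = new TextLocationStrategy(term);
                PdfTextExtractor.GetTextFromPage(reader, page, strategy);

                foreach (Rectangle rect in strategy.Matches)
                {
                    // Quad order: bottom-left, bottom-right, top-left, top-right.
                    float[] quad = { rect.Left, rect.Bottom, rect.Right, rect.Bottom,
                                     rect.Left, rect.Top, rect.Right, rect.Top };
                    PdfAnnotation highlight = PdfAnnotation.CreateMarkup(
                        stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
                    highlight.Color = BaseColor.YELLOW;
                    stamper.AddAnnotation(highlight, page);
                }
            }
            stamper.Close();
        }
        reader.Close();
    }
}
```

Replacing the simple substring check in `RenderText` with your own rule is where business-specific matching would plug in.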