I would like to know whether we can highlight text (apply colors) in an already created PDF using iTextSharp.
I have seen examples that create a new PDF and apply colors while doing so. I am looking for a way to get chunks of text from an existing PDF, apply colors to them, and save the result.
Here is what I am trying to accomplish: read a PDF file, parse the text, and highlight text based on business rules.
Suggestions for any third-party DLL are also welcome; as a first step I am looking into the open-source iTextSharp library.
Yes, you can highlight text, but unfortunately you will have to work for it. What looks like a highlight is a PDF Text Markup Annotation as far as the spec is concerned. That part is pretty easy. The hard part is figuring out the coordinates to apply the annotation to.
Here's the simple code for creating a highlight using an existing PdfStamper called stamper:
PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
Once you have the highlight you can set the color using:
highlight.Color = BaseColor.YELLOW;
And then add it to your stamper on page 1 using:
stamper.AddAnnotation(highlight,1);
Technically the rect parameter doesn't actually get used (as far as I can tell) and instead gets overridden by the quad parameter. The quad parameter is an array of x,y coordinates that essentially represent the corners of a rectangle (technically a quadrilateral). The spec says they start in the bottom left and go counter-clockwise, but in reality they appear to go bottom left, bottom right, top left, top right. Calculating the quad by hand is a pain, so it's easier to create a rectangle and build the quad from it:
iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(60.6755f, 749.172f, 94.0195f, 735.3f);
float[] quad = { rect.Left, rect.Bottom, rect.Right, rect.Bottom, rect.Left, rect.Top, rect.Right, rect.Top };
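If the text you want to highlight wraps across several lines, one option is to put several quadrilaterals into a single annotation, since the PDF spec defines QuadPoints as 8 numbers per quadrilateral. The helper below is only a sketch of that idea: QuadHelper and ToQuadPoints are hypothetical names, not part of iTextSharp, and the corner order simply repeats the order used above.

```csharp
using System.Collections.Generic;
using iTextSharp.text;

// Hypothetical helper (not part of iTextSharp): flattens several rectangles
// into one QuadPoints array, 8 floats per rectangle.
public static class QuadHelper
{
    public static float[] ToQuadPoints(IEnumerable<Rectangle> rects)
    {
        List<float> quad = new List<float>();
        foreach (Rectangle r in rects)
        {
            // Same corner order as the single-rectangle example above
            quad.AddRange(new float[] {
                r.Left,  r.Bottom,
                r.Right, r.Bottom,
                r.Left,  r.Top,
                r.Right, r.Top
            });
        }
        return quad.ToArray();
    }
}
```

The resulting array can be passed as the quad argument to PdfAnnotation.CreateMarkup. I have only described the idea here, so check the output in the viewers you care about.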
So how do you get the rectangle of existing text in the first place? For that you need to look at TextExtractionStrategy and PdfTextExtractor. There's a lot to go into, so I'm going to start by pointing you at this post, which has some further posts linked.
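To give a rough idea of what that involves, here is a minimal sketch that subclasses LocationTextExtractionStrategy and records a rectangle for every chunk of text the parser reports. RectAndText, RectCollectingStrategy and TextLocator are hypothetical names made up for this example, and it assumes your iTextSharp version exposes GetAscentLine and GetDescentLine on TextRenderInfo.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical container pairing a chunk of text with its bounding rectangle
public class RectAndText
{
    public iTextSharp.text.Rectangle Rect;
    public string Text;

    public RectAndText(iTextSharp.text.Rectangle rect, string text)
    {
        Rect = rect;
        Text = text;
    }
}

// Hypothetical strategy that remembers where each text chunk was drawn
public class RectCollectingStrategy : LocationTextExtractionStrategy
{
    public readonly List<RectAndText> Chunks = new List<RectAndText>();

    public override void RenderText(TextRenderInfo renderInfo)
    {
        // Let the base class keep doing its normal text extraction
        base.RenderText(renderInfo);

        // Bottom-left comes from the descent line, top-right from the ascent line
        Vector bottomLeft = renderInfo.GetDescentLine().GetStartPoint();
        Vector topRight = renderInfo.GetAscentLine().GetEndPoint();

        iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(
            bottomLeft[Vector.I1], bottomLeft[Vector.I2],
            topRight[Vector.I1], topRight[Vector.I2]);

        Chunks.Add(new RectAndText(rect, renderInfo.GetText()));
    }
}

public static class TextLocator
{
    // Runs the strategy over one page and returns the collected chunks
    public static List<RectAndText> GetChunks(PdfReader reader, int pageNumber)
    {
        RectCollectingStrategy strategy = new RectCollectingStrategy();
        PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy);
        return strategy.Chunks;
    }
}
```

Keep in mind that a chunk is whatever the PDF producer wrote in one text-showing operation. It might be a whole line, a single word, or even part of a word, so your business rules will probably need to merge or split chunks before turning them into highlight rectangles.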
Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.2 that shows off the creation of a simple PDF and the highlighting of part of the text using hard-coded coordinates. If you need help calculating these coordinates start with the link above and then ask any questions!
using System;
using System.ComponentModel;
using System.Data;
using System.Text;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //Create a simple test file
            string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf");
            using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (Document doc = new Document(PageSize.LETTER))
                {
                    using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
                    {
                        doc.Open();
                        doc.Add(new Paragraph("This is a test"));
                        doc.Close();
                    }
                }
            }

            //Create a new file from our test file with highlighting
            string highLightFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Highlighted.pdf");

            //Bind a reader and stamper to our test PDF
            PdfReader reader = new PdfReader(outputFile);
            using (FileStream fs = new FileStream(highLightFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (PdfStamper stamper = new PdfStamper(reader, fs))
                {
                    //Create a rectangle for the highlight. NOTE: Technically this isn't used but it helps with the quadpoint calculation
                    iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(60.6755f, 749.172f, 94.0195f, 735.3f);
                    //Create an array of quad points based on that rectangle. NOTE: The order below doesn't appear to match the actual spec but is what Acrobat produces
                    float[] quad = { rect.Left, rect.Bottom, rect.Right, rect.Bottom, rect.Left, rect.Top, rect.Right, rect.Top };
                    //Create our highlight
                    PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
                    //Set the color
                    highlight.Color = BaseColor.YELLOW;
                    //Add the annotation to page 1
                    stamper.AddAnnotation(highlight, 1);
                }
            }

            this.Close();
        }
    }
}