Reading PDF Annotations with iText

Posted 2019-03-27 20:59

Question:

I am trying to get the contents of a PDF annotation as a string so I can store that information in a database for searching purposes.

Does anyone know how to accomplish this using iText/iTextSharp?

Answer 1:

Yes, but the specifics really depend on what kind[s] of annotations you're talking about.

In general:

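// iText 5 package; iText 2.x uses com.lowagie.text.pdf instead.
import com.itextpdf.text.pdf.*;

// Page numbers start at 1.
PdfDictionary pageDict = myPdfReader.getPageN(firstPageIsOne);

// getAsArray returns null when the page has no /Annots entry.
PdfArray annotArray = pageDict.getAsArray(PdfName.ANNOTS);

for (int i = 0; annotArray != null && i < annotArray.size(); ++i) {
  PdfDictionary curAnnot = annotArray.getAsDict(i);
  if (curAnnot == null) continue; // entry is not a dictionary

  // Placeholder helper: the annotation type is the /Subtype name,
  // available via curAnnot.getAsName(PdfName.SUBTYPE).
  int someType = myCodeToGetAnAnnotsType(curAnnot);
  if (someType == THIS_TYPE) {
    writeThisType(curAnnot);
  } else if (someType == THAT_TYPE) {
    writeThatType(curAnnot);
  }
}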

For details, you'll need to examine the PDF Specification, in particular the annotation descriptions: "Chapter 12.5.6 Annotation Types".

If you can tell us what types you care about, I can be of more help.



Answer 2:

For future reference to anyone that finds this question via Google like I did...

If you want the name and contents of sticky note annotations, you can do this (based in part on Mark's answer):

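using System;
using iTextSharp.text.pdf;

PdfReader reader = new PdfReader(somePDF);
PdfDictionary pageDict = reader.GetPageN(1); // page numbers start at 1

// GetAsArray returns null when the page has no /Annots entry.
PdfArray annotArray = pageDict.GetAsArray(PdfName.ANNOTS);

for (int i = 0; annotArray != null && i < annotArray.Size; ++i)
{
    PdfDictionary curAnnot = annotArray.GetAsDict(i);
    if (curAnnot == null) { continue; } // entry is not a dictionary

    // /T is the annotation's title (usually the author); /Contents is the note text.
    PdfString name = curAnnot.GetAsString(PdfName.T);
    PdfString contents = curAnnot.GetAsString(PdfName.CONTENTS);
    if (!string.IsNullOrWhiteSpace(name?.ToString()))
    { Console.WriteLine(name); }
    if (!string.IsNullOrWhiteSpace(contents?.ToString()))
    { Console.WriteLine(contents); }
}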

Additionally, to help identify what you might be looking for, you can open the PDF in a text editor and search for /Annot (PDF names are case-sensitive); you'll quickly find your annotation objects.
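If you'd rather not scroll through a text editor, the same search can be scripted. Below is a minimal sketch in plain Java (no iText needed) that lists the byte offsets where the /Annot name appears in the raw file. The class name AnnotScan and the method findAnnotOffsets are made up for illustration. It only looks at the raw bytes, so it will miss annotations stored inside compressed object streams (possible in PDF 1.5 and later); the same limitation applies to the text-editor approach.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of iText.
public class AnnotScan {

    /** Returns the byte offset of every "/Annot" name found in the raw PDF bytes. */
    static List<Integer> findAnnotOffsets(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so string indexes equal byte offsets.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        String needle = "/Annot";
        List<Integer> offsets = new ArrayList<>();
        int from = 0;
        while (true) {
            int hit = raw.indexOf(needle, from);
            if (hit < 0) {
                break;
            }
            int end = hit + needle.length();
            // Skip longer names such as /Annots by requiring a non-alphanumeric character next.
            boolean exact = end >= raw.length() || !Character.isLetterOrDigit(raw.charAt(end));
            if (exact) {
                offsets.add(hit);
            }
            from = end;
        }
        return offsets;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java AnnotScan <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        for (int offset : findAnnotOffsets(bytes)) {
            System.out.println("/Annot at byte " + offset);
        }
    }
}
```

Each offset points at a /Type /Annot entry; the surrounding dictionary shows the /Subtype, /T and /Contents keys that the iText code above reads.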



Tags: c# pdf itext