Extracting text marked for redaction in a PDF document

Posted 2019-07-31 06:39

Question:

I am working on an Acrobat add-on product for PDFs, and one of the requirements is to extract the text marked for redaction in a given PDF document.

Assuming you know what redaction is (if not, please read http://acrobatusers.com/tutorials/redacting-pdf-files-survey-tools), how can I discover the coordinates of the text that has been marked for redaction in a PDF and then extract that exact text?

Please ask for more details if you think they would help you point me to the right answer. I have tried the iTextSharp and Aspose.PDF libraries for this without much success.

Answer 1:

When you mark text for redaction in Acrobat, it creates redaction annotations. A redaction annotation has its /Subtype key set to /Redact, and the redaction area is defined by the /QuadPoints key in the annotation dictionary. I do not know whether iTextSharp or Aspose support redaction annotations directly. With iTextSharp you can use the COS API to retrieve the raw PDF objects and inspect the ones you need.
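
Once the raw /QuadPoints array has been read from a /Redact annotation, the remaining work is plain geometry. The sketch below is library-independent Python, not iTextSharp or Aspose code. It assumes /QuadPoints is a flat array of 8 numbers per quadrilateral (four x, y pairs in page coordinates), and that some text extractor has already produced (text, bounding box) pairs for the page. The helper names `quadpoints_to_rects`, `center_inside`, and `redacted_text`, and the "chunk" format, are hypothetical and used only for illustration.

```python
from typing import List, Sequence, Tuple

# Axis-aligned rectangle: (x_min, y_min, x_max, y_max) in page coordinates.
Rect = Tuple[float, float, float, float]


def quadpoints_to_rects(quad_points: Sequence[float]) -> List[Rect]:
    """Turn a flat /QuadPoints array (8 numbers per quad) into bounding rectangles."""
    if len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    rects: List[Rect] = []
    for i in range(0, len(quad_points), 8):
        xs = quad_points[i:i + 8:2]      # the four x coordinates of this quad
        ys = quad_points[i + 1:i + 8:2]  # the four y coordinates of this quad
        # min/max makes the result independent of the vertex order a producer used.
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def center_inside(box: Rect, rect: Rect) -> bool:
    """Return True if the center of `box` lies within `rect`."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return rect[0] <= cx <= rect[2] and rect[1] <= cy <= rect[3]


def redacted_text(chunks: Sequence[Tuple[str, Rect]], rects: Sequence[Rect]) -> List[str]:
    """Keep the text chunks whose center falls inside any redaction rectangle.

    `chunks` is an assumed format: (text, bounding box) pairs supplied by
    whatever text extractor is in use.
    """
    return [text for text, box in chunks
            if any(center_inside(box, r) for r in rects)]


# Made-up example values: one redaction quad and two text chunks.
quads = [100, 700, 200, 700, 100, 680, 200, 680]
chunks = [
    ("secret", (110, 682, 150, 698)),
    ("public", (300, 682, 340, 698)),
]
print(redacted_text(chunks, quadpoints_to_rects(quads)))
```

Testing the chunk center rather than requiring full containment is a design choice that tolerates small differences between the glyph boxes and the marked area. Rotated text or non-rectangular quads would need a proper point-in-polygon test instead of a bounding rectangle.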