How do you programmatically redact PDF files?

Posted 2020-03-03 05:56

Question:

Adobe Acrobat has the ability to redact PDF files (that is, actually remove the information, rather than simply drawing a black box on top of it). I would like to use this feature programmatically. To redact using the GUI you select the Mark for Redaction Tool, draw it over the text to be redacted, then Apply Redactions.

Is there any way to do this programmatically, either through AppleScript or some other way?

I know the (x, y) location of the text to be redacted.

Thanks!

Answer 1:

You can use GroupDocs.Redaction for .NET to programmatically redact text in PDF documents. It supports exact-phrase, case-sensitive, and regular-expression redaction of text. This is how you can perform an exact-phrase redaction:

using (Document doc = Redactor.Load("D:\\candy.pdf"))
{
     doc.RedactWith(new ExactPhraseRedaction("candy", new ReplacementOptions("[redacted]")));
     // Save the document to "*_Redacted.*" file.
     doc.Save(new SaveOptions() { AddSuffix = true, RasterizeToPDF = false }); 
} 

Disclosure: I work as Developer Evangelist at GroupDocs.



Answer 2:

In order to properly redact a PDF, you need to Alter The Content Stream. This is Very Hard.

If you can find the portion of the content stream that draws the text you want removed, you're halfway there.
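As a rough illustration of the "finding" half, here is a minimal Python sketch that scans an already-decompressed content stream and reports each (...)Tj together with the origin set by the most recent Tm. It is a sketch under loud assumptions, not a real PDF parser: it only understands Tm and Tj, it assumes literal strings without nested or unbalanced parentheses, and the names `find_text_runs` and `SAMPLE` are made up for this example. A run that was positioned relatively simply inherits the last Tm origin, so treat that value as approximate.

```python
import re

NUM = r"(-?\d*\.?\d+)"
# Either "a b c d e f Tm" (groups 1-6) or "(literal string) Tj" (group 7).
TOKEN = re.compile(
    r"\s+".join([NUM] * 6) + r"\s+Tm"
    r"|\(((?:\\.|[^\\()])*)\)\s*Tj"
)

def find_text_runs(stream):
    """Return (origin, text, (start, end)) for every simple Tj in the stream.

    origin is the (x, y) from the most recent Tm, or None if there was none.
    (start, end) are character offsets of the whole "(...)Tj" in the stream.
    """
    origin = None
    runs = []
    for match in TOKEN.finditer(stream):
        if match.group(7) is None:
            # A Tm operator: remember its translation part (e, f).
            origin = (float(match.group(5)), float(match.group(6)))
        else:
            runs.append((origin, match.group(7), match.span()))
    return runs

# Hypothetical stream, shaped like a typical text block.
SAMPLE = """BT
/F1 10 Tf
1 0 0 1 30 720 Tm
(first line with a SECRET in it)Tj
T*
(positioned relative to the previous line)Tj
1 0 0 1 30 650 Tm
(positioned absolutely)Tj
ET"""

for origin, text, span in find_text_runs(SAMPLE):
    print(origin, span, text)
```

If you know the (x, y) of the text to redact, you would pick the run whose origin is closest and then work on the slice of the stream given by its offsets.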

The other half is figuring out how to change the content stream such that you don't modify the rest of the document. If the next text draw operator is preceded by a "Tm" operator (set the text matrix, which absolutely positions the next piece of text), it's easy. If not... you have to calculate the exact width of the text you're removing (several different PDF libraries can do this), and alter the drawing commands to skip over that much stuff.

For example:

BT
/F1 10 Tf
1 0 0 1 30 720 Tm
(Here's some text, and you only want to REDACT that upper case "redact" over there)Tj
T*
(This text is positioned relative to the previous line)Tj
1 0 0 1 30 650 Tm
(This text is positioned absolutely, starting at 30, 650)Tj
ET

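So you'd have to break up that first (...)Tj line into (Here's some text, and you only want to)Tj, N 0 Td, and (that upper case "redact" over there)Tj... where the 'N' properly adjusts the position of the following text drawing operation such that it lands in EXACTLY THE SAME SPOT. So you'd need to know the precise width of " REDACT " using the font resource /F1 (whatever that turned out to be), sized to 10 points. Note that Td offsets are measured from the start of the current line, not from where the last string ended, so N also has to include the width of the text you kept before the gap, and any following relatively positioned line will start from the shifted origin unless you move it back.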
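Here is a minimal Python sketch of the width arithmetic. It uses a TJ array with a numeric adjustment instead of Td, which is a swapped-in alternative that leaves the line origin alone. Assumptions: a simple one-byte font whose advance widths are given in the usual 1/1000 glyph units, a made-up default width of 500 for characters missing from the table, and character spacing and word spacing of zero for the TJ rewrite. In practice the widths come from the font resource. All function names here are hypothetical.

```python
def text_width(text, widths, font_size, char_spacing=0.0,
               word_spacing=0.0, h_scale=1.0, default=500):
    """Width of `text` in text space units for a simple font.

    Per glyph: (w0 * Tfs + Tc + Tw) * Th, where w0 is the advance width
    in glyph units / 1000 and Tw applies only to the space character.
    """
    total = 0.0
    for ch in text:
        w0 = widths.get(ch, default) / 1000.0
        tw = word_spacing if ch == " " else 0.0
        total += (w0 * font_size + char_spacing + tw) * h_scale
    return total

def pdf_literal(text):
    """Write text as a PDF literal string, escaping \\, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"

def redact_show_text(text, start, end, widths, default=500):
    """Replace one (text)Tj with a TJ array that skips text[start:end].

    Numbers in a TJ array are in thousandths of the font size and are
    subtracted from the position, so a negative number moves right.
    Assumes char spacing and word spacing are both zero.
    """
    removed = sum(widths.get(ch, default) for ch in text[start:end])
    return "[{} {} {}] TJ".format(
        pdf_literal(text[:start]), -removed, pdf_literal(text[end:]))

# Hypothetical usage: every glyph falls back to the default width of 500.
print(text_width("SECRET", {}, font_size=10))
print(redact_show_text("keep SECRET tail", 5, 11, {}))
```

Because the gap is expressed in font-relative units, the adjustment does not depend on the font size as long as the spacing parameters are zero.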

Just to make your life more exciting, you have to worry about kerned text too. You can provide little spacing adjustments inline with text thusly:

(This is taken from the first text drawn in the PDF Spec)

[(Adobe Sys)5(t)1(ems Inc)5(orporated)5( 20)5(08 \226 All rights)5( reser)-9(ved)]TJ

To properly redact "Incorporated", you need to determine that it's been split across two strings, and adjust the positioning of the string following it so it's in Exactly The Same Spot.
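A minimal Python sketch of that bookkeeping, assuming the TJ array has already been parsed into a list of Python strings and numbers, the font is a simple one-byte font, character and word spacing are zero, and glyph widths are in 1/1000 units (with a made-up default of 500). The function name is hypothetical. It removes the target even when it spans several strings and replaces the whole removed span with a single adjustment that preserves the net advance, including any kerning numbers that sat inside the span.

```python
def redact_tj_array(elements, target, widths, default=500):
    """Remove the first occurrence of `target` from a parsed TJ array.

    `elements` mixes strings and numbers, e.g. ["Inc", 5, "orporated"].
    Returns a new list in which the removed glyphs and any numbers between
    them are replaced by one number with the same net displacement.
    """
    joined = "".join(e for e in elements if isinstance(e, str))
    start = joined.find(target)
    if start < 0:
        return list(elements)
    end = start + len(target)

    out, pos, adjust = [], 0, 0  # pos: index into `joined`
    for e in elements:
        if not isinstance(e, str):
            if start < pos < end:
                adjust += e          # kerning inside the span: fold it in
            else:
                out.append(e)
            continue
        lo = min(max(start - pos, 0), len(e))
        hi = min(max(end - pos, 0), len(e))
        if lo < hi:
            if e[:lo]:
                out.append(e[:lo])   # text kept before the span
            # Each removed glyph moved right by its width; a negative
            # TJ number reproduces that move.
            adjust -= sum(widths.get(ch, default) for ch in e[lo:hi])
            if pos + hi == end:      # span finished inside this string
                out.append(adjust)
                if e[hi:]:
                    out.append(e[hi:])
        else:
            out.append(e)
        pos += len(e)
    return out

# Shortened version of the array above; widths fall back to the default.
parsed = ["Adobe Sys", 5, "t", 1, "ems Inc", 5, "orporated", 5, " 20", 5, "08"]
print(redact_tj_array(parsed, "Incorporated", {}))
```

With real font widths, the single number would equal the kerning inside the span minus the summed widths of the removed glyphs.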

And strings can be <DEADBEEF> hex values rather than (plain old ASCII).
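Decoding the hex form is the easy part. A minimal Python sketch, assuming the token has already been isolated from the stream (the function name is made up): whitespace inside the brackets is ignored, and an odd number of digits is padded with a trailing 0. Mapping the resulting bytes back to characters still depends on the font's encoding, which this does not attempt.

```python
def decode_hex_string(token):
    """Decode a PDF hex string token such as <DEADBEEF> into bytes."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("not a hex string token: %r" % token)
    digits = "".join(token[1:-1].split())   # drop any whitespace
    if len(digits) % 2:
        digits += "0"                        # odd length: final digit is 0
    return bytes.fromhex(digits)

print(decode_hex_string("<DEADBEEF>"))
print(decode_hex_string("<48 65 6C6C 6F>"))
```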

Get the idea? And I haven't covered all the possibilities here, just the most common ones.

Like I said: This is Very Hard.


There's an Acrobat plugin called Appligent Redax (no connection) that lets you draw annotations (or generate them via templates, regex, etc.) and then run their code to handle the redaction. It should be possible to programmatically create their annotations and perhaps even activate their plugin: JS in a document can run a menu item.



Answer 3:

Here's a web page that goes through what you need to do. As others mentioned, you have to do this in JavaScript, as that's Acrobat's native scripting language.

http://acrobatusers.com/tutorials/2008/07/auto_redaction_with_javascript

While I use Acrobat regularly, I've surprisingly never had a need to script it. I checked its AppleScript dictionary, and it looks like you'll have to write a JavaScript file, save it, and then open it with AppleScript if that's what you want to do (say, as a service).

tell application "Adobe Acrobat Professional"
   do script "this.info.title;"
end tell

Here's Adobe's JavaScript for Acrobat documentation:

http://livedocs.adobe.com/acrobat_sdk/9.1/Acrobat9_1_HTMLHelp/wwhelp/wwhimpl/common/html/wwhelp.htm?context=Acrobat9_HTMLHelp&file=JavaScript_SectionPage.70.1.html



Answer 4:

Within Adobe Acrobat you may be able to do this with a JavaScript action, which can be invoked on a number of different events.

If you would like to do this in a separate application, there are a number of tools on a variety of platforms that can create and manipulate PDF documents, although I have yet to find a feature-rich open source library that comes close to some of these offerings.

http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/default.aspx

http://www.aspose.com/categories/java-components/aspose.pdf-for-java/default.aspx

http://itextpdf.com/

iText is my personal favorite and worth every penny.