Adobe Acrobat has the ability to redact PDF files (that is, actually remove the information, rather than simply drawing a black box on top of it). I would like to use this feature programmatically. To redact using the GUI you select the Mark for Redaction Tool, draw it over the text to be redacted, then Apply Redactions.
Is there any way to do this programmatically, either through AppleScript or some other way?
I know the (x, y) location of the text to be redacted.
Thanks!
Here's a web page that goes through what you need to do. As others mentioned, you have to do this in JavaScript, as that's Acrobat's native scripting language.
http://acrobatusers.com/tutorials/2008/07/auto_redaction_with_javascript
While I use Acrobat regularly, I've surprisingly never had a need to script it. I checked the dictionary for it, and it looks like you'll have to write a JavaScript file, save it, and then open it with AppleScript if that's what you want to do (say, as a service).
Here's Adobe's JavaScript for Acrobat documentation:
http://livedocs.adobe.com/acrobat_sdk/9.1/Acrobat9_1_HTMLHelp/wwhelp/wwhimpl/common/html/wwhelp.htm?context=Acrobat9_HTMLHelp&file=JavaScript_SectionPage.70.1.html
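As a rough illustration of the approach in the linked tutorial, here is a minimal sketch of marking a known rectangle for redaction and applying it from Acrobat JavaScript. It assumes Acrobat Pro 9 or later, where Redact annotations and Doc.applyRedactions are available. The helper names rectToQuads and redactRect are made up for this example, and the corner order used for the quads (upper-left, upper-right, lower-left, lower-right) is an assumption you should check against what getPageNthWordQuads returns in your version.

```javascript
// Convert a rectangle (lower-left corner x, y plus width and height, in
// PDF default user space points) into the "quads" value of an annotation.
// ASSUMPTION: corner order is upper-left, upper-right, lower-left, lower-right.
function rectToQuads(x, y, width, height) {
  return [[
    x,         y + height,
    x + width, y + height,
    x,         y,
    x + width, y
  ]];
}

// Mark one rectangle for redaction and apply it immediately.
// `doc` is an Acrobat Doc object (for example `this` in the JS console).
// Hypothetical helper, not part of the Acrobat API.
function redactRect(doc, pageIndex, x, y, width, height) {
  var mark = doc.addAnnot({
    type: "Redact",
    page: pageIndex,
    quads: rectToQuads(x, y, width, height)
  });
  // Apply only the mark created above, without a confirmation dialog.
  return doc.applyRedactions({
    aMarks: [mark],
    bKeepMarks: false,
    bShowConfirmation: false
  });
}
```

In the Acrobat console, something like redactRect(this, 0, 100, 200, 50, 12) would redact a 50 x 12 point box on the first page. To drive it from AppleScript, you would save the script to a file and have Acrobat run it, as described above.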
You can use GroupDocs.Redaction for .NET to programmatically redact text in PDF documents. It supports exact-phrase, case-sensitive, and regular-expression redaction of text. This is how you can perform the exact phrase redaction.
Disclosure: I work as Developer Evangelist at GroupDocs.
Within Adobe Acrobat you may be able to do this with a JavaScript script, which can be invoked on a number of different events.
If you would like to do this in a separate application, there are a number of tools on a variety of platforms that can create and manipulate PDF documents, although I have yet to find a feature-rich open source library that comes close to some of these offerings.
http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/default.aspx
http://www.aspose.com/categories/java-components/aspose.pdf-for-java/default.aspx
http://itextpdf.com/
iText is my personal favorite and worth every penny.
In order to properly redact a PDF, you need to Alter The Content Stream. This is Very Hard.
If you can find the portion of the content stream that draws the text you want removed, you're halfway there.
The other half is figuring out how to change the content stream such that you don't modify the rest of the document. If the next text draw operator is preceded by a "Tm" command (set the text matrix, which absolutely positions the next piece of text), it's easy. If not... you have to calculate the exact width of the text you're replacing (several different PDF libraries can do this), and alter the drawing commands to skip over that much stuff.
For Example:
So you'd have to break up that first (...)Tj line into (Here's some text, and you only want to)Tj, N 0 Td, and (that upper case "redact" over there)Tj, where the 'N' properly adjusts the position of the following text drawing operation such that it lands in EXACTLY THE SAME SPOT. So you'd need to know the precise width of " REDACT " using the font resource /F1 (whatever that turned out to be), sized to 10 points. Note that Td offsets are measured from the start of the current line, not from where the previous string ended, so N also has to include the width of the text you kept in front.
Just to make your life more exciting, you have to worry about kerned text too. You can provide little spacing adjustments inline with text thusly:
(This is taken from the first text drawn in the PDF Spec)
To properly redact "Incorporated", you need to determine that it's been split across two strings, and adjust the positioning of the string following it so it's in Exactly The Same Spot.
And strings can be <DEADBEEF> hex values rather than (plain old ascii).
Get the idea? And I haven't covered all the possibilities here, just the most common ones.
Like I said: This is Very Hard.
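To make the string-splitting step a little more concrete, here is a minimal sketch of the width arithmetic. It is only an illustration under strong assumptions: a simple font whose glyph widths are available as a per-character table in thousandths of a unit, no character spacing, word spacing, horizontal scaling, or kerning in effect, and the whole line drawn by one literal-string Tj. The names textWidth, pdfLiteral, and splitAroundRedaction are invented for this example. It produces two equivalent rewrites: the Td form discussed above, and a TJ array with a negative adjustment, which has the advantage of not moving the line start that later Td operators depend on.

```javascript
// Sum of glyph widths in thousandths of a text-space unit, as stored in a
// PDF font's Widths array. `glyphWidths` maps a character to its width.
function glyphUnits(text, glyphWidths) {
  var total = 0;
  for (var i = 0; i < text.length; i++) {
    var w = glyphWidths[text.charAt(i)];
    if (w === undefined) throw new Error("No width for glyph: " + text.charAt(i));
    total += w;
  }
  return total;
}

// Width of a string in text-space units at the given font size.
function textWidth(text, glyphWidths, fontSize) {
  return glyphUnits(text, glyphWidths) * fontSize / 1000;
}

// Escape backslashes and parentheses for a PDF literal string.
function pdfLiteral(text) {
  return "(" + text.replace(/([\\()])/g, "\\$1") + ")";
}

// Format a number without exponent notation or float noise.
function pdfNumber(n) {
  return String(Number(n.toFixed(3)));
}

// Replace the first occurrence of `target` with a gap of the same width.
// Returns null if `target` is not found.
function splitAroundRedaction(text, target, glyphWidths, fontSize) {
  var start = text.indexOf(target);
  if (start < 0) return null;
  var prefix = text.slice(0, start);
  var suffix = text.slice(start + target.length);
  // Td is relative to the line start, so skip the kept prefix AND the gap.
  var tdOffset = textWidth(prefix, glyphWidths, fontSize) +
                 textWidth(target, glyphWidths, fontSize);
  // TJ adjustments are in thousandths and subtracted, so negative moves right.
  var tjGap = -glyphUnits(target, glyphWidths);
  return {
    td: [pdfLiteral(prefix) + "Tj",
         pdfNumber(tdOffset) + " 0 Td",
         pdfLiteral(suffix) + "Tj"],
    tj: "[" + pdfLiteral(prefix) + " " + pdfNumber(tjGap) + " " +
        pdfLiteral(suffix) + "] TJ"
  };
}

// Hypothetical metrics: every glyph is 500/1000 wide (real fonts vary).
var widths = {};
"kep REDACTthis".split("").forEach(function (c) { widths[c] = 500; });
var result = splitAroundRedaction("keep REDACT this", " REDACT ", widths, 10);
// result.td -> ["(keep)Tj", "60 0 Td", "(this)Tj"]
// result.tj -> "[(keep) -4000 (this)] TJ"
```

A real implementation would also have to handle hex strings, existing TJ arrays with kerning, and text split across several operators, which is exactly why this is Very Hard.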
There's an Acrobat plugin called Appligent Redax (no connection) that lets you draw annotations (or generate them via templates, regex, etc.) and then run their code to handle the redaction. It should be possible to programmatically create their annotations and perhaps even activate their plugin: JS in a document can run a menu item.