How do I "flatten" a PDF-form (remove the form-field but keep the text of the field) with PDFBox?
Same question was answered here:
a quick way to do this, is to remove the fields from the acrofrom.
For this you just need to get the document catalog, then the acroform and then remove all fields from this acroform.
The graphical representation is linked with the annotation and stay in the document.
So I wrote this code:
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
public class PdfBoxTest {
public void test() throws Exception {
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
if (acroForm == null) {
System.out.println("No form-field --> stop");
return;
}
@SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
// set the text in the form-field <-- does work
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setValue("Test-String");
}
}
// remove form-field but keep text ???
// acroForm.getFields().clear(); <-- does not work
// acroForm.setFields(null); <-- does not work
// acroForm.setFields(new ArrayList()); <-- does not work
// ???
pdDoc.save("E:\\Form-Test-Result.pdf");
pdDoc.close();
}
}
setReadOnly did work for me as shown below -
After reading about pdf reference guide, I have discovered that you can quite easily set read-only mode for AcroForm fields by adding "Ff" key (Field flags) with value 1. This is what documentation stands about that:
so the code could look like that (using pdfbox lib):
Solution to flattening acroform AND retaining the form field values using pdfBox:
The solution that worked for me with pdfbox 2.0.1:
I didn't need to do the 2 extra steps in the above solution link:
I created my pdf form in OpenOffice 4.1.1 and exported to pdf. The 2 items selected in the OpenOffice export dialogue were:
Using PdfBox I populated the form fields and created a flattened pdf file that removed the form fields but retained the form field values.
This works for sure - I've ran into this problem, debugged all-night, but finally figured out how to do this :)
This is assuming that you have capability to edit the PDF in some way/have some control over the PDF.
First, edit the forms using Acrobat Pro. Make them hidden and read-only.
Then you need to use two libraries: PDFBox and PDFClown.
PDFBox removes the thing that tells Adobe Reader that it's a form; PDFClown removes the actual field. PDFClown must be done first, then PDFBox (in that order. The other way around doesn't work).
Single field example code:
Probably some typos here and there, but this should be enough to get the gist :)
I thought I'd share our approach that worked with PDFBox 2+.
We've used the
PDAcroForm.flatten()
method.The fields needed some preprocessing and most importantly the nested field structure had to be traversed and DV and V checked for values.
Finally what worked was the following:
With PDFBox 2 it's now possible to "flatten" a PDF-form easily with this new API method: PDAcroForm.flatten().
Simplified code with an example call of this method:
Note: dynamic XFA forms cannot be flatten.
For migration from PDFBox 1.* to 2.0, take a look at the official migration guide.