How to automate PDF form-filling in Java

Posted 2020-05-14 17:39

Question:

I am doing some "pro bono" development for a food pantry near where I live. They are inundated with forms and paperwork, and I would like to develop a system that simply reads data from their MySQL server (which I set up for them on a previous project) and feeds data into PDF versions of all the forms they are required to fill out. This will help them out enormously and save them a lot of time, as well as get rid of a lot of human errors that are made when filling out these forms.

Not knowing anything about the internals of PDF files, I can foresee two avenues here:

  • Harder Way: It is possible to scan a paper document, turn it into a PDF, and then have software that "fills out" the PDF simply by saying "add text excerpt blah at the following (x,y) coordinates..."; or
  • Easier Way: The PDF specification already allows for the construct of "fields" that can be filled out; this way I just write code that says "add text excerpt blah to the field called *address_value*...", etc.

So my first question is: which of the two avenues am I facing? Does PDF have a concept of "fields" or do I need to "fill out" these documents by telling the PDF library the pixel coordinates of where to place data?

Second, I obviously need an open source (and Java) library to do this. iText seems to be a good start but I've heard it can be difficult to work with. Can anyone lend some ideas or general recommendations here? Thanks in advance!

Answer 1:

You can easily merge data into a PDF's form fields using FDF (Forms Data Format).

Adobe provides a library for this: the Acrobat Forms Data Format (FDF) Toolkit.

Apache PDFBox can also be used for this.
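
To give an idea of what FDF data looks like, here is a minimal sketch that builds an FDF document as plain text with nothing but the JDK. The class name `FdfWriter` and the field names are made up for illustration, and the sketch assumes ASCII-only names and values (non-ASCII text needs a different string encoding). Treat it as a starting point rather than a complete FDF implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FdfWriter {

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escape(String s) {
        // Backslash must be escaped first so the later escapes are not doubled.
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /** Builds a minimal FDF document with one /T (name) and /V (value) pair per field. */
    static String toFdf(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("%FDF-1.2\n");
        sb.append("1 0 obj\n<< /FDF << /Fields [\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<< /T (").append(escape(e.getKey()))
              .append(") /V (").append(escape(e.getValue())).append(") >>\n");
        }
        sb.append("] >> >>\nendobj\n");
        sb.append("trailer\n<< /Root 1 0 R >>\n%%EOF\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical field names; use the names defined in your own form.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "CALIFORNIA");
        fields.put("abbr", "CA");
        System.out.print(toFdf(fields));
    }
}
```

The resulting text can be saved to a `.fdf` file and merged into the form by a tool or library that supports FDF import.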



Answer 2:

Please take a look at the chapter about interactive forms in the free ebook The Best iText Questions on StackOverflow. It bundles the answers to questions such as:

  • How to fill out a pdf file programatically?
  • How can I flatten a XFA PDF Form using iTextSharp?
  • Checking off pdf checkbox with itextsharp
  • How to continue field output on a second page?
  • finding out required fields to fill in pdf file
  • and so on...

Or you can watch this video where I explain how to use forms for reporting step by step.

See for instance:

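import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

// iText 5: fill the AcroForm fields of src and write the result to dest
public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader,
            new FileOutputStream(dest));
    AcroFields fields = stamper.getAcroFields();
    // setField(fieldName, value): the names must match the fields defined in the form
    fields.setField("name", "CALIFORNIA");
    fields.setField("abbr", "CA");
    fields.setField("capital", "Sacramento");
    fields.setField("city", "Los Angeles");
    fields.setField("population", "36,961,664");
    fields.setField("surface", "163,707");
    fields.setField("timezone1", "PT (UTC-8)");
    fields.setField("timezone2", "-");
    fields.setField("dst", "YES");
    // Flattening turns the filled fields into static page content (no longer editable)
    stamper.setFormFlattening(true);
    stamper.close();
    reader.close();
}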
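
In the asker's scenario the values would come from a database row rather than from string literals. A minimal sketch of that mapping step, using only the JDK, could look like the following. The class name `FormFieldMapper` and the column names are hypothetical, and the row is assumed to have already been read into a `Map` (for example from a JDBC `ResultSet`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormFieldMapper {

    /**
     * Translates a database row (column name -> value) into form data
     * (PDF field name -> value). Only mapped columns that are present in
     * the row are copied; null values become empty strings.
     */
    static Map<String, String> toFieldValues(Map<String, Object> row,
                                             Map<String, String> columnToField) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : columnToField.entrySet()) {
            if (!row.containsKey(e.getKey())) {
                continue; // mapped column is missing from this row
            }
            Object v = row.get(e.getKey());
            values.put(e.getValue(), v == null ? "" : v.toString());
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical column and field names, for illustration only.
        Map<String, String> columnToField = new LinkedHashMap<>();
        columnToField.put("state_name", "name");
        columnToField.put("state_abbr", "abbr");

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("state_name", "CALIFORNIA");
        row.put("state_abbr", "CA");
        row.put("internal_id", 42); // unmapped column, ignored

        System.out.println(toFieldValues(row, columnToField));
        // prints {name=CALIFORNIA, abbr=CA}
    }
}
```

Each entry of the returned map can then be passed to `fields.setField(name, value)` in a loop, which keeps the form-specific field names in one place instead of scattering them through the code.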


Answer 3:

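import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDCheckBox;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public void fillPDF() {
    // PDDocument.load(File) is the PDFBox 2.x API; try-with-resources closes the document
    try (PDDocument pDDocument = PDDocument.load(new File("D:/pdf/pdfform.pdf"))) { // pdfform.pdf is input file
        PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();

        // Text fields and combo boxes are filled by field name
        PDField field = pDAcroForm.getField("Given Name Text Box");
        field.setValue("Kalyan");
        field = pDAcroForm.getField("Family Name Text Box");
        field.setValue("Gutta");
        field = pDAcroForm.getField("Country Combo Box");
        field.setValue("India");

        // Check boxes are ticked via PDCheckBox.check()
        ((PDCheckBox) pDAcroForm.getField("Driving License Check Box")).check();

        field = pDAcroForm.getField("Favourite Colour List Box");
        System.out.println("list box required: " + field.isRequired());

        pDDocument.save("D:/pdf/pdf-java-output.pdf");
    } catch (IOException e) {
        e.printStackTrace();
    }
}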