Add MS Office Documents to PDF via Apache PDFBox

Posted 2019-02-13 16:50

Question:

I'm using Apache PDFBox (http://pdfbox.apache.org/) to create PDFs from an arbitrary number of files, including images and other PDFs. Now I need to add MS Office documents (Word, Excel and Outlook MSGs) to the PDF. The files can come from nearly any Office version, so there is no guarantee that a file is in a new format (e.g. docx) rather than an old one (e.g. doc).

Is there any way to do this using only free tools? My first idea is to read the content of every file with Apache POI (http://poi.apache.org/) and recreate it as new PDF pages, but this could become very costly, as this PDF creation runs on a server used by more than fifty people.

Answer 1:

Install OpenOffice on your server and run it as a service listening on a port (8100 in the code below). JODConverter can then use it to convert doc and docx documents to PDF.

package naveed.workingfiles;

import java.io.*;
import com.artofsolving.jodconverter.openoffice.connection.*;
import com.artofsolving.jodconverter.openoffice.converter.*;
import com.artofsolving.jodconverter.*;

public class DocToPdf {

    public static void main(String[] args) throws Exception {

        // Connect to an OpenOffice instance that is already running and
        // listening on port 8100 of this machine, e.g. started with
        // soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard
        OpenOfficeConnection con = new SocketOpenOfficeConnection(8100);
        con.connect();

        try {
            // Source Office document
            File inFile = new File("sample.docx");

            // Target PDF; the output format is inferred from the file extension
            File outFile = new File("sample.pdf");

            // Converter that delegates the work to the connected OpenOffice instance
            DocumentConverter converter = new OpenOfficeDocumentConverter(con);
            converter.convert(inFile, outFile);
        } finally {
            // Always release the connection, even if the conversion fails
            con.disconnect();
        }
    }

}
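
Since the question mixes images, PDFs and Office files in one batch, only some inputs need to go through the OpenOffice conversion above. A minimal sketch of how that decision could be made by file extension is shown here. The class name InputRouter, the Route enum and the extension lists are hypothetical and not part of PDFBox or JODConverter. Outlook .msg files are deliberately left in OTHER, on the assumption that they need separate handling.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides how an input file should be handled
// before it is added to the combined PDF.
public class InputRouter {

    public enum Route { OFFICE_CONVERTER, PDF_DIRECT, IMAGE, OTHER }

    // Assumed extension lists; adjust them to what your converter and
    // image code actually support.
    private static final Set<String> OFFICE = new HashSet<>(Arrays.asList(
            "doc", "docx", "xls", "xlsx", "ppt", "pptx", "odt", "ods", "rtf"));
    private static final Set<String> IMAGES = new HashSet<>(Arrays.asList(
            "png", "jpg", "jpeg", "gif", "bmp", "tif", "tiff"));

    // Returns the lower-cased extension without the dot, or "" if there is none.
    public static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Maps a file name to a route; anything unknown (including .msg) is OTHER.
    public static Route routeFor(String fileName) {
        String ext = extensionOf(fileName);
        if (OFFICE.contains(ext)) {
            return Route.OFFICE_CONVERTER;
        }
        if (ext.equals("pdf")) {
            return Route.PDF_DIRECT;
        }
        if (IMAGES.contains(ext)) {
            return Route.IMAGE;
        }
        return Route.OTHER;
    }
}
```

Files routed to OFFICE_CONVERTER would be converted to a temporary PDF first and then added like any other PDF, so old and new Office formats follow the same path.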