How can I convert a Word document to PDF?

2020-01-25 13:07发布

How can I convert a Word document to PDF where the document contains various things, such as tables. When trying to use iText, the original document looks different to the converted PDF. Is there an open source API / library, rather than calling out to an executable, that I can use?

标签: java pdf ms-word
11条回答
狗以群分
2楼-- · 2020-01-25 13:41

Using JACOB call Office Word is a 100% perfect solution. But it only supports on Windows platform because need Office Word installed.

  1. Download JACOB archive (the latest version is 1.19);
  2. Add jacob.jar to your project classpath;
  3. Add jacob-1.19-x32.dll or jacob-1.19-x64.dll (depends on your jdk version) to ...\Java\jdk1.x.x_xxx\jre\bin
  4. Using JACOB API call Office Word to convert doc/docx to pdf.

    public void convertDocx2pdf(String docxFilePath) {
    File docxFile = new File(docxFilePath);
    String pdfFile = docxFilePath.substring(0, docxFilePath.lastIndexOf(".docx")) + ".pdf";
    
    if (docxFile.exists()) {
        if (!docxFile.isDirectory()) { 
            ActiveXComponent app = null;
    
            long start = System.currentTimeMillis();
            try {
                ComThread.InitMTA(true); 
                app = new ActiveXComponent("Word.Application");
                Dispatch documents = app.getProperty("Documents").toDispatch();
                Dispatch document = Dispatch.call(documents, "Open", docxFilePath, false, true).toDispatch();
                File target = new File(pdfFile);
                if (target.exists()) {
                    target.delete();
                }
                Dispatch.call(document, "SaveAs", pdfFile, 17);
                Dispatch.call(document, "Close", false);
                long end = System.currentTimeMillis();
                logger.info("============Convert Finished:" + (end - start) + "ms");
            } catch (Exception e) {
                logger.error(e.getLocalizedMessage(), e);
                throw new RuntimeException("pdf convert failed.");
            } finally {
                if (app != null) {
                    app.invoke("Quit", new Variant[] {});
                }
                ComThread.Release();
            }
        }
    }
    

    }

查看更多
甜甜的少女心
3楼-- · 2020-01-25 13:43

This is quite a hard task, ever harder if you want perfect results (impossible without using Word) as such the number of APIs that just do it all for you in pure Java and are open source is zero I believe (Update: I am wrong, see below).

Your basic options are as follows:

  1. Using JNI/a C# web service/etc script MS Office (only option for 100% perfect results)
  2. Using the available APIs script Open Office (90+% perfect)
  3. Use Apache POI & iText (very large job, will never be perfect).

Update - 2016-02-11 Here is a cut down copy of my blog post on this subject which outlines existing products that support Word-to-PDF in Java.

Converting Microsoft Office (Word, Excel) documents to PDFs in Java

Three products that I know of can render Office documents:

yeokm1/docs-to-pdf-converter Irregularly maintained, Pure Java, Open Source Ties together a number of libraries to perform the conversion.

xdocreport Actively developed, Pure Java, Open Source It's Java API to merge XML document created with MS Office (docx) or OpenOffice (odt), LibreOffice (odt) with a Java model to generate report and convert it if you need to another format (PDF, XHTML...).

Snowbound Imaging SDK Closed Source, Pure Java Snowbound appears to be a 100% Java solution and costs over $2,500. It contains samples describing how to convert documents in the evaluation download.

OpenOffice API Open Source, Not Pure Java - Requires Open Office installed OpenOffice is a native Office suite which supports a Java API. This supports reading Office documents and writing PDF documents. The SDK contains an example in document conversion (examples/java/DocumentHandling/DocumentConverter.java). To write PDFs you need to pass the "writer_pdf_Export" writer rather than the "MS Word 97" one. Or you can use the wrapper API JODConverter.

JDocToPdf - Dead as of 2016-02-11 Uses Apache POI to read the Word document and iText to write the PDF. Completely free, 100% Java but has some limitations.

查看更多
Deceive 欺骗
4楼-- · 2020-01-25 13:43

You can use Cloudmersive native Java library. It is free for up to 50,000 conversions/month and is much higher fidelity in my experience than other things like iText or Apache POI-based methods. The documents actually look the same as they do in Microsoft Word which for me is the key. Incidentally it can also do XLSX, PPTX, and the legacy DOC, XLS and PPT conversion to PDF.

Here is what the code looks like, first add your imports:

import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.ConvertDocumentApi;

Then convert a file:

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");

ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/input.docx"); // File to perform the operation on.
try {
  byte[] result = apiInstance.convertDocumentDocxToPdf(inputFile);
  System.out.println(result);
} catch (ApiException e) {
  System.err.println("Exception when calling ConvertDocumentApi#convertDocumentDocxToPdf");
e.printStackTrace();
}

You can get an document conversion API key for free from the portal.

查看更多
孤傲高冷的网名
5楼-- · 2020-01-25 13:43

unoconv, it's a python tool worked in UNIX. While I use Java to invoke the shell in UNIX, it works perfect for me. My source code : UnoconvTool.java. Both JODConverter and unoconv are said to use open office/libre office.

docx4j/docxreport, POI, PDFBox are good but they are missing some formats in conversion.

查看更多
成全新的幸福
6楼-- · 2020-01-25 13:44

I think JOD Converter is easiest way to implement, Please refer below link for more information.

http://mytechbites.blogspot.in/2014/10/convert-documents-to-pdf-in-java.html

查看更多
成全新的幸福
7楼-- · 2020-01-25 13:47

Check out docs-to-pdf-converter on github. Its a lightweight solution designed specifically for converting documents to pdf.

Why?

I wanted a simple program that can convert Microsoft Office documents to PDF but without dependencies like LibreOffice or expensive proprietary solutions. Seeing as how code and libraries to convert each individual format is scattered around the web, I decided to combine all those solutions into one single program. Along the way, I decided to add ODT support as well since I encountered the code too.

查看更多
登录 后发表回答