An efficient way to convert document to pdf format

I have been trying to find the efficient way to convert document e.g. doc, docx, ppt, pptx to pdf. So far i have tried docsplit and oowriter, but both took > 10 seconds to complete the job on pptx file having size 1.7MB. Can any one suggest me a better way or suggestions to improve my approach?

What i have tried:

from subprocess import Popen, PIPE
import time

def convert(src, dst):
    d = {'src': src, 'dst': dst}
    commands = [
        '/usr/bin/docsplit pdf --output %(dst)s %(src)s' % d,
        'oowriter --headless -convert-to pdf:writer_pdf_Export %(dst)s %(src)s' % d,
    ]

    for i in range(len(commands)):
        command = commands[i]
        st = time.time()
        process = Popen(command, stdout=PIPE, stderr=PIPE, shell=True) # I am aware of consequences of using `shell=True` 
        out, err = process.communicate()
        errcode = process.returncode
        if errcode != 0:
            raise Exception(err)
        en = time.time() - st
        print 'Command %s: Completed in %s seconds' % (str(i+1), str(round(en, 2)))

if __name__ == '__main__':
    src = '/path/to/source/file/'
    dst = '/path/to/destination/folder/'
    convert(src, dst)

Output:

Command 1: Completed in 11.91 seconds
Command 2: Completed in 11.55 seconds

Environment:

Linux - Ubuntu 12.04
Python 2.7.3

More tools result:

jodconverter took 11.32 seconds

标签： python pdf ubuntu document-conversion docsplit

4条回答

倾城　Initia

2楼-- · 2019-03-09 19:49

Unfortunately I don't have the time to do a full benchmark, but you may want to check out xtopdf, my Python toolkit for PDF creation. It doesn't do the full range of conversions you want, and some of the conversions have limitations, but it may be of use. xtopdf links:

Online presentation about xtopdf - a good summary of what it is, what it does, platforms, features, users, uses etc.: http://slid.es/vasudevram/xtopdf

xtopdf on Bitbucket: https://bitbucket.org/vasudevram/xtopdf

Many blog posts showing how to use xtopdf for various purpose, including many that show how to use it to convert different input formats to PDF: http://jugad2.blogspot.com/search/label/xtopdf

HTH, Vasudev Ram

0人赞添加讨论(0) 举报

闹够了就滚

3楼-- · 2019-03-09 20:03

Try calling unoconv from your Python code, it took 8 seconds on my local machine, I don't know if it's fast enough for you:

time unoconv 15.\ Text-Files.pptx
real    0m8.604s

0人赞添加讨论(0) 举报

手持菜刀，她持情操

4楼-- · 2019-03-09 20:03

For doc and docx (but not ppt/pptx), you could try our independent (but commercial) high fidelity rendering engine online at OnlineDemo/docx_to_pdf

By "high fidelity", I mean it is designed from the ground up to have the same line and paragraph breaks, tab stops etc etc as Microsoft Word.

0人赞添加讨论(0) 举报

Anthone

5楼-- · 2019-03-09 20:07

Pandoc is a wonderful tool capable of doing what you'd like quickly. Since you're using Popen to effectively shell out the command for the tool, it doesn't matter what language the tool is written in (Pandoc is written in Haskell).

0人赞添加讨论(0) 举报

An efficient way to convert document to pdf format

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间