I'm tasked with converting tons of .doc files to .pdf, and my supervisor wants this done only through MS Word 2010. I know I should be able to automate this with Python COM automation; the only problem is that I don't know how or where to start. I tried searching for tutorials but couldn't find any (maybe I did, but I don't know what I'm looking for).
Right now I'm reading through this. I don't know how useful it's going to be.
A simple example using comtypes, converting a single file, with the input and output filenames given as command-line arguments:
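A minimal sketch of such a script, assuming comtypes is installed (pip install comtypes) and Word 2010 or later is available on Windows. The value 17 is wdFormatPDF from Word's WdSaveFormat enumeration; the function names are illustrative. The comtypes import sits inside the function so the module can be loaded on machines without it.

```python
import os
import sys

# WdSaveFormat value for PDF output (wdFormatPDF = 17 in Word's object model).
WD_FORMAT_PDF = 17


def convert_doc_to_pdf(in_file, out_file):
    """Open in_file in Word and save it as a PDF at out_file (Windows + Word only)."""
    import comtypes.client  # imported lazily so this module loads without comtypes

    # Word does not resolve relative paths against Python's working directory,
    # so hand it absolute paths.
    in_file = os.path.abspath(in_file)
    out_file = os.path.abspath(out_file)

    word = comtypes.client.CreateObject("Word.Application")
    try:
        doc = word.Documents.Open(in_file)
        try:
            doc.SaveAs(out_file, FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close()
    finally:
        word.Quit()  # always shut down the Word process we started


def main(argv):
    """Command-line entry point: argv is [program, input, output]."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} INPUT.doc OUTPUT.pdf", file=sys.stderr)
        return 2
    convert_doc_to_pdf(argv[1], argv[2])
    return 0
```

To use it as the command-line tool described, end the file with if __name__ == "__main__": sys.exit(main(sys.argv)) and run it as python doc2pdf.py input.doc output.pdf (doc2pdf.py is a hypothetical file name).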
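You could also use pywin32, which would be the same except for the import (import win32com.client instead of import comtypes.client) and the line that creates the Word object (word = win32com.client.Dispatch('Word.Application') instead of comtypes.client.CreateObject('Word.Application')).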
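A sketch of the pywin32 variant, stretched to the batch case from the question: it converts every .doc in a folder while reusing a single Word instance, which avoids starting Word once per file. It assumes pywin32 (pip install pywin32) and Word on Windows; pdf_path_for and convert_folder are hypothetical helper names, and skipping "~$" files (Word's temporary lock files) is a precaution rather than something the answer mentions.

```python
import glob
import os

WD_FORMAT_PDF = 17  # WdSaveFormat.wdFormatPDF


def pdf_path_for(doc_path, out_dir=None):
    """Map a .doc path to its .pdf path, in out_dir if given, else next to the source."""
    name = os.path.splitext(os.path.basename(doc_path))[0] + ".pdf"
    folder = out_dir if out_dir is not None else os.path.dirname(doc_path)
    return os.path.join(folder, name)


def convert_folder(in_dir, out_dir=None, pattern="*.doc"):
    """Convert every file in in_dir matching pattern to PDF with one Word instance."""
    import win32com.client  # pywin32, Windows only; imported lazily

    in_dir = os.path.abspath(in_dir)
    if out_dir is not None:
        out_dir = os.path.abspath(out_dir)
        os.makedirs(out_dir, exist_ok=True)

    word = win32com.client.Dispatch("Word.Application")
    converted = []
    try:
        for doc_path in sorted(glob.glob(os.path.join(in_dir, pattern))):
            if os.path.basename(doc_path).startswith("~$"):
                continue  # skip Word's temporary owner/lock files
            pdf_path = pdf_path_for(doc_path, out_dir)
            doc = word.Documents.Open(doc_path)
            try:
                doc.SaveAs(pdf_path, FileFormat=WD_FORMAT_PDF)
                converted.append(pdf_path)
            finally:
                doc.Close(False)  # the .doc itself was not modified; close without saving
    finally:
        word.Quit()
    return converted
```

For example, convert_folder(r"C:\docs", r"C:\pdfs") would write one PDF per .doc file into C:\pdfs and return the list of PDF paths.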
You should start by investigating so-called virtual PDF printer drivers. Once you find one, you should be able to write a batch file that prints your DOC files into PDF files. You can probably do this in Python too (set up the printer driver's output and issue the document print command in MS Word; the latter can be done from the command line, AFAIR).
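A rough sketch of the printing route through Word's own object model, assuming pywin32 and a virtual PDF printer that writes files without prompting for a file name. The printer name below is hypothetical (use the name from the Windows printer list; Word sometimes expects a port suffix such as "on Ne01:"), and where the PDFs end up depends entirely on the driver's settings. Since Word 2010 can save to PDF directly, this route matters mostly for older Word versions.

```python
import os

# Hypothetical printer name: replace it with your virtual PDF printer's name.
PDF_PRINTER_NAME = "My PDF Printer"


def print_docs_to_pdf_printer(paths, printer_name=PDF_PRINTER_NAME):
    """Print each document to a virtual PDF printer via Word (Windows + pywin32 only)."""
    import win32com.client  # imported lazily so this module loads without pywin32

    word = win32com.client.Dispatch("Word.Application")
    # Changing Word's ActivePrinter may also change the Windows default printer,
    # so remember the original and restore it afterwards.
    original_printer = word.ActivePrinter
    try:
        word.ActivePrinter = printer_name
        for path in paths:
            doc = word.Documents.Open(os.path.abspath(path))
            try:
                # Background=False: wait until the job is spooled before closing the document.
                doc.PrintOut(False)
            finally:
                doc.Close(False)  # close without saving changes
    finally:
        word.ActivePrinter = original_printer
        word.Quit()
```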
You can use the docx2pdf Python package to bulk convert docx to pdf. It can be used both as a CLI and as a Python library. It requires Microsoft Office to be installed and uses COM on Windows and AppleScript (JXA) on macOS.

Disclaimer: I wrote the docx2pdf package. https://github.com/AlJohri/docx2pdf
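A small usage sketch, assuming docx2pdf is installed (pip install docx2pdf) alongside Word. It relies only on the package's convert(input) and convert(input, output) calls; the wrapper name is illustrative. As the name suggests, the package targets .docx, so whether it picks up legacy .doc files is worth checking before relying on it for this task.

```python
def convert_with_docx2pdf(src, dst=None):
    """Convert one .docx file, or a folder of them, to PDF via docx2pdf (requires Word)."""
    from docx2pdf import convert  # imported lazily so this module loads without docx2pdf

    if dst is None:
        convert(src)  # PDF(s) are written next to the input
    else:
        convert(src, dst)
```

For example, convert_with_docx2pdf("my_docx_folder/") converts a whole folder; the equivalent CLI call is docx2pdf my_docx_folder/.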
If you don't mind using PowerShell, have a look at this Hey, Scripting Guy! article. The code presented there could be adapted to use the wdFormatPDF enumeration value of WdSaveFormat (see here). This blog article presents a different implementation of the same idea.

unoconv (written in Python) with OpenOffice running as a headless daemon: http://dag.wiee.rs/home-made/unoconv/

It works very nicely for doc, docx, ppt, pptx, xls, and xlsx, and is very useful if you need to convert documents or save them in certain formats on a server.
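A sketch of driving unoconv from Python with subprocess, assuming unoconv and LibreOffice/OpenOffice are installed and unoconv is on PATH. Here -f selects the output format and -o the output file or directory; starting unoconv --listener once beforehand keeps a headless office instance running so later calls skip its startup cost. unoconv_command and convert_with_unoconv are illustrative names.

```python
import shutil
import subprocess


def unoconv_command(path, fmt="pdf", output=None):
    """Build the unoconv argument list that converts path to fmt."""
    cmd = ["unoconv", "-f", fmt]
    if output is not None:
        cmd += ["-o", output]  # output file or directory
    cmd.append(path)
    return cmd


def convert_with_unoconv(paths, fmt="pdf", output=None):
    """Run unoconv once per input file; raise if unoconv is missing or a conversion fails."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv was not found on PATH")
    for path in paths:
        subprocess.run(unoconv_command(path, fmt, output), check=True)
```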
I spent half a day on this problem, so I'd like to share some of my experience. Steven's answer is right, but it failed on my computer. There are two key points to fix it:
(1) The first time I created the 'Word.Application' object, I had to make the Word application visible before opening any documents. (I can't actually explain why this works. If I don't do this on my computer, the program crashes when I try to open a document in invisible mode, and the 'Word.Application' object is then deleted by the OS.)
(2) After doing (1), the program worked some of the time but still failed often. The crash error
"COMError: (-2147418111, 'Call was rejected by callee.', (None, None, None, 0, None))"
means that the COM server may not be able to respond that quickly, so I added a delay before trying to open a document. After these two steps, the program worked perfectly with no further failures. The demo code is below. If you run into the same problems, try these two steps. Hope it helps.
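A minimal sketch of the two workarounds, assuming comtypes and Word on Windows. Instead of a single fixed delay it also retries calls that fail with the "Call was rejected by callee" HRESULT (-2147418111, i.e. 0x80010001, the code in the error above). The names call_with_retry, is_call_rejected and startup_delay are hypothetical, and the retry count and delays are guesses, not tuned values.

```python
import os
import time

WD_FORMAT_PDF = 17  # WdSaveFormat.wdFormatPDF
RPC_E_CALL_REJECTED = -2147418111  # HRESULT 0x80010001, "Call was rejected by callee."


def is_call_rejected(exc):
    """True if exc carries the 'call rejected' HRESULT as its first argument."""
    return bool(exc.args) and exc.args[0] == RPC_E_CALL_REJECTED


def call_with_retry(func, attempts=5, delay=1.0, retry_on=(Exception,),
                    should_retry=is_call_rejected):
    """Call func(), retrying up to `attempts` times while should_retry(exc) is true."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == attempts or not should_retry(exc):
                raise  # out of attempts, or not a transient error
            time.sleep(delay)  # give the busy COM server time to become ready


def convert_with_workarounds(in_file, out_file, startup_delay=2.0):
    """Convert one file to PDF, applying both workarounds (Windows + comtypes only)."""
    import comtypes.client
    from comtypes import COMError

    in_file = os.path.abspath(in_file)
    out_file = os.path.abspath(out_file)
    word = comtypes.client.CreateObject("Word.Application")
    try:
        word.Visible = True        # workaround (1): show Word before opening documents
        time.sleep(startup_delay)  # workaround (2): wait before the first Open call
        doc = call_with_retry(lambda: word.Documents.Open(in_file),
                              retry_on=(COMError,))
        try:
            call_with_retry(lambda: doc.SaveAs(out_file, FileFormat=WD_FORMAT_PDF),
                            retry_on=(COMError,))
        finally:
            doc.Close()
    finally:
        word.Quit()
```

The retry wrapper checks the HRESULT so that genuine failures (a missing file, for instance) are raised immediately rather than retried.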