Python & MS Word: Convert .doc to .docx?

Posted 2020-07-23 04:43

Question:

I found several questions that were similar to mine, but none of the answers came close to what I need.

Specifications: I'm working with Python 3 and do not have MS Word. My development machine runs OS X, and the cloud machine runs Linux/Ubuntu.

I'm using python-docx to extract values from a .doc file that is sent to me nightly. However, python-docx only works with .docx files, so I need to convert the file to that format first.

So, I've got a .doc file that I need to convert to .docx. This script might have to run in the cloud, so I can't install any kind of Office or Office-like software. Can this be done?

Answer 1:

You could use unoconv (Universal Office Converter), which converts between any document formats supported by LibreOffice/OpenOffice.

unoconv -d document --format=docx *.doc

or, from Python (where filename is the path to the .doc file):

import subprocess
subprocess.call(['unoconv', '-d', 'document', '--format=docx', filename])
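To run this from a script (for example on the nightly file), a minimal sketch can wrap the call in a small helper. It assumes unoconv is on PATH and that, with no -o option, unoconv writes the result next to the input file with the new extension; the helper names unoconv_command and convert_with_unoconv are hypothetical. It uses subprocess.run with check=True instead of subprocess.call so a failed conversion raises an error rather than returning a status code that is easy to ignore.

```python
import shutil
import subprocess
from pathlib import Path


def unoconv_command(doc_path, fmt="docx"):
    """Build the unoconv argument list for a single file (hypothetical helper)."""
    # -d document: treat the input as a text document
    # --format=...: target format, e.g. docx
    return ["unoconv", "-d", "document", f"--format={fmt}", str(doc_path)]


def convert_with_unoconv(doc_path, fmt="docx"):
    """Convert doc_path with unoconv and return the expected output path.

    Assumption: without -o, unoconv writes the output beside the input
    file, keeping the stem and replacing the extension.
    """
    if shutil.which("unoconv") is None:
        raise FileNotFoundError("unoconv is not installed or not on PATH")
    # check=True raises CalledProcessError if unoconv exits non-zero
    subprocess.run(unoconv_command(doc_path, fmt), check=True)
    return Path(doc_path).with_suffix("." + fmt)
```

For example, convert_with_unoconv("nightly.doc") would be expected to return Path("nightly.docx"), which python-docx can then open with docx.Document(str(path)).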


Answer 2:

Since you are working on Linux/Ubuntu, you can use LibreOffice's built-in converter.

SYNTAX

lowriter --convert-to docx *.doc

Example

lowriter --convert-to docx testdoc.doc

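The wildcard form converts every .doc file in the current directory to .docx. The output is written to the current working directory, so when you run the command from the folder containing the files, the .docx files end up in that same folder; use --outdir to write them elsewhere.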
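The same LibreOffice conversion can be driven from Python. This is a sketch under some assumptions: the LibreOffice binary is reachable as soffice (on Ubuntu the libreoffice and lowriter launchers start the same program; on OS X it usually lives inside the application bundle and is not on PATH), and the function names are hypothetical. It passes --outdir explicitly so the output location does not depend on the current working directory, and it checks that the expected file actually appeared, since LibreOffice can exit without converting, for example when another LibreOffice instance is already running for the same user.

```python
import shutil
import subprocess
from pathlib import Path


def soffice_command(doc_path, outdir):
    """Build a headless LibreOffice command converting one file to .docx."""
    return [
        "soffice",                 # assumed binary name; may be libreoffice or lowriter
        "--headless",              # run without a GUI (suitable for a cloud server)
        "--convert-to", "docx",    # target format, as in the lowriter example
        "--outdir", str(outdir),   # otherwise output goes to the current directory
        str(doc_path),
    ]


def expected_docx(doc_path, outdir):
    """Path LibreOffice should write: same stem, .docx suffix, inside outdir."""
    return Path(outdir) / (Path(doc_path).stem + ".docx")


def convert_doc_to_docx(doc_path, outdir=None):
    """Convert one .doc file and return the path of the resulting .docx."""
    doc_path = Path(doc_path)
    # Default to writing next to the input file
    outdir = Path(outdir) if outdir is not None else doc_path.parent
    if shutil.which("soffice") is None:
        raise FileNotFoundError("LibreOffice (soffice) is not on PATH")
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(soffice_command(doc_path, outdir), check=True)
    target = expected_docx(doc_path, outdir)
    # Guard against LibreOffice exiting cleanly without producing a file
    if not target.exists():
        raise RuntimeError(f"conversion produced no file at {target}")
    return target
```

The returned path can then be passed to python-docx, e.g. docx.Document(str(target)), to extract the nightly values.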



Answer 3:

First, you will need to be using Windows. If that is an acceptable constraint, read on.

Next you need to install the Microsoft Office Compatibility Pack.

Now download and install the Microsoft Office Migration Planning Manager.

To run the tool, you need to create a .ini file that controls the program. An example .ini file and further information are available in this blog post. There is more detailed information from Microsoft here.