html to .doc converter in Python?

Posted 2019-01-15 01:12

Question:

I am using pisa, which is an HTML to PDF conversion library for Python.

Is there an equivalent for Word documents, i.e. an HTML-to-.doc conversion library for Python?

Answer 1:

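You could use win32com from the pywin32 Python extensions for Windows to let MS Word do the conversion for you (this requires Word to be installed on the machine). A simple example:

import os
import win32com.client

# Word resolves relative paths against its own default folder, so pass absolute paths.
src = os.path.abspath('example.html')
dst = os.path.abspath('example.doc')

word = win32com.client.Dispatch('Word.Application')
try:
    doc = word.Documents.Open(src)
    doc.SaveAs(dst, FileFormat=0)  # 0 = wdFormatDocument (Word 97-2003 .doc)
    doc.Close()
finally:
    word.Quit()  # always shut down the Word process, even if the conversion fails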


Answer 2:

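I am not aware of a module that converts HTML to .doc directly, but you can do it in two steps:

  1. Convert the HTML to plain text with the html2text module (its output is Markdown-formatted text).
  2. Write that text into a Word file with the python-docx module. Note that python-docx produces .docx files, not the legacy .doc format, and most of the original HTML formatting is lost along the way.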
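A minimal sketch of the same two-step idea, assuming you would rather not install html2text or python-docx: it swaps in the standard library's html.parser for step 1 and writes a bare-bones .docx package with zipfile for step 2. The names TextExtractor, html_to_paragraphs and write_docx are hypothetical, the set of block-level tags is an assumption, and the output contains only unstyled paragraphs.

```python
import zipfile
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Tags that start or end a paragraph in the output (an assumption; extend as needed).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Step 1: collect visible text, splitting it into paragraphs at block tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into characters
        self.paragraphs = []
        self._current = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._current.append(data)

    def _flush(self):
        # Collapse runs of whitespace, as a browser would, and drop empty paragraphs.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []

    def close(self):
        super().close()  # processes any text still buffered by the parser
        self._flush()


def html_to_paragraphs(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Step 2: the three parts of a minimal WordprocessingML (.docx) package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/'
    '2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def write_docx(paragraphs, path):
    """Write each string as one plain paragraph (<w:p>) of a new .docx file."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(p)
        for p in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", RELS)
        zf.writestr("word/document.xml", document)  # str is stored as UTF-8


if __name__ == "__main__":
    html = "<h1>Title</h1><p>Hello &amp; <b>welcome</b>.</p>"
    write_docx(html_to_paragraphs(html), "example.docx")
```

Word should open the resulting file as a plain document; if you specifically need a legacy .doc, one option is to open that .docx in Word and re-save it with FileFormat=0, as in Answer 1.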


Answer 3:

In case anybody else lands here trying to convert the other way around (from a Word document to HTML), the code in Answer 1 works, but you need to change the FileFormat value to one of Word's WdSaveFormat constants:

http://msdn.microsoft.com/en-us/library/ff839952.aspx

For example, use 10 (wdFormatFilteredHTML) instead of 0 to save as filtered HTML.
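To avoid hard-coding the number, a small lookup table can pick FileFormat from the output file's extension. This is a sketch: file_format_for and WD_SAVE_FORMAT are hypothetical helpers (not part of pywin32), and only a few WdSaveFormat values are listed; choosing filtered HTML (10) rather than full Word HTML (8) for .htm/.html is an assumption that follows the example above.

```python
import os

# A few WdSaveFormat values from Word's object model.
WD_SAVE_FORMAT = {
    ".doc": 0,    # wdFormatDocument (Word 97-2003)
    ".rtf": 6,    # wdFormatRTF
    ".htm": 10,   # wdFormatFilteredHTML (use 8, wdFormatHTML, to keep Word's extra markup)
    ".html": 10,
    ".docx": 16,  # wdFormatDocumentDefault
    ".pdf": 17,   # wdFormatPDF
}


def file_format_for(path):
    """Return the SaveAs FileFormat code implied by the output file's extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return WD_SAVE_FORMAT[ext]
    except KeyError:
        raise ValueError("no FileFormat mapping for %r" % ext) from None


# Usage with the Word automation code from Answer 1 (Windows only, not run here):
#   doc = word.Documents.Open(os.path.abspath('example.doc'))
#   doc.SaveAs(os.path.abspath('example.html'), FileFormat=file_format_for('example.html'))
```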