Loading text from .docx to MySQL using Python-docx

Posted 2019-09-17 21:13

Question:

I am currently using python-docx to convert the text of a .docx file into a single string:

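import os
from docx import opendocx, getdocumenttext  # legacy python-docx API

path = os.path.expanduser("~/documents/myFile.docx")
document = opendocx(path)  # opendocx opens the file itself
docString = ''.join(getdocumenttext(document))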

I then parse the string with Python's built-in split methods and load the resulting list into a MySQL database. This works well, except that I want to preserve the special characters.
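
For context, the parse-and-load step looks roughly like the sketch below. It is a simplified illustration, not my exact code: the newline delimiter, the table name `doc_fields`, and its `content` column are hypothetical, and the cursor is assumed to be any DB-API 2.0 cursor. Passing the values as query parameters, rather than formatting them into the SQL string, lets the driver handle quoting and encoding.

```python
def split_fields(doc_string, delimiter="\n"):
    """Split the document string into non-empty, stripped fields."""
    return [part.strip() for part in doc_string.split(delimiter) if part.strip()]


def load_fields(cursor, fields, placeholder="%s"):
    """Insert each field as one row using a parameterized query.

    `doc_fields` and `content` are hypothetical names for illustration.
    `placeholder` is "%s" for the common MySQL drivers; other DB-API
    drivers may use a different marker such as "?".
    """
    sql = "INSERT INTO doc_fields (content) VALUES ({})".format(placeholder)
    cursor.executemany(sql, [(field,) for field in fields])
```

With a MySQL driver this would be called as `load_fields(cursor, split_fields(docString))`, followed by a commit on the connection.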

The database supports these special characters (UTF-8), but many characters and much of the formatting (italics, bold, etc.) are lost when I convert the .docx into a string.

I want to be able to parse and load text with the formatting intact from the .docx file.
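
To make the goal concrete, here is a minimal sketch of the kind of result I am after. It assumes the current python-docx package (`docx.Document`, with paragraphs made of runs), which is a different API from the legacy functions above, and it assumes that storing the formatting as simple HTML-style tags in a UTF-8 text column would be acceptable. The helper names `runs_to_html` and `paragraph_runs` are made up for this example.

```python
from html import escape


def runs_to_html(runs):
    """Render (text, bold, italic) tuples as a small HTML fragment."""
    pieces = []
    for text, bold, italic in runs:
        piece = escape(text, quote=False)  # keep &, <, > from breaking the markup
        if italic:
            piece = "<i>" + piece + "</i>"
        if bold:
            piece = "<b>" + piece + "</b>"
        pieces.append(piece)
    return "".join(pieces)


def paragraph_runs(path):
    """Yield one list of (text, bold, italic) tuples per paragraph.

    Assumes the current python-docx package is installed. run.bold and
    run.italic can be None (inherited from the style); this sketch treats
    None as "not set", so style-level formatting is not detected.
    """
    from docx import Document  # imported lazily so runs_to_html works without it

    for paragraph in Document(path).paragraphs:
        yield [(run.text, bool(run.bold), bool(run.italic)) for run in paragraph.runs]
```

Each paragraph could then be converted with `runs_to_html(runs)` for every `runs` in `paragraph_runs(path)`, and the resulting strings loaded into the database in place of the plain text.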