extracting text from MS word files in python

Posted 2019-01-01 05:52

For working with MS Word files in Python, there are the Python Win32 extensions, which can be used on Windows. How do I do the same on Linux? Is there any library?

Tags: python linux ms-word

14 answers

长期被迫恋爱

Reply #2 · 2019-01-01 06:16

Benjamin's answer is a pretty good one. I have just consolidated...

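import html
import re
import zipfile

# A .docx file is a ZIP archive; the body text is stored in word/document.xml
with zipfile.ZipFile('/path/to/file/mydocument.docx') as docx:
    content = docx.read('word/document.xml').decode('utf-8')
# End each paragraph (</w:p>) with a newline so paragraphs don't run together
content = content.replace('</w:p>', '\n')
# Strip every remaining XML tag, then decode entities such as &amp;
cleaned = html.unescape(re.sub(r'<[^>]+>', '', content))
print(cleaned)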
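
A variant of the same idea that parses word/document.xml with the standard library's xml.etree.ElementTree instead of a regular expression may be more robust. It collects the text runs (w:t elements) of each paragraph (w:p element), so paragraph boundaries and entities such as &amp; are handled by the XML parser. This is a minimal sketch: the function name docx_paragraphs is hypothetical, and tabs, line breaks, headers/footers, footnotes and text boxes are not handled.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace, bound to the w: prefix in document.xml
W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def docx_paragraphs(path):
    """Return the text of each paragraph in a .docx (path or binary file object).

    Hypothetical helper; ignores tabs, breaks, headers, footnotes, text boxes.
    """
    with zipfile.ZipFile(path) as docx:
        root = ET.fromstring(docx.read('word/document.xml'))
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text is split across <w:t> runs
    for para in root.iter(W_NS + 'p'):
        runs = [node.text or '' for node in para.iter(W_NS + 't')]
        paragraphs.append(''.join(runs))
    return paragraphs
```

Usage: print('\n'.join(docx_paragraphs('/path/to/file/mydocument.docx'))). Because zipfile.ZipFile also accepts a file-like object, the function works on in-memory uploads as well as paths.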

爱死公子算了

Reply #3 · 2019-01-01 06:17

OpenOffice.org can be scripted with Python: see here.

Since OOo can load most MS Word files flawlessly, I'd say that's your best bet.
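
Scripting OpenOffice.org from Python normally goes through the PyUNO bridge. For plain-text extraction, a simpler route is to run LibreOffice (the maintained successor of OpenOffice.org) in headless mode from Python and let it convert the file. This is a different technique from the PyUNO scripting the answer mentions, not the same API. The sketch below assumes the soffice binary is on PATH (some distributions install it as libreoffice) and that the "Text (encoded)" export filter with the UTF8 option is available. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def soffice_txt_command(doc_path, outdir):
    """Build a headless LibreOffice command that converts a document to UTF-8 text."""
    return [
        'soffice',                               # assumption: binary is on PATH
        '--headless',                            # run without a GUI
        '--convert-to', 'txt:Text (encoded):UTF8',
        '--outdir', str(outdir),
        str(doc_path),
    ]


def word_to_text(doc_path, outdir='.'):
    """Convert a .doc/.docx with soffice and return the extracted text (hypothetical helper)."""
    # check=True raises CalledProcessError if soffice exits with an error code
    subprocess.run(soffice_txt_command(doc_path, outdir), check=True)
    # soffice names the output after the input file, with a .txt extension
    txt_path = Path(outdir) / (Path(doc_path).stem + '.txt')
    return txt_path.read_text(encoding='utf-8')
```

Usage: word_to_text('/path/to/file/mydocument.doc', '/tmp'). This also covers the older binary .doc format, which the zipfile approach cannot read.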
