How to clean up an HTML document generated by Microsoft Word?

Posted 2020-06-16 09:12

I have a fairly large HTML document that was generated by Microsoft Word. It is very messy and bloated, full of unknown tags, unknown namespaces, and similar clutter.

Is there any way to convert it into plain HTML syntax?

4 Answers
我欲成王,谁敢阻挡
#2 · 2020-06-16 09:29

You're probably looking for HTML Tidy, which has adapters in pretty much every language out there. It has options to clean up Microsoft Word HTML output (and many other features).
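As a rough illustration of the suggestion above, here is a minimal Python sketch that shells out to the `tidy` command-line tool. It assumes `tidy` is installed and on the PATH; the helper names `build_tidy_command` and `clean_word_html` are made up for this example. The options used (`word-2000`, `bare`, `clean`, `drop-proprietary-attributes`) are standard HTML Tidy configuration options, but which combination works best for a given document is something you would need to try.

```python
import subprocess


def build_tidy_command(src, dst, tidy_exe="tidy"):
    """Build the argument list for running HTML Tidy on Word-generated HTML.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    return [
        tidy_exe,
        "-quiet",                                # suppress non-essential output
        "--word-2000", "yes",                    # strip surplus markup inserted by Word
        "--bare", "yes",                         # strip more Microsoft-specific HTML
        "--clean", "yes",                        # replace presentational markup with CSS
        "--drop-proprietary-attributes", "yes",  # remove proprietary attributes
        "-output", dst,
        src,
    ]


def clean_word_html(src, dst):
    """Run tidy; it exits with 0 on success, 1 on warnings, 2 on errors."""
    result = subprocess.run(build_tidy_command(src, dst))
    if result.returncode >= 2:
        raise RuntimeError(f"tidy reported errors for {src}")
    return dst


# Example call (requires tidy to be installed):
# clean_word_html("word_export.html", "cleaned.html")
```

The return code is checked by hand rather than with `check=True`, because tidy exits with status 1 when it only has warnings, which is common for Word output.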

冷血范
#3 · 2020-06-16 09:34

Try HTML Tidy. I hear it works quite well on HTML generated by MS Word (definitely at least up to Word 2000, but probably on more recent versions too).

在下西门庆
#4 · 2020-06-16 09:36

Try the online Cleanup HTML tool to clean up Word-generated HTML.

Fickle 薄情
#5 · 2020-06-16 09:44

This isn't really a programming question, but (at least recent versions of) Word can save to "Web Page, Filtered", which removes Office-specific tags and properties and only leaves the tags necessary for the document to be rendered in a web browser. So, if you have Word, you could try using it to open the HTML document and save it in that format.
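If Word is not available, a similar effect can be approximated in code. The sketch below uses only the Python standard library and is not what Word's filtered export actually does: it simply drops namespaced tags such as `<o:p>`, namespaced attributes, `class="Mso..."` attributes, `style` attributes containing `mso-` properties, comments (including Office conditional comments), and processing instructions. The names `WordHtmlFilter` and `strip_office_markup` are made up for this example, and the filtering rules are assumptions that may be too aggressive or too lenient for a given document.

```python
from html import escape
from html.parser import HTMLParser


def _is_office_attr(name, value):
    """Heuristic: True for attributes that look Office-specific."""
    value = value or ""
    return (
        ":" in name                                      # e.g. xmlns:o, v:shapes
        or (name == "class" and value.startswith("Mso"))
        or (name == "style" and "mso-" in value)
    )


class WordHtmlFilter(HTMLParser):
    """Re-emit HTML while skipping Office-specific markup (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _attrs(self, attrs):
        parts = []
        for name, value in attrs:
            if _is_office_attr(name, value):
                continue
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        if ":" not in tag:                               # skip <o:p>, <w:...>, <v:...>
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if ":" not in tag:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if ":" not in tag:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")                    # keep the doctype

    # Comments and processing instructions are dropped: the default
    # handle_comment and handle_pi implementations do nothing.


def strip_office_markup(html_text):
    parser = WordHtmlFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `strip_office_markup('<p class="MsoNormal">Hi<o:p></o:p></p>')` returns `<p>Hi</p>`. For real documents, a dedicated tool such as HTML Tidy is likely to be more robust than this kind of hand-rolled filter.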
