I'm looking for a way to convert a few paragraphs and ordered/unordered lists from an MS Word file to HTML.
Now, the problem is that when I save the Word file as an "htm/html" file (I'm using Word 2010), I get tons of unwanted CSS directives, some MS-invented and some valid CSS, that I don't want in my HTML code. Moreover, and even more problematic, the ordered/unordered lists are not even encoded as OL and UL with LI items, but rather in a crazy Microsofty encoding.
For example, a paragraph (styled as "Normal" in Word) is converted to:
<p class=MsoNormal>
<span style='font-size:10.0pt;line-height:115%;mso-bidi-font-style:italic'>
bla bla </span></p>
And I just want it to plainly be:
<p><span>bla bla</span></p>
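To make the goal concrete, here is a rough sketch of that clean-up in Python, using only the standard library. The function name `strip_word_markup` is made up for illustration, and the regex approach is an assumption that only holds for simple markup like the example above (it would break on attribute values containing `>`, and it removes whitespace next to tags, which is lossy for inline formatting).

```python
import re

# Opening tag with at least one attribute, e.g. <p class=MsoNormal>.
# Declarations such as <![if ...]> start with "<!" and are not matched.
ATTR_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s+[^<>]*>")


def strip_word_markup(html: str) -> str:
    """Hypothetical helper: drop all attributes and tidy whitespace."""
    html = ATTR_RE.sub(r"<\1>", html)   # <p class=...> becomes <p>
    html = re.sub(r">\s+", ">", html)   # whitespace right after a tag
    html = re.sub(r"\s+<", "<", html)   # whitespace right before a tag
    return html


word_html = (
    "<p class=MsoNormal>\n"
    "<span style='font-size:10.0pt;line-height:115%;"
    "mso-bidi-font-style:italic'>\n"
    "bla bla </span></p>"
)
print(strip_word_markup(word_html))  # prints: <p><span>bla bla</span></p>
```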
Even more horrific, a simple unordered list ("bulleted list") with one list item is converted to:
<p class=MsoListParagraph style='text-indent:-18.0pt;mso-list:l0 level1 lfo1'>
<![if !supportLists]>
<span style='font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family:Symbol'>
<span style='mso-list:Ignore'>·
<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]>
<span dir=LTR></span>Bla bla</p>
While I wish to get:
<ul><li>Bla bla</li></ul>
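Again just to illustrate what I'm after, a minimal Python sketch of the list conversion might look like the following. The name `word_list_to_ul` is hypothetical, and the code assumes that every list item is a `<p>` whose class starts with `MsoListParagraph` and that the bullet glyph always sits inside the `<![if !supportLists]>` block, as in the sample above. It always emits a UL, so it cannot tell ordered lists from unordered ones, and it drops any inline formatting inside the item.

```python
import re

# One Word "list item": <p class=MsoListParagraph...> ... </p>
ITEM_RE = re.compile(
    r"<p\s+class=[\"']?MsoListParagraph[^>]*>(.*?)</p>",
    re.S | re.I,
)
# The fake bullet Word emits for renderers without list support.
BULLET_RE = re.compile(r"<!\[if !supportLists\]>.*?<!\[endif\]>", re.S)


def word_list_to_ul(html: str) -> str:
    """Hypothetical helper: rebuild a plain <ul> from Word list paragraphs."""
    items = []
    for body in ITEM_RE.findall(html):
        body = BULLET_RE.sub("", body)        # remove the bullet glyph block
        text = re.sub(r"<[^>]+>", "", body)   # remove the remaining tags
        text = " ".join(text.split())         # collapse runs of whitespace
        items.append(f"<li>{text}</li>")
    return f"<ul>{''.join(items)}</ul>" if items else ""


word_html = """<p class=MsoListParagraph style='text-indent:-18.0pt;mso-list:l0 level1 lfo1'>
<![if !supportLists]>
<span style='font-family:Symbol;mso-fareast-font-family:Symbol;mso-bidi-font-family:Symbol'>
<span style='mso-list:Ignore'>·
<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]>
<span dir=LTR></span>Bla bla</p>"""
print(word_list_to_ul(word_html))  # prints: <ul><li>Bla bla</li></ul>
```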
Any ideas?
Thanks so much!
P.S. I'm using Zend Studio (maybe there's a built-in Eclipse/Zend-specific converter or something?)
P.P.S. The only MS Word options for exporting as HTML that I've found are in Options => Advanced => General => Web Options. Playing with these options didn't solve any of the above problems.