I have this ill-formed HTML with overlapping tags:
<p>word1<b>word2</p>
<p>word3</b>word4</p>
The overlapping can be nested, too.
How can I convert it into well-formed HTML with HTML Agility Pack (HAP)?
I'm looking for this output:
<p>word1<b>word2</b></p>
<p><b>word3</b>word4</p>
I tried HtmlNode.ElementsFlags["b"] = HtmlElementFlag.Closed | HtmlElementFlag.CanOverlap, but it does not work as expected.
It is in fact working as designed, though maybe not as you expected. Anyway, here is a sample piece of code (a Console application) that demonstrates how you can fix some of this HTML using the library.
The library has a ParseErrors collection that you can use to determine what errors were detected during markup parsing.
There are really two types of problems here:
1) Unclosed elements. The library fixes these by default, but there is an option on the P element (its entry in HtmlNode.ElementsFlags) that prevents that in this case.
2) Unopened elements. This one is more complex, because the fix depends on where you want the tag to be opened. In the following sample, I've used the nearest preceding text sibling node to open the element.
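Before getting to the fix itself, it can help to dump what the parser actually reports for the markup in the question. This is only a minimal diagnostic sketch, separate from the fixing sample: the class name ParseErrorDump and the inline html string are my own assumptions for illustration, and it assumes the HtmlAgilityPack package is referenced. It changes nothing in the document and just prints each entry of ParseErrors.

```csharp
using System;
using HtmlAgilityPack;

class ParseErrorDump
{
    static void Main()
    {
        // The overlapping markup from the question (inlined here as an assumption;
        // in practice you would load your own file or string).
        const string html = "<p>word1<b>word2</p>\n<p>word3</b>word4</p>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Each HtmlParseError carries an error code, a position in the source,
        // a human-readable reason, and the offending source text.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine("{0} at line {1}, column {2}: {3} [{4}]",
                error.Code, error.Line, error.LinePosition,
                error.Reason, error.SourceText);
        }
    }
}
```

Entries whose Code is HtmlParseErrorCode.TagNotOpened should correspond to the second kind of problem above. Those are the ones that need a decision about where the opening tag goes.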