What is the best or recommended practice for handling text input that contains HTML tags? The goal is to display the input properly, e.g. on a different page.
As expected, people can be very creative with their HTML. For example:
<p>...</p>
<ul><li>...</li></ul>
Another example:
<h1>...</h1>
A more challenging case is input whose HTML is not properly structured, e.g. <h1>Hello</p></h1>. Without proper care, this has the potential to break the whole page layout.
One option I can think of is to strip the tags completely. However, there might be a much better way. At the very least, I'd like to keep proper spacing or margins between paragraphs instead of having them collapse together.
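A minimal sketch of the strip-the-tags idea that still keeps paragraph breaks, using Python's standard `html.parser` module. The `BLOCK_TAGS` set and the names `TagStripper`, `strip_to_paragraphs` and `to_safe_html` are hypothetical choices for illustration. Block-level tags become paragraph boundaries, inline tags are dropped, and the text is re-escaped before being wrapped in fresh `<p>` elements, so malformed input cannot leak markup into the page.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical set of block-level tags treated as paragraph boundaries.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "li", "br"}


class TagStripper(HTMLParser):
    """Collects text content, splitting it into paragraphs at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.paragraphs = []
        self.current = []

    def _break(self):
        # Collapse internal whitespace and store the paragraph if non-empty.
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._break()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._break()

    def handle_data(self, data):
        self.current.append(data)

    def close(self):
        super().close()  # flush any buffered trailing text
        self._break()


def strip_to_paragraphs(markup):
    """Return plain-text paragraphs from arbitrary (even malformed) HTML."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()
    return parser.paragraphs


def to_safe_html(markup):
    """Re-wrap each paragraph in <p>, escaping text so it cannot inject markup."""
    return "\n".join(f"<p>{escape(p)}</p>" for p in strip_to_paragraphs(markup))
```

With this approach the spacing between paragraphs comes from your own CSS for `p`, not from whatever the user typed. Note that the content of `<script>` or `<style>` elements would come through as (escaped) plain text; a real implementation would probably drop it.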
As rlb.usa noted in the comments, most people rely on a mature, well-tested HTML renderer to display HTML. WebKit, Qt, and Microsoft's Trident control are popular choices depending on the platform.
If your renderer has problems with nonstandard markup, you could run the input through something like HTML Tidy first.
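To illustrate the kind of repair a cleanup pass performs (this is not HTML Tidy itself, just a rough sketch), here is a small tag balancer built on Python's standard `html.parser`. The `ALLOWED`, `VOID` and `DROP` sets and the `balance` function are hypothetical. It keeps only allowlisted tags, discards all attributes, ignores stray closing tags, closes tags in the correct order, and drops `<script>`/`<style>` content entirely.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; every other tag (and every attribute) is dropped.
ALLOWED = {"p", "ul", "ol", "li", "h1", "h2", "h3", "b", "i", "em", "strong", "br"}
VOID = {"br"}               # tags that never get a closing tag
DROP = {"script", "style"}  # tags whose content is discarded entirely


class TagBalancer(HTMLParser):
    """Re-emits allowed tags so that every opened tag is closed in order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # currently open allowed tags, innermost last
        self.skip = 0    # > 0 while inside a DROP element

    def handle_starttag(self, tag, attrs):
        if tag in DROP:
            self.skip += 1
            return
        if tag not in ALLOWED:
            return
        self.out.append(f"<{tag}>")  # attributes intentionally discarded
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP:
            self.skip = max(0, self.skip - 1)
            return
        if tag not in self.stack:
            return  # stray close tag, e.g. the </p> in <h1>Hello</p></h1>
        # Close any tags opened inside `tag` that were never closed.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data))

    def result(self):
        self.close()     # flush any buffered trailing text
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def balance(markup):
    """Return a well-nested, allowlisted version of `markup`."""
    parser = TagBalancer()
    parser.feed(markup)
    return parser.result()
```

For example, `balance("<h1>Hello</p></h1>")` yields `<h1>Hello</h1>`. For production use, a maintained sanitizer or a real tidy tool is likely to handle far more edge cases than this sketch.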