C# HTML from WebBrowser to valid XHTML

Posted 2019-02-10 18:28

Question:

We are using a WebBrowser control in edit mode to let people enter text, and we then send that text to the server for everyone to see. That is, it serves as an HTML input box.

The HTML output from that box isn't standard XHTML, since it comes straight from a WebBrowser control, so I needed a method to convert any bad HTML to XHTML. I read up on SGML and have subsequently used:

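using System.IO;
using System.Xml;

private static string Html2Xml(string txtHtmlString)
{
    // SgmlReader parses loose HTML and exposes it through the XmlReader API.
    var xhtml = new Sgml.SgmlReader();
    var sw = new StringWriter();
    var w = new XmlTextWriter(sw);
    xhtml.DocType = "HTML";
    xhtml.InputStream = new StringReader(txtHtmlString);

    // Copy every node from the reader to the XML writer.
    while (!xhtml.EOF)
    {
        w.WriteNode(xhtml, true);
    }

    w.Close();
    return sw.ToString();
}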

I basically pass an HTML string to that method, and it returns supposedly proper XHTML. However, the result doesn't pass XHTML checks, and the data it returns is just a basic

<html><head></head><body></body></html>

format, which is not proper XHTML.

So, how can I get that to actually output proper XHTML? There isn't much SGML documentation left on MindShare's site, so I'm not sure where to go from here.

Essentially, we need to convert the HTML from the WebBrowser control, which isn't valid XHTML, into XHTML so that we can attach it to an XMPP.msg.Html element (valid XHTML only). If the system detects that any markup within the HTML is invalid, it sets XMPP.msg.Html to blank, which is how I know the above method isn't working.

Thanks!

Answer 1:

I would recommend using something like TinyMCE or HtmlAgilityPack (available as a NuGet package or from CodePlex).

TinyMCE lets users perform rich-text editing with appropriate formatting controls and outputs the resulting HTML.

HtmlAgilityPack, on the other hand, is a library that lets you pass in the HTML stream generated by your method and output it as a valid XHTML stream.

A rough example of doing this with HtmlAgilityPack is below:

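using System;
using System.IO;
using System.Text;

var sb = new StringBuilder();
var stringWriter = new StringWriter(sb);

// Sloppy HTML: unclosed <p> and <li> tags, a stray '<', and no closing </html>.
string input = "<html><body><p>This is some test test<ul><li>item 1<li>item2<</ul></body>";

var test = new HtmlAgilityPack.HtmlDocument();
// Set the parsing options before loading so they apply to this input.
test.OptionCheckSyntax = true;
test.OptionFixNestedTags = true;
// Serialize the document as XML when saving.
test.OptionOutputAsXml = true;
test.LoadHtml(input);

test.Save(stringWriter);

Console.WriteLine(sb.ToString());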
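
Since the XMPP side blanks the element when the markup is rejected, it may help to check the converted output locally before attaching it. The following is only a minimal sketch using System.Xml from the .NET base class library; the names XhtmlCheck and IsWellFormedXml are made up for illustration. It tests XML well-formedness only, which is a weaker condition than validity against the XHTML DTD, so passing this check does not guarantee that the receiving side will accept the markup.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of SgmlReader or HtmlAgilityPack.
public static class XhtmlCheck
{
    // Returns true when the markup parses as well-formed XML.
    // It does not validate against the XHTML DTD or schema.
    public static bool IsWellFormedXml(string markup)
    {
        if (string.IsNullOrWhiteSpace(markup))
        {
            return false;
        }

        var settings = new XmlReaderSettings
        {
            // Accept a DOCTYPE declaration without fetching or processing the DTD.
            DtdProcessing = DtdProcessing.Ignore,
            XmlResolver = null
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(markup), settings))
            {
                // Reading to the end forces the parser to visit every node.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            // Any parse error means the markup is not well-formed XML.
            return false;
        }
    }
}
```

For instance, XhtmlCheck.IsWellFormedXml("<p>a<br/></p>") returns true, while XhtmlCheck.IsWellFormedXml("<p>a<br></p>") returns false because the br element is never closed. Named HTML entities such as &nbsp; are not declared in plain XML, so markup that contains them would most likely be reported as not well-formed by this check.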