I have a C# application that receives an HTML file. I want to parse and validate it. The output should be either a list of errors or a confirmation that the HTML is valid.
Does anyone have an idea how I can do this?
There is an obscure DLL that dates back to framework version 1.0 (!), Microsoft.mshtml.dll, and it is the only way within the framework to work with the HTML DOM. If your HTML is XHTML and therefore valid XML, you can use the XML classes instead; otherwise this is your only option.
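For the XHTML case, a minimal sketch of what "use XML" could look like, assuming the input is meant to be well-formed XML. The class and method names (XhtmlCheck, FindWellFormednessErrors) are made up for illustration. This only checks well-formedness, not validity against an XHTML DTD, and because the DOCTYPE is ignored, named entities such as &nbsp; will be reported as errors.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XhtmlCheck
{
    // Hypothetical helper: returns an empty list when the markup is
    // well-formed XML, otherwise one entry describing the first parse error.
    public static List<string> FindWellFormednessErrors(string markup)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            // Skip the DOCTYPE instead of trying to download the DTD.
            DtdProcessing = DtdProcessing.Ignore,
            XmlResolver = null
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(markup), settings))
            {
                // Reading to the end forces the whole document to be parsed.
                while (reader.Read()) { }
            }
        }
        catch (XmlException ex)
        {
            string location = "Line " + ex.LineNumber + ", position " + ex.LinePosition;
            errors.Add(location + ": " + ex.Message);
        }

        return errors;
    }
}
```

XmlReader stops at the first error, so this reports at most one problem per run. If you need the full list of errors in one pass, one of the validator-based suggestions is probably a better fit.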
I'd run a local instance of the W3C Markup Validation Service and communicate with it via its API.
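A rough sketch of talking to a local validator from C#. It makes several assumptions: that the local instance is the Nu Html Checker (the W3C's current HTML checker, not the legacy Markup Validation Service), that it listens on http://localhost:8888, and that you request its JSON output with out=json. The class and method names (LocalValidator, ValidateAsync, ParseErrors) are made up for illustration, and the code needs a runtime with System.Text.Json.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class LocalValidator
{
    // Assumed address of a locally running checker; adjust to your setup.
    private const string Endpoint = "http://localhost:8888/?out=json";
    private static readonly HttpClient Client = new HttpClient();

    // Posts the HTML to the checker and returns the reported errors.
    public static async Task<List<string>> ValidateAsync(string html)
    {
        using (var content = new StringContent(html, Encoding.UTF8, "text/html"))
        using (var response = await Client.PostAsync(Endpoint, content))
        {
            response.EnsureSuccessStatusCode();
            string json = await response.Content.ReadAsStringAsync();
            return ParseErrors(json);
        }
    }

    // Extracts the entries of type "error" from the checker's JSON reply,
    // which is expected to look like {"messages":[{"type":..., "message":...}]}.
    public static List<string> ParseErrors(string json)
    {
        var errors = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement m in doc.RootElement.GetProperty("messages").EnumerateArray())
            {
                if (m.GetProperty("type").GetString() != "error") continue;

                JsonElement lineElement;
                int line = m.TryGetProperty("lastLine", out lineElement) ? lineElement.GetInt32() : 0;

                JsonElement textElement;
                string text = m.TryGetProperty("message", out textElement) ? textElement.GetString() : "";

                errors.Add("Line " + line + ": " + text);
            }
        }
        return errors;
    }
}
```

An empty list from ValidateAsync would then mean the checker reported no errors. Warnings arrive with a different type value and are skipped here.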
You can use HTML Tidy. There is a .NET wrapper for it called TidyManaged.
This is relevant to your question: "Looking for C# HTML parser"