I can't use the HTML Agility Pack, only plain .NET. Say I have a string containing some HTML that I need to parse and edit in the following ways:
- find specific elements in the hierarchy by id or by tag
- modify (and ideally create) attributes on the elements found
Are there methods available in .NET to do this?
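Look at the HtmlDocument class (System.Windows.Forms), its GetElementById method (and GetElementsByTagName for lookup by tag), and the HtmlElement class, which has GetAttribute and SetAttribute. You can create a dummy HTML document and query it.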
Output:
2
file:///c:
about:myUrl
Editing elements:
Output:
file:///d:
You can look at how the HTML Agility Pack works; it is .NET, after all. You can reflect the assembly and see that it is using the MFC, so it could be reproduced if you wanted, but you'd be doing nothing more than moving the assembly, not making it any more .NET.
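Assuming you're dealing with well-formed HTML, you could simply treat the text as an XML document. The framework's XmlDocument class already does what you're asking: it can find elements by tag name or XPath query, and its SetAttribute method modifies an attribute or creates it if it is missing.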
http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx
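A minimal sketch of that XmlDocument approach, assuming the input is well-formed (XHTML-style) markup. The class and method names (XmlHtmlEditor, FindById, Edit) and the "title" id, "highlight" class and "_blank" target are hypothetical, chosen only for illustration. Note that XmlDocument.GetElementById only finds attributes that a DTD declares as type ID, so the sketch matches on the id attribute with XPath instead.

```csharp
using System.Xml;

// Hypothetical helper: edits well-formed HTML through System.Xml.XmlDocument.
public static class XmlHtmlEditor
{
    // Returns the first element whose "id" attribute equals the given value, or null.
    // XmlDocument.GetElementById needs a DTD that declares id as an ID attribute,
    // so this scans every element that has an id attribute instead.
    public static XmlElement FindById(XmlDocument doc, string id)
    {
        foreach (XmlNode node in doc.SelectNodes("//*[@id]"))
        {
            var element = (XmlElement)node;
            if (element.GetAttribute("id") == id)
                return element;
        }
        return null;
    }

    // Loads the markup, edits one element by id and all <a> elements by tag,
    // and returns the edited markup.
    public static string Edit(string html)
    {
        var doc = new XmlDocument();
        doc.LoadXml(html); // throws XmlException if the markup is not well-formed XML

        XmlElement title = FindById(doc, "title");
        if (title != null)
            title.SetAttribute("class", "highlight"); // creates the attribute if missing

        foreach (XmlElement link in doc.GetElementsByTagName("a"))
            link.SetAttribute("target", "_blank"); // modifies or creates

        return doc.OuterXml;
    }
}
```

Iterating over the id attributes avoids building an XPath string from the id value, which would need escaping if the id contained quotes.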
Aside from the HTML Agility Pack and porting HtmlUnit over to C#, the options that sound like solid solutions are:
One thing I do know is that parsing HTML as XML may cause you to run into problems, because XML and HTML are not the same: XML is much stricter about what it accepts. Read about it: here
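To make that concrete, here is a small illustrative check (the XmlVsHtml class and IsWellFormedXml method are hypothetical names) that asks XmlDocument.LoadXml whether it accepts a piece of markup. Every sample listed in the comment is ordinary HTML that a browser renders fine, yet the XML parser rejects it.

```csharp
using System.Xml;

// Hypothetical helper: reports whether an XML parser accepts the markup.
public static class XmlVsHtml
{
    // Returns true if XmlDocument.LoadXml accepts the markup, false if it
    // rejects it with an XmlException.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            new XmlDocument().LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Everyday HTML that LoadXml rejects:
    //   "<p>line<br>break</p>"   void element without a closing tag
    //   "<p>a&nbsp;b</p>"        HTML entity that XML does not predefine
    //   "<input disabled>"       attribute without a value
    //   "<P>text</p>"            tag names are case-sensitive in XML
    //   "<p>one</p><p>two</p>"   fragment with more than one root element
    // whereas the XHTML-style "<p>line<br/>break</p>" is accepted.
}
```

This is why the XML route only works when you control the markup or know it is XHTML; arbitrary pages from the web will often fail to load.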
Also, here is a post about LINQ vs. regex.
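For comparison with a regex approach, the same "find by id or by tag, then set attributes" task can be written as LINQ to XML queries. This is a sketch, not the linked post's code: LinqHtmlEditor, ByTag, ById and Edit are hypothetical names, the "main" id and "edited" class are illustrative values, and XDocument.Parse still requires well-formed markup.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: finds and edits elements with LINQ to XML queries.
public static class LinqHtmlEditor
{
    // All elements with the given tag name, in document order.
    public static IEnumerable<XElement> ByTag(XDocument doc, string tag) =>
        doc.Descendants(tag);

    // First element whose id attribute equals the given value, or null.
    // The (string) cast yields null when the attribute is absent.
    public static XElement ById(XDocument doc, string id) =>
        doc.Descendants().FirstOrDefault(e => (string)e.Attribute("id") == id);

    // Example edit: give every <img> an alt attribute if it lacks one and
    // add a class to the element with id "main"; returns the edited markup.
    public static string Edit(string html)
    {
        XDocument doc = XDocument.Parse(html); // throws on malformed markup

        foreach (XElement img in ByTag(doc, "img"))
        {
            if (img.Attribute("alt") == null)
                img.SetAttributeValue("alt", ""); // creates the attribute
        }

        ById(doc, "main")?.SetAttributeValue("class", "edited"); // adds or replaces

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Unlike a regex, the query sees the document as a tree, so attribute order, quoting style and nesting do not change which elements match.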