Can anyone recommend a C or Objective-C library for HTML parsing? It needs to handle messy HTML code that won't quite validate.
Does such a library exist, or am I better off just trying to use regular expressions?
I wrote a lightweight wrapper around libxml which may be useful:
Objective-C-HMTL-Parser
Just in case anyone has got here by googling for a nice XPath parser and gone off and used TFHpple: note that TFHpple uses XPathQuery. This is pretty good, but it has a memory leak.
In the function PerformXPathQuery, if the nodes are found to be nil, it returns early without freeing what it has already allocated.
So where you see the early return for nil nodes, add in the two cleanup lines before the return.
If you are doing a LOT of parsing, it's a vicious leak. Now.... how do I get my night back :-)
You may want to check out ElementParser. It provides "just enough" parsing of HTML and XML. Nice interfaces make walking around XML / HTML documents very straightforward. http://touchtank.wordpress.com/