I would like to grab data from a table without using regular expressions. I've enjoyed using simplexml for parsing RSS feeds and would like to know if it can be used to grab a table from another page.
E.g., grab the page with curl or simply file_get_contents(), then use SimpleXML to extract the contents?
If this is XHTML — yes, it's definitely possible. True XHTML is just XML in the end, so it can be parsed with an XML parser.
SimpleXML, however, only accepts strict XML. If you can't get valid XHTML, it looks like putting it through the less strict DOMDocument library first will do the trick (source here):
You can use the loadHTML function from the DOM module, and then import that DOM into SimpleXML via simplexml_import_dom.
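As a rough illustration of that approach, here is a minimal sketch. The sample markup, the table id "prices" and the variable names are made up for the example; in practice the HTML string would come from file_get_contents() or curl.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<html><body><table id="prices">
<tr><th>Item</th><th>Price</th></tr>
<tr><td>Apple</td><td>1.20</td></tr>
<tr><td>Pear</td><td>0.95</td></tr>
</table></body></html>';

// Parse with the lenient HTML parser, then hand the tree to SimpleXML.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xml = simplexml_import_dom($dom); // SimpleXMLElement rooted at <html>

// Collect the text of every data row (rows that contain <td> cells).
$rows = [];
foreach ($xml->xpath('//table[@id="prices"]//tr[td]') as $tr) {
    $cells = [];
    foreach ($tr->td as $td) {
        $cells[] = trim((string) $td);
    }
    $rows[] = $cells;
}

print_r($rows);
```

The XPath uses //tr[td] so the header row is skipped and the query still matches if the markup wraps rows in a tbody.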
My version, tolerant of errors and encoding problems.
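A tolerant variant along these lines might look roughly like the sketch below. This is not the poster's original code: the helper name html_to_simplexml is hypothetical, and it assumes the input is UTF-8 and that the mbstring extension is available.

```php
<?php
// Hypothetical helper; name and signature are illustrative only.
function html_to_simplexml(string $html): ?SimpleXMLElement
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Assumption: input is UTF-8. Turn non-ASCII characters into numeric
    // entities so the HTML parser does not have to guess the encoding.
    $html = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $dom->documentElement === null) {
        return null;
    }

    $xml = simplexml_import_dom($dom);
    return $xml instanceof SimpleXMLElement ? $xml : null;
}

// Sloppy markup: unclosed <td>/<tr> tags and non-ASCII text.
$xml = html_to_simplexml('<table><tr><td>café<td>2.50<tr><td>thé<td>1.80</table>');

$cells = [];
foreach ($xml->xpath('//td') as $td) {
    $cells[] = trim((string) $td);
}

print_r($cells);
```

Restoring the previous libxml error setting keeps the helper from silently changing error handling for the rest of the script.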
It may depend on the page. If the page is XHTML (most web pages nowadays), then any XML parser should do; otherwise, look for an SGML parser. Here's a similar question you might be interested in: Error Tolerant HTML/XML/SGML parsing in PHP