I've started building a web crawler in CakePHP 2.2. The pages the script crawls are HTML pages, and I need to parse them to extract my values.
I have tried a few different solutions and looked at some open source projects as well, but I'm not sure what the best way to do this is.
- DOMDocument::loadHTML() - Looks like this is the solution, but I'm not 100% sure.
- Regular expressions - A bit hard to maintain.
- Simple HTML DOM - http://electrokami.com/coding/simple-html-dom-baked-cakephp-component (made for Cake 1.3; I don't like the code itself, and it has serious memory leaks)
I need your help to figure out which method I should use.
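
For reference, this is roughly what I think the DOMDocument approach would look like. It is only a minimal sketch: the sample markup, the `title` class, and the XPath expressions are made up for illustration, and in the real crawler `$html` would be the fetched page body.

```php
<?php
// Hypothetical sample markup; in the crawler this would be the downloaded page.
$html = '<html><body><h1 class="title">Example product</h1>'
      . '<a href="/page/1">First</a><a href="/page/2">Second</a></body></html>';

// Collect libxml parse warnings internally instead of emitting them,
// since crawled HTML is often not well-formed.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected warnings and restore the previous error handling.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// Text of the first <h1 class="title">, or null if the page has none.
$titleNodes = $xpath->query('//h1[@class="title"]');
$title = $titleNodes->length > 0 ? trim($titleNodes->item(0)->textContent) : null;

// The href of every link on the page.
$links = array();
foreach ($xpath->query('//a[@href]') as $anchor) {
    $links[] = $anchor->getAttribute('href');
}
```

If this is the right direction, I would probably wrap it in a Cake component or a Lib class, but I have not tried it against large pages yet.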