There is a Dutch news website, nu.nl. I would like to get the URL of its first headline, which sits in this markup:
<h3 class="hdtitle">
<a style="" onclick="NU.AT.internalLink(this, event);" xtclib="position1_article_1" href="/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html">
Griekse hotels ontruimd om bosbranden <img src="/images/i18n/nl/slideshow/bt_fotograaf.png" class="vidlinkicon" alt=""> </a>
</h3>
So my question is: how do I get this URL? Can I do this with jQuery? I would think not, because the page is not on my server. So maybe I would have to use PHP? Where do I start?
If you want to set up a jQuery bot to scrape the page through a browser (Google Chrome extensions allow for this functionality):
If you want to use PHP, you'll need to scrape the page for this href link. Use a library such as SimpleTest to accomplish this. The best way to scrape periodically is to run your PHP script from a cron job as well.
SimpleTest: http://www.lastcraft.com/browser_documentation.php
Cron jobs: http://net.tutsplus.com/tutorials/php/managing-cron-jobs-with-php-2/
Good luck!
Tested and working.
Because http://www.nu.nl is not your site, you can do a cross-domain GET through the PHP proxy method; otherwise you will get an Access-Control-Allow-Origin error.
First of all, put this file on your server, on the PHP side:
proxy.php (Updated)
Now, on the JavaScript side, using jQuery, you can do the following:
(Note that I am using prop() because I use jQuery 1.7.2. If you are using a version older than 1.6, try attr() instead.)
As you can see, the request goes to your own domain, which is a bit of a trick, so you won't get the Access-Control-Allow-Origin error again!
Update
If you want to get the href of all headlines, as you wrote in the comments, you can do the following:
Just change the jQuery code like this...
and use the updated proxy.php file (for both cases, one or all headlines).
Hope this helps :-)
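As a rough illustration of the "all headlines" case, here is a minimal plain-JavaScript sketch that pulls every headline href out of an HTML string, such as the text the proxy hands back. It is not the jQuery code from this answer. The function name collectHeadlineHrefs, the sample markup, and the second headline in it are made up for the example, and it assumes every headline is an anchor directly inside an h3 with class "hdtitle", as in the question.

```javascript
// Hypothetical helper (not part of the original answer): collect every
// headline link from a string of HTML.
// Assumption: each headline is an <a href="..."> directly inside
// <h3 class="hdtitle">, as in the snippet from the question.
function collectHeadlineHrefs(html) {
  const pattern = /<h3[^>]*class="hdtitle"[^>]*>\s*<a\b[^>]*?\bhref="([^"]+)"/gi;
  // matchAll yields one match per headline; group 1 is the href value.
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

// Sample input: the first headline comes from the question,
// the second one is invented for illustration.
const sample = `
<h3 class="hdtitle"><a xtclib="position1_article_1" href="/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html">Griekse hotels ontruimd om bosbranden</a></h3>
<h3 class="hdtitle"><a href="/example/second-headline.html">Second headline (made up)</a></h3>
<p><a href="/not-a-headline.html">Other link</a></p>`;

const headlineHrefs = collectHeadlineHrefs(sample);
console.log(headlineHrefs);
```

The links come back relative to the site root, so they still need http://www.nu.nl prepended before use. A regular expression like this is brittle if the site changes its markup; a real DOM query is more robust where one is available.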
You can use the simplehtmldom library to get that link.
Something like this:
read more here
I would have suggested RSS, but unfortunately the headline you're looking for doesn't seem to appear there.
Outputs: http://www.nu.nl/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html
Use cURL to retrieve the page. Then use the following function to parse the string you've provided:
The result URL will be in the $matches array.
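To make the matching step concrete, here is a small JavaScript sketch of the same idea: run a regular expression over the fetched page text and read the URL from the capture group. It is not the PHP function this answer refers to. The helper name firstHeadlineUrl and the pattern are assumptions for illustration, and the markup is the snippet from the question.

```javascript
// Hypothetical helper (not the original answer's function): return the
// absolute URL of the first headline, or null if none is found.
// Assumption: the headline anchor sits directly inside <h3 class="hdtitle">.
function firstHeadlineUrl(html, baseUrl) {
  const pattern = /<h3[^>]*class="hdtitle"[^>]*>\s*<a\b[^>]*?\bhref="([^"]+)"/i;
  const match = pattern.exec(html);
  if (!match) return null;
  // Resolve the site-relative href against the site's base URL.
  return new URL(match[1], baseUrl).href;
}

// The markup from the question, standing in for the downloaded page.
const markup = `<h3 class="hdtitle">
<a style="" onclick="NU.AT.internalLink(this, event);" xtclib="position1_article_1" href="/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html">
Griekse hotels ontruimd om bosbranden <img src="/images/i18n/nl/slideshow/bt_fotograaf.png" class="vidlinkicon" alt=""> </a>
</h3>`;

console.log(firstHeadlineUrl(markup, "http://www.nu.nl/"));
// prints http://www.nu.nl/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html
```

The capture group plays the role of the $matches entry: index 0 is the whole match and index 1 is the href value.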