How to get specific content from a cross-domain HTTP request

There is a Dutch news website, nu.nl. I am interested in getting the URL of the first headline, which sits in this markup:

<h3 class="hdtitle">
          <a style="" onclick="NU.AT.internalLink(this, event);" xtclib="position1_article_1" href="/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html">
            Griekse hotels ontruimd om bosbranden            <img src="/images/i18n/nl/slideshow/bt_fotograaf.png" class="vidlinkicon" alt="">          </a>
        </h3>

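So my question is: how do I get this URL? Can I do this with jQuery? I would think not, because the page is not on my server. So maybe I will have to use PHP? Where do I start...?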

Answer 1:

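Tested and working

Because http://www.nu.nl is not your site, you have to make the cross-domain GET through a PHP proxy; otherwise you will get an error like this:

XMLHttpRequest cannot load http://www.nu.nl/. Origin http://yourdomain.com is not allowed by Access-Control-Allow-Origin.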
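To make the restriction concrete, here is a minimal sketch of the origin comparison behind that error. The function name isSameOrigin is hypothetical, and the sketch assumes the standard URL class found in modern browsers and Node.js. Real browsers apply more rules than this (response headers, preflight requests), so treat it as an illustration only.

```javascript
// Hypothetical helper: two URLs share an origin when scheme, host and port all match.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// A page on your domain calling proxy.php on the same domain is same-origin.
console.log(isSameOrigin('http://yourdomain.com/page.html', 'http://yourdomain.com/proxy.php')); // true

// The same page calling nu.nl directly is cross-origin, which triggers the error.
console.log(isSameOrigin('http://yourdomain.com/page.html', 'http://www.nu.nl/')); // false
```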

First, put this PHP file on your server:

proxy.php (updated version)

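<?php
// Minimal proxy: fetches a remote page server-side so the browser can read it.
// Only http(s) URLs are accepted, so fopen() cannot be pointed at local files.
if (isset($_GET['site']) && preg_match('#^https?://#i', $_GET['site'])) {
  $f = fopen($_GET['site'], 'r');
  if ($f === false) {
    // The remote page could not be opened
    http_response_code(502);
    exit;
  }
  $html = '';
  while (!feof($f)) {
    $html .= fread($f, 24000);
  }
  fclose($f);
  echo $html;
}
?>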

Now, on the JavaScript side, using jQuery, you can do the following:

(Note that I use prop() because I am on jQuery 1.7.2. If you are on a version older than 1.6, use attr() instead.)

$(function(){

   var site = 'http://www.nu.nl';

   $.get('proxy.php', { site:site }, function(data){

      var href = $(data).find('.hdtitle').first().children(':first-child').prop('href');
      var url = href.split('/');
      href = href.replace(url[2], 'nu.nl');

      // Put the 'href' inside your div as a link
      $('#myDiv').html('<a href="' + href + '" target="_blank">' + href + '</a>');

   }, 'html');

});

As you can see, the request goes to your own domain, which is a bit of a trick, so you will not get the Access-Control-Allow-Origin error again!
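The split/replace step in the snippet above is there because prop('href') returns an absolute URL, and the relative link from nu.nl gets resolved against your own domain. The same host swap can be sketched as a standalone function. The name replaceHost is hypothetical, and the sketch assumes the input is an absolute URL of the form scheme://host/path.

```javascript
// Hypothetical helper: swap the host of an absolute URL, keeping the path intact.
function replaceHost(href, newHost) {
  // 'http://yourdomain.com/a/b' splits into ['http:', '', 'yourdomain.com', 'a', 'b'],
  // so index 2 always holds the host.
  var parts = href.split('/');
  parts[2] = newHost;
  return parts.join('/');
}

console.log(replaceHost(
  'http://yourdomain.com/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html',
  'nu.nl'
));
// http://nu.nl/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html
```

Unlike a plain string replace on the host name, assigning to index 2 cannot accidentally touch a later part of the path that happens to contain the same text.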

Update

If you want to get the href of every headline, as you wrote in the comments, you can do the following:

Just change the jQuery code like this...

$(function(){

   var site = 'http://www.nu.nl';

   $.get('proxy.php', { site:site }, function(data){

        // get all html headlines
        var headlines = $(data).find('.hdtitle');

        // get 'href' attribute of each headline and put it inside div
        // (jQuery passes the index first and the element second)
        headlines.each(function(index, elem){
            var href = $(elem).children(':first-child').prop('href');
            var url = href.split('/');
            href = href.replace(url[2], 'nu.nl');
            $('#myDiv').append('<a href="' + href + '" target="_blank">' + href + '</a><br/>');
        });

   }, 'html');

});

And use the updated proxy.php file (in both cases, one headline or all of them).

Hope this helps :-)

Answer 2:

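You can use the simplehtmldom library to get the link.

Something like this:

include 'simple_html_dom.php';
$html = file_get_html('website_link');
// hdtitle is a class, not an id, so select the first matching anchor with a CSS selector
echo $html->find('h3.hdtitle a', 0)->href;

Read more here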

Answer 3:

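I would have suggested RSS, but unfortunately the headline you are looking for does not seem to appear there.

<?php

$f = fopen('http://www.nu.nl', 'r');
$html = '';
// Read until the marker shows up; stop at end of file to avoid looping forever
while (!feof($f) && strpos($html, 'position1_article_1') === FALSE)
    $html .= fread($f, 24000);
fclose($f);
$pos = strpos($html, 'position1_article_1');
if ($pos !== FALSE) {
    // Skip the marker (19 characters) plus the following '" href="' (8 characters)
    $urlleft = substr($html, $pos + 27);
    $url = substr($urlleft, 0, strpos($urlleft, '"'));
    echo 'http://www.nu.nl' . $url;
}

?>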

Output: http://www.nu.nl/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html
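The same marker-based scan can be sketched in JavaScript for anyone doing the extraction on the client side after fetching the page through a proxy. The function name hrefAfterMarker is hypothetical. Instead of a fixed offset of 27 characters, this version searches for the href attribute after the marker, which assumes only that href follows the marker within the same tag.

```javascript
// Hypothetical helper: return the href value that follows a marker string, or null.
function hrefAfterMarker(html, marker) {
  var pos = html.indexOf(marker);
  if (pos === -1) return null;          // marker not present

  var attr = 'href="';
  var start = html.indexOf(attr, pos);  // first href after the marker
  if (start === -1) return null;
  start += attr.length;

  var end = html.indexOf('"', start);   // closing quote of the attribute
  return end === -1 ? null : html.slice(start, end);
}

var sample = '<a style="" xtclib="position1_article_1" ' +
             'href="/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html">';
console.log('http://www.nu.nl' + hrefAfterMarker(sample, 'position1_article_1'));
```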

Answer 4:

Use curl to retrieve the page. Then use the following function to parse the string you retrieved:

preg_match("/<a.*?href\=\"(.*?)\".*?>/is",$text,$matches);

The resulting URL will be in the $matches array (the captured href is in $matches[1]).
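For comparison, here is a rough JavaScript equivalent of that pattern. The function name firstHref is hypothetical, and [\s\S] stands in for the dot with PHP's s modifier so that the match can span line breaks. It returns the first anchor in the string, so the input should already be narrowed down to the headline markup.

```javascript
// Hypothetical helper: return the href of the first <a ...> tag in text, or null.
function firstHref(text) {
  // Group 1 captures everything between href=" and the next double quote.
  var match = /<a[\s\S]*?href="([\s\S]*?)"[\s\S]*?>/i.exec(text);
  return match ? match[1] : null;
}

var snippet = '<h3 class="hdtitle">\n' +
  '  <a style="" onclick="NU.AT.internalLink(this, event);" xtclib="position1_article_1" ' +
  'href="/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html">\n' +
  '    Griekse hotels ontruimd om bosbranden\n' +
  '  </a>\n' +
  '</h3>';
console.log(firstHref(snippet));
// /buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html
```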

Answer 5:

If you want to build a jQuery bot that scrapes the page from within the browser (Google Chrome extensions allow this):

// print out the found anchor link's href attribute
console.log($('.hdtitle').find('a').attr('href'));

If you want to use PHP, you will need to scrape the page for this href link. Use a library such as SimpleTest to do this. The best way to scrape periodically is to hook your PHP script up to a cronjob as well.

SimpleTest: http://www.lastcraft.com/browser_documentation.php

cronjob: http://net.tutsplus.com/tutorials/php/managing-cron-jobs-with-php-2/

Good luck!

Source: How to get specific content from cross-domain http request