PHP crawler detection

Posted 2019-05-27 02:29

I'm trying to write a sitemap.php which acts differently depending on who is looking.

I want to redirect crawlers to my sitemap.xml, as that will be the most up-to-date page and will contain all the info they need, but I want my regular readers to be shown an HTML sitemap on the PHP page.

This will all be controlled from within the PHP header, and I've found this code on the web which by the looks of it should work, but it doesn't. Can anyone help crack this for me?

function getIsCrawler($userAgent) {
    $crawlers = 'firefox|Google|msnbot|Rambler|Yahoo|AbachoBOT|accoona|' .
    'AcioRobot|ASPSeek|CocoCrawler|Dumbot|FAST-WebCrawler|' .
    'GeonaBot|Gigabot|Lycos|MSRBOT|Scooter|AltaVista|IDBot|eStyle|Scrubby';
    $isCrawler = (preg_match("/$crawlers/i", $userAgent) > 0);
    return $isCrawler;
}

$iscrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);

if ($isCrawler) {
    header('Location: http://www.website.com/sitemap.xml');
    exit;
} else {
    echo "not crawler!";
}

It looks pretty simple, but as you can see I've added firefox to the agent list, and sure enough I'm not being redirected.

Thanks for any help :)

2 answers
啃猪蹄的小仙女
#2 · 2019-05-27 02:39

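You have a mistake in your code. The result is assigned to one variable but the if statement tests a different one (PHP variable names are case-sensitive), so the condition is never true: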

$iscrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);

should be

$isCrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);

If you develop with notices enabled, PHP reports the undefined variable in the if statement and you'll catch these errors much more easily.
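
As a minimal sketch of what "notices on" could look like, assuming a development machine where printing errors to the page is acceptable: the two settings below are standard PHP, while the tiny script around them is only an illustration that reuses the variable names from the question. PHP 7 reports the undefined variable as a notice and PHP 8 as a warning.

```php
<?php
// Development-only settings: report every error level and show it on the page.
error_reporting(E_ALL);
ini_set('display_errors', '1');

$iscrawler = true;      // assigned with a lowercase "c"

// Read with an uppercase "C": this is a different, undefined variable,
// so PHP emits an "Undefined variable" message and treats it as null.
if ($isCrawler) {
    echo "crawler";
} else {
    echo "not crawler!";
}
```

On a production server you would normally log these messages instead of displaying them.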

Also, you probably want to exit after the header() call so the rest of the script does not run.
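
Putting both points together, a corrected version might look like the sketch below. It is not the original author's code: the crawler list is a shortened, illustrative subset of the one in the question, and the preg_quote() escaping and the fallback for a missing User-Agent header are additions made here as assumptions about what is wanted.

```php
<?php
/**
 * Return true if the User-Agent string contains one of the listed crawler names.
 * The list is a shortened, illustrative subset of the one in the question.
 */
function getIsCrawler(string $userAgent): bool
{
    $crawlers = ['Google', 'msnbot', 'Rambler', 'Yahoo', 'Gigabot', 'Lycos'];

    // Escape each name so characters such as "-" or "." are matched literally.
    $quoted = array_map(function (string $name): string {
        return preg_quote($name, '/');
    }, $crawlers);
    $pattern = '/' . implode('|', $quoted) . '/i';

    return preg_match($pattern, $userAgent) === 1;
}

// The header can be missing (for example when run from the CLI), so fall back to ''.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$isCrawler = getIsCrawler($userAgent);   // same spelling as in the if below

if ($isCrawler) {
    header('Location: http://www.website.com/sitemap.xml');
    exit;   // stop here so nothing else is sent after the redirect
}

echo "not crawler!";
```

To test the redirect from a normal browser, you can temporarily add your browser's name (such as firefox, as in the question) to the list.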

Warning: Cloaking can get you in trouble with search providers. This article explains why.
