How can one detect if a server/script is accessing their site through cURL/file_get_contents()? (excluding user-agents and IP addresses)

Posted 2019-07-29 17:22

I've come across a tricky problem in which a user has images that cannot be accessed through a script (using cURL / file_get_contents()):

How can I save an image from a URL with PHP?

The image link seems to return a 403 error when requested with file_get_contents(). With cURL, however, it returns a more detailed error:

You are denied access to this system. Turn off your offline-download engine or surfing proxy and fake IP if you really want access. Proxy servers and requests from any web tools are not accepted by the intrusion prevention system.

平明在线数据服务@ 2008 - 2012

I've had no luck accessing the same image after fiddling around with cURL requests myself. I tried changing the user agent to the exact user agent of my browser, which can access the image successfully. I also tried the script on my personal local server, which (obviously) uses the same IP address as my browser... so as far as I can tell, user agent and IP address are ruled out.
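For reference, a user-agent override with file_get_contents() looks something like the sketch below (the URL is the one from the question; as described above, this alone does not get past the block):

```php
<?php
// Build a stream context that sends a browser User-Agent and still returns
// the response body on a 4xx status (ignore_errors). This alone did not
// defeat the block described in the question.
function browser_context(string $userAgent) {
    return stream_context_create([
        'http' => [
            'header'        => "User-Agent: {$userAgent}\r\n",
            'ignore_errors' => true, // fetch the body even on a 403
        ],
    ]);
}

// $img = file_get_contents('http://phim.xixam.com/thumb/giotdang.jpeg',
//                          false,
//                          browser_context('Mozilla/5.0 ... Firefox/115.0'));
// $http_response_header[0] would then show e.g. "HTTP/1.1 403 Forbidden".
```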

How else could someone detect that the request is being made by a script?

By the way, I'm not up to anything shady. I'm just curious xD

Answer 1:

It is indeed a cookie set by JavaScript, followed by a redirect to the original image. The problem is that cURL/file_get_contents() won't parse the HTML and set the cookie; cURL only stores cookies set by the server in its cookie jar.

This is the code you get before the redirect. It creates a cookie via JavaScript with no name and location.href as the value:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<HEAD>
<TITLE>http://phim.xixam.com/thumb/giotdang.jpeg</TITLE>
<meta http-equiv="Refresh" content="0;url=http://phim.xixam.com/thumb/giotdang.jpeg">
</HEAD>
<script type="text/javascript">
window.onload = function checknow() {
var today = new Date();
var expires = 3600000*1*1;
var expires_date = new Date(today.getTime() + (expires));
var ua = navigator.userAgent.toLowerCase();
if ( ua.indexOf( "safari" ) != -1 ) { document.cookie = "location.href"; } else { document.cookie = "location.href;expires=" + expires_date.toGMTString(); }
}
</script>
<BODY>
</BODY></HTML>
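The server-side gate this challenge page implies is not shown in the answer, but it presumably boils down to "no cookie, no image". A hypothetical sketch (the function name and return values are made up for illustration):

```php
<?php
// Hypothetical sketch of the check the challenge page implies:
// if the Cookie header does not contain the JS-set "location.href" token,
// serve the challenge HTML; otherwise serve the image.
function serve_thumbnail(string $cookieHeader, string $imagePath): string {
    if (strpos($cookieHeader, 'location.href') === false) {
        return 'challenge-page';        // the HTML + JS shown above
    }
    return 'image:' . $imagePath;       // e.g. readfile($imagePath) in real code
}
```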

But all is not lost, because by pre-setting/forging the cookie you can circumvent this security measure (one reason why using cookies for any kind of security is a bad idea).

cookie.txt

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

phim.xixam.com  FALSE   /thumb/ FALSE   1338867990      location.href

So the finished cURL script would look something like this:

<?php
function curl_get($url){
    if (!function_exists('curl_init')) {
        die('cURL must be installed!');
    }

    //Forge the cookie
    $expire = time()+3600000*1*1;
    $cookie =<<<COOKIE
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

phim.xixam.com  FALSE   /thumb/ FALSE   $expire     location.href

COOKIE;
    file_put_contents(dirname(__FILE__).'/cookie.txt',$cookie);

    //Browser Masquerade cURL request
    $curl = curl_init();
    $header[0] = "Accept: text/xml,application/xml,application/json,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: ";

    curl_setopt($curl, CURLOPT_COOKIEJAR, dirname(__FILE__).'/cookie.txt');
    curl_setopt($curl, CURLOPT_COOKIEFILE, dirname(__FILE__).'/cookie.txt');
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0 Firefox/5.0');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_HEADER, 0);
    //Pass the referer check
    curl_setopt($curl, CURLOPT_REFERER, 'http://xixam.com/forum.php');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_TIMEOUT, 30);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

    $html = curl_exec($curl);
    curl_close($curl);
    return $html;
}

$image = curl_get('http://phim.xixam.com/thumb/giotdang.jpeg');

file_put_contents('test.jpg',$image);
?>

The only way to stop a crawler is to log all your visitors' IPs in your database and increment a counter per IP on each visit. Once a week or so, look at the top hits by IP, do a reverse lookup on each, and check whether it belongs to a hosting provider; if so, block it at your firewall or in .htaccess. Beyond that, you can't really stop requests to a publicly available resource, as any hurdle can be overcome.
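A minimal sketch of the per-IP counting described above, with an in-memory array standing in for the database table (the reverse lookup uses PHP's gethostbyaddr()):

```php
<?php
// Count visits per IP; in practice this would be an UPDATE on a database table.
function record_visit(array &$counts, string $ip): void {
    $counts[$ip] = ($counts[$ip] ?? 0) + 1;
}

// Return the top-$n IPs by visit count, ready for a weekly review.
function top_visitors(array $counts, int $n): array {
    arsort($counts);                          // sort by count, keys preserved
    return array_slice($counts, 0, $n, true); // keep the IP => count mapping
}

// Weekly review: reverse-resolve each heavy hitter; a hostname pointing at a
// hosting provider is a candidate for a firewall/.htaccess block.
// foreach (top_visitors($counts, 10) as $ip => $count) {
//     $host = gethostbyaddr($ip); // e.g. an *.compute.amazonaws.com hostname
// }
```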

Hope it helps.



Source: How can one detect if a server/script is accessing their site through cURL/file_get_contents()? (excluding user-agents and IP addresses)