This is a two part question.
Q1: Can cURL based request 100% imitate a browser based request?
Q2: If yes, what all options should be set. If not what extra does the browser do that cannot bee imitated by cURL?
I have a website and I see thousands of request being made from a single IP in a very short time. These requests harvest all my data. When looked at the log to identify the agent used, it looks like a request from browser. So was curious to know if its a bot and not a user.
Thanks in advance
R1 : I suppose, if you set all the correct headers, that, yes, a curl-based request can imitate a browser-based one : after all, both send an HTTP request, which is just a couple of lines of text following a specific convention (namely, the HTTP RFC)
R2 : The best way to answer that question is to take a look at what your browser is sending ; with Firefox, for instance, you can use either Firebug or LiveHTTPHeaders to get that.
For instance, to get this page, Firefox sent those request headers :
GET /questions/1926876/can-a-curl-based-http-request-imitate-a-browser-based-request-completely HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.2b4) Gecko/20091124 Firefox/3.6b4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://stackoverflow.com/questions/1926876/can-a-curl-based-http-request-imitate-a-browser-based-request-completely/1926889
Cookie: .......
Cache-Control: max-age=0
(I Just removed a couple of informations -- but you get the idea ;-) )
Using curl, you can work with curl_setopt
to set the HTTP headers ; here, you'd probably have to use a combination of CURLOPT_HTTPHEADER
, CURLOPT_COOKIE
, CURLOPT_USERAGENT
, ...
This page has all the answers to your questions. You can imitate the things mostly.