I'm using cURL to get some rank data for over 20,000 domain names that I've got stored in a database.
The code I'm using is http://semlabs.co.uk/journal/object-oriented-curl-class-with-multi-threading.
The array $competeRequests holds 20,000 requests to the compete.com API for website ranks.
This is an example request: http://apps.compete.com/sites/stackoverflow.com/trended/rank/?apikey=xxxx&start_date=201207&end_date=201208&jsonp=
Since there are 20,000 of these requests, I want to break them up into chunks, so I'm using the following code to accomplish that:
    foreach (array_chunk($competeRequests, 1000) as $requests) {
        foreach ($requests as $request) {
            $curl->addSession($request, $opts);
        }
    }
This works great for sending the requests in batches of 1,000, but the script takes too long to execute. I've increased max_execution_time to over 10 minutes.
Is there a way to send 1,000 requests from my array, parse the results, output a status update, and then continue with the next 1,000 until the array is empty? As of now, the screen just stays white the entire time the script is executing, which can be over 10 minutes.
The accepted answer above is outdated, so the correct answer should be upvoted instead.
http://php.net/manual/en/function.curl-multi-init.php
PHP now supports fetching multiple URLs at the same time.
Someone has written a very good function for this: http://archevery.blogspot.in/2013/07/php-curl-multi-threading.html
You can just use it.
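If you would rather not depend on a third-party wrapper, here is a minimal sketch of the batch-then-report loop the question asks for, built directly on the curl_multi_* functions. The names fetch_batch and process_in_batches are made up for this example, and the batch size, timeout and status text are assumptions you should tune. The fetcher is passed in as a callable so the loop can be tried without hitting the network.

```php
<?php
// Hypothetical helper: fetch one batch of URLs concurrently with curl_multi.
// Returns the response bodies keyed like the input array.
function fetch_batch(array $urls, int $timeout = 30): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // assumed per-request timeout
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity rather than spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Hypothetical driver: send one chunk, hand the results to $onBatch,
// print a status line, then move on to the next chunk.
function process_in_batches(array $urls, int $batchSize, callable $fetch, callable $onBatch): int
{
    $total = count($urls);
    $done = 0;
    foreach (array_chunk($urls, $batchSize, true) as $batch) {
        $onBatch($fetch($batch));
        $done += count($batch);
        echo "Processed $done of $total requests<br>\n";
        if (ob_get_level() > 0) {
            ob_flush(); // push PHP's own output buffer first
        }
        flush();        // then ask the server to send it to the browser
    }
    return $done;
}

// Usage (commented out so the sketch makes no network calls by itself):
// process_in_batches($competeRequests, 1000, 'fetch_batch', function (array $bodies) {
//     foreach ($bodies as $body) { /* parse and store the rank data */ }
// });
```

Whether the status lines actually show up while the script is running still depends on the web server's buffering and compression settings. Also, 1,000 simultaneous connections to a single API may be more than the service tolerates, so a smaller batch size could turn out to be faster overall.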
This one always does the job for me... https://github.com/petewarden/ParallelCurl
Put this at the top of your PHP script:
That disables any caching the web server or PHP may be doing, so your output is displayed in the browser while the script is running.
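The snippet itself did not survive in this copy of the answer, so the following is only a sketch of what such a prologue commonly looks like, not the original code. The function name disable_output_buffering is made up for the example, and the exact set of settings is an assumption.

```php
<?php
// Hypothetical helper: switch off compression and buffering so that echo'd
// text reaches the browser while the script is still running.
// Call it once, at the very top of the script, before any output.
function disable_output_buffering(): void
{
    // Apache only. The guard skips this call where the function does not
    // exist (for example under nginx with PHP-FPM).
    if (function_exists('apache_setenv')) {
        apache_setenv('no-gzip', '1');
    }

    ini_set('zlib.output_compression', '0'); // no gzip performed by PHP itself
    ini_set('implicit_flush', '1');          // flush after every output call

    // Flush and close any output buffers that are already open.
    while (ob_get_level() > 0) {
        ob_end_flush();
    }
    ob_implicit_flush(true);
}

// Usage: disable_output_buffering();
```

With the function_exists guard there is nothing to comment out by hand under nginx, but the nginx-side gzip setting still applies.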
Be sure to comment out the apache_setenv line if you use the nginx web server instead of Apache.
Update for nginx:
So the OP is using nginx, which makes things a bit trickier, as nginx doesn't let you disable gzip compression from PHP. I also use nginx, and I just found out I have it active by default, see:
So you need to disable gzip in nginx.conf and restart nginx:
/etc/init.d/nginx restart
or you can play with the gzip_disable or gzip_types options, to conditionally disable gzip for some browsers or for some page content-types respectively.
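As a rough illustration of those options, a config fragment might look like the one below. The placement in the http block and the example values are assumptions, so adapt them to your own nginx.conf.

```nginx
# Fragment of nginx.conf (assumed to live in the http block).
http {
    # Simplest option: turn compression off entirely.
    gzip off;

    # Alternative: keep gzip on but skip it for user agents matching a regex.
    # gzip on;
    # gzip_disable "msie6";

    # Alternative: keep gzip on and list the MIME types to compress.
    # gzip on;
    # gzip_types text/css application/json;
}
```

Note that nginx always compresses text/html responses when gzip is on, so gzip_types alone will not exempt an HTML status page.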
https://github.com/krakjoe/pthreads
You can use threads in PHP. The code depicted is just horrible thread programming, and I don't advise doing it this way, but I wanted to show you the overhead of 20,000 threads: it's 18 seconds on my current hardware, which is an Intel G620 (dual core) with 8 GB of RAM. On server hardware you can expect much faster results. How you thread such a task depends on your resources and on the resources of the service you are requesting.