Preferred technique for 'pacing' HTTP requests

I'm trying to "spider" a small set of data from a single site using TamperMonkey/JavaScript/jQuery and collate it onto a single page.

I've written a TM script (which fires when I open a target page) to do the following:

1. Search the page for links of a certain type (typically around 8 links).
2. "Follow" each link found to a new page, then locate and follow a single link from there.
3. Extract the data I'm interested in and "incorporate" it into the original page I opened.
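A minimal sketch of the three steps above, assuming the loop currently fires its requests without waiting for earlier ones to finish. The helpers `fetchHtml`, `findLinks`, `findNextLink` and `extract` are hypothetical stand-ins for whatever the script actually uses (for example `$.ajax` plus jQuery selectors); they are passed in so the control flow can be shown on its own. Because each request is awaited before the next one starts, at most one of the 16 requests is in flight at any time.

```javascript
// Sketch of the two-level crawl: start page -> ~8 links -> one link each -> data.
// All helpers are hypothetical and supplied by the caller.
async function crawl(startHtml, { fetchHtml, findLinks, findNextLink, extract }) {
  const results = [];
  for (const url of findLinks(startHtml)) {
    // First hop: open the linked page and locate the single link on it.
    const pageHtml = await fetchHtml(url);
    const nextUrl = findNextLink(pageHtml);
    if (!nextUrl) continue; // nothing to follow on this page

    // Second hop: fetch the page that holds the data we want.
    const dataHtml = await fetchHtml(nextUrl);
    results.push(extract(dataHtml));
  }
  return results; // collated data, ready to insert into the original page
}
```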

Iterating through these actions typically results in 16 HTTP requests (8 × 2 links) being fired at the site. The code I've written works fine if I call it manually from the console and step through all 16 pieces of data one at a time.

However, if I set up a loop and let the code just "do its thing", after about 4 iterations I get back an HTML page saying "The page you requested isn't responding" (with an OK status). I'm guessing the site is protecting itself against some sort of XSRF attack, or is it just genuinely slow?
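Since the server answers with an OK status even when it is refusing the request, checking the status alone won't catch this; the response body has to be inspected. A minimal sketch, assuming the error page always contains the phrase quoted above and that the refusal is temporary, which retries with an exponentially growing pause. `fetchHtml` is a hypothetical async function returning a page's HTML, and the retry count and base delay are illustrative values, not ones the site is known to accept.

```javascript
// Marker text from the error page described above (assumed to be stable).
const BUSY_MARKER = "The page you requested isn't responding";

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch a page, treating the "isn't responding" page as a failure even though
// its HTTP status is OK, and retry with exponential backoff.
async function fetchWithBackoff(fetchHtml, url, retries = 3, baseDelayMs = 2000) {
  for (let attempt = 0; ; attempt++) {
    const html = await fetchHtml(url);
    if (!html.includes(BUSY_MARKER)) {
      return html; // real content
    }
    if (attempt >= retries) {
      throw new Error(`Site still busy after ${retries} retries: ${url}`);
    }
    // With the defaults this waits 2 s, 4 s, then 8 s before asking again.
    await sleep(baseDelayMs * 2 ** attempt);
  }
}
```

If the site is deliberately blocking rapid clients, backoff only helps once the overall request rate is also lowered.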

My question is: what is the preferred technique for lowering the rate at which I request data from the site? I've considered building an array of HTTP function calls or URLs to process, but this seems clunky. Is there anything more idiomatic available to me?
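Queueing the calls is a reasonable approach, and with promises the queue can be hidden inside a small wrapper so callers never handle an array of pending calls themselves. A minimal sketch, assuming an async `fetchHtml(url)` helper (hypothetical); `throttleSerial` and the 1000 ms gap below are illustrative names and values, not part of jQuery or any other library.

```javascript
// Wraps an async function so that calls run one at a time and each call
// starts at least `minGapMs` after the previous one started.
function throttleSerial(fn, minGapMs) {
  let tail = Promise.resolve(); // settles when the last queued call finishes
  let lastStart = -Infinity;    // timestamp of the most recent start

  return function (...args) {
    const run = async () => {
      const wait = lastStart + minGapMs - Date.now();
      if (wait > 0) {
        await new Promise((resolve) => setTimeout(resolve, wait));
      }
      lastStart = Date.now();
      return fn(...args);
    };
    const result = tail.then(run);
    // A failed call must not block the calls queued after it.
    tail = result.catch(() => {});
    return result;
  };
}
```

Usage: `const pacedFetch = throttleSerial(fetchHtml, 1000);` followed by `Promise.all(urls.map(pacedFetch))` requests every URL in order, at most one per second, and the combined promise still resolves once all of them are done.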

I'm guessing this is a common problem and solid solutions exist for it, but I don't have a good enough grip on the terminology to search for them properly.

Tags: javascript ajax csrf web-crawler

1 answer

We Are One

Reply #2 · 2019-09-02 00:59

Similar answer I posted on the other question: Browser stops working for a while after synchronous ajax call in a for loop

You can use a "recursive" function to help you control the flow of asynchronous calls. Instead of running them synchronously, you can run each one asynchronously and call the function again when it is time for the next one.

Something like:

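// Hypothetical list of URLs to fetch, e.g. gathered from the links on the page.
var urls = [/* ... */];

function doCall(index) {
    if (index >= urls.length) {
        return; // every URL has been requested, so stop
    }
    setTimeout(function() {
        $.ajax({
            url: urls[index],
            success: function(data) {
                //... extract the data and add it to the page
                // time to start the next one
                doCall(index + 1);
            },
            error: function() {
                // move on to the next one even if this one failed
                doCall(index + 1);
            }
        });
    }, 1000); // 1 second wait before each run
}

// Start with the first URL.
doCall(0);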

This way the calls run asynchronously and don't block everything while they are in progress, but they still run in series. The setTimeout inside doCall also leaves a small delay before each request, so there is some space between them.
