Web scraping with Google Apps Script

Posted 2019-01-23 21:27

I'm trying to pull data from the following sample web page using Google Apps Script:

url = http://www.premierleague.com/players/2064/Wayne-Rooney/stats?se=54

using UrlFetchApp.fetch(url)

The problem is that when I use UrlFetchApp.fetch(url), I don't get the page content selected by the 'se' parameter in the URL. Instead, I get the content of the following URL, because the 'se=54' data appears to be loaded asynchronously: http://www.premierleague.com/players/2064/Wayne-Rooney/stats

Is there any way to pass the 'se' parameter some other way? I was looking at the function signature, and it allows the specification of 'options', as they are called, but the documentation on that topic is very limited.

Any help would be most appreciated. Many thanks

Tommy

2 Answers
Melony? · 2019-01-23 21:56

Please try the following solution:

var options = {
  "method": "GET",
  "followRedirects": true,
  "muteHttpExceptions": true
};

var result = UrlFetchApp.fetch(url, options);
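Note that none of the supported options carry query parameters; with UrlFetchApp they must be encoded into the URL string itself. A minimal sketch of a hypothetical helper (buildUrl is not part of Apps Script; it is plain JavaScript, so it also runs outside Apps Script) that appends parameters such as 'se':

```javascript
// Hypothetical helper: encode a parameter map into a query string and
// append it to a base URL. encodeURIComponent is standard JavaScript.
function buildUrl(base, params) {
  var parts = [];
  for (var key in params) {
    parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
  }
  return parts.length ? base + "?" + parts.join("&") : base;
}

var url = buildUrl("http://www.premierleague.com/players/2064/Wayne-Rooney/stats", { se: 54 });
// url is ".../stats?se=54" — but, as the question notes, the page loads the
// se-specific data asynchronously, so fetching this URL still returns the base page.
```

This only builds the URL correctly; it does not solve the asynchronous-loading problem, which is what the second answer addresses.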
你好瞎i · 2019-01-23 22:14

Go to that website in your browser and open the developer tools (F12 or Ctrl-Shift-I). Click on the Network tab and reload the page with F5. A list of requests will appear. At the bottom of the list you should see the asynchronous requests made to fetch the information. Those requests get the data as JSON from footballapi.pulselive.com. You can do the same thing in Apps Script, but you have to send a correct "Origin" header or your request gets rejected. Here is an example.

function fetchData() {
  var url = "http://footballapi.pulselive.com/football/stats/player/2064?comps=1";
  var options = {
    "headers": {
      // Without a matching Origin header the API rejects the request.
      "Origin": "http://www.premierleague.com"
    }
  };
  var json = JSON.parse(UrlFetchApp.fetch(url, options).getContentText());
  // Log only the "goals" entry from the stats array.
  for (var i = 0; i < json.stats.length; i++) {
    if (json.stats[i].name === "goals") Logger.log(json.stats[i]);
  }
}
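The filtering step above is plain JavaScript, so it can be tested locally without any Apps Script services. A sketch with an assumed response shape (the field names "name" and "value" are inferred from the loop above; the numbers in the sample are made up purely to illustrate the structure):

```javascript
// Extract one named stat from the assumed pulselive response shape:
// { stats: [ { name: "goals", value: ... }, ... ] }
function findStat(json, statName) {
  for (var i = 0; i < json.stats.length; i++) {
    if (json.stats[i].name === statName) return json.stats[i];
  }
  return null; // stat not present in the response
}

// Sample payload with made-up numbers, only to show the shape.
var sample = { stats: [{ name: "appearances", value: 491 }, { name: "goals", value: 208 }] };
var goals = findStat(sample, "goals");
```

In Apps Script you would pass JSON.parse(UrlFetchApp.fetch(url, options).getContentText()) as the first argument instead of the sample object.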