Grabbing text from webpage and storing as variable

Posted 2019-05-10 14:43

Question:

The webpage

http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463

lists prices for a particular item in a game. I want to grab the "Current guide price:" of that item and store it in a variable so I can output it in a Google spreadsheet. I only want the number, which is currently "643.8k", but I am not sure how to grab specific text like that.

Since the number is in "k" form, I can't graph it; it would have to be something like 643,800 to be graphable. I have a formula for the conversion, so my second question is whether it's possible to apply a formula to the number pulled and then store the result as the final output.

-EDIT-

This is what I have so far. It's not working, and I'm not sure why.

function pullRuneScape() {

var page = UrlFetchApp.fetch("http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463").getContentText();

  var number = page.match(/Current guide price:<\/th>\n(\d*)/)[1];

  SpreadsheetApp.getActive().getSheetByName('RuneScape').appendRow([new Date(), number]);

}

Answer 1:

Your regex is wrong. I tested this one successfully:

var number = page.match(/Current guide price:<\/th>\s*<td>([^<]*)<\/td>/m)[1];

What it does:

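  1. Current guide price:<\/th> finds the text "Current guide price:" followed by the closing th tag
  2. \s*<td> allows whitespace (including newlines) between the tags, then finds the opening td tag
  3. ([^<]*) opens a capture group and matches every character up to the next <
  4. <\/td> matches the closing td tag
  5. /m is the multiline flag; it only changes how ^ and $ match, so it is optional here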
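
To try the pattern without fetching the page, here is a minimal sketch that runs it against a hard-coded snippet. The sampleHtml string is an assumption about what the markup around the price looks like (inferred from the regex above, not copied from the live page), and extractGuidePrice is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the raw price text (e.g. "643.8k"),
// or null when the pattern is not found.
function extractGuidePrice(html) {
  var match = html.match(/Current guide price:<\/th>\s*<td>([^<]*)<\/td>/);
  return match ? match[1].trim() : null;
}

// Assumed markup, for illustration only.
var sampleHtml = "<tr>\n<th>Current guide price:</th>\n<td>643.8k</td>\n</tr>";

var price = extractGuidePrice(sampleHtml);
var missing = extractGuidePrice("<p>no price here</p>");
```

Checking for null before indexing with [1] avoids a TypeError if the page layout ever changes.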


Answer 2:

Use UrlFetchApp.fetch to get the page [1]. That returns an HTTPResponse whose body you can read as text with getContentText [2]. Once you have the text, you can use regular expressions. In this case, search for 'Current guide price:' and read the table cell that follows it. To remove the 'k', you can use a regex replace like this:

'123k'.replace(/k/g,'')

Will return just '123'.
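
Note that stripping the suffix alone turns '643.8k' into '643.8', not 643800. A minimal sketch of going one step further and scaling the value follows; stripAndScale is a hypothetical name, and the sketch assumes the only possible suffix is a trailing 'k'.

```javascript
// Hypothetical helper: assumes the input looks like "643.8k" or "950".
function stripAndScale(text) {
  var hasK = /k$/i.test(text);
  var value = parseFloat(text.replace(/k$/i, ""));
  // Math.round guards against floating-point noise after multiplying.
  return hasK ? Math.round(value * 1000) : value;
}

var scaled = stripAndScale("643.8k");
var plain = stripAndScale("950");
```

The result is a number rather than a string, so a spreadsheet can graph it directly.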

  1. https://developers.google.com/apps-script/reference/url-fetch/
  2. https://developers.google.com/apps-script/reference/url-fetch/http-response


Answer 3:

Obviously, you are not getting anything because the regexp is wrong. I'm no regexp expert, but I was able to extract the number using basic string manipulation:

  var page = UrlFetchApp.fetch("http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463").getContentText();

  var TD = "<td>";
  var start = page.indexOf('Current guide price');
  start = page.indexOf(TD, start); 
  var end = page.indexOf('</td>',start);
  var number = page.substring (start + TD.length , end);

  Logger.log(number);

Then I wrote a function to convert k, m, etc. to the corresponding multiplying factors:

function getMultiplyingFactor(symbol){
  switch(symbol){
    case 'k':
    case 'K':
      return 1000;
    case 'm':
    case 'M':
      return 1000 * 1000;
    case 'g':
    case 'G':
      return 1000 * 1000 * 1000;
    default:
      return 1;
  }
}

Finally, tie the two together:

function pullRuneScape() {

var page = UrlFetchApp.fetch("http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463").getContentText();

  var TD = "<td>";
  var start = page.indexOf('Current guide price');
  start = page.indexOf(TD, start); 
  var end = page.indexOf('</td>',start);
  var number = page.substring (start + TD.length , end);

  Logger.log(number);

  // Split "643.8k" into the numeric part and the one-character suffix.
  var numericPart = number.substring(0, number.length - 1);
  var multiplierSymbol = number.substring(number.length - 1);
  var multiplier = getMultiplyingFactor(multiplierSymbol);
  // With no recognised suffix, the whole string is the number.
  var fullNumber = multiplier == 1 ? Number(number) : numericPart * multiplier;
  Logger.log(fullNumber);
}

Certainly not the optimal way of doing things, but it works.
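
The suffix handling can also be written with a lookup table and a single regex. This is a sketch of an alternative, not the method used above: parsePrice and SUFFIX_FACTORS are hypothetical names, the suffix set simply mirrors getMultiplyingFactor, and the comma handling assumes that unsuffixed prices may contain thousands separators.

```javascript
// Mirrors the k/m/g factors from getMultiplyingFactor.
var SUFFIX_FACTORS = { k: 1e3, m: 1e6, g: 1e9 };

// Hypothetical helper: "643.8k" -> 643800, "1,250" -> 1250, otherwise NaN.
function parsePrice(text) {
  var cleaned = text.trim().replace(/,/g, "");
  var match = cleaned.match(/^(\d+(?:\.\d+)?)([a-z])?$/i);
  if (!match) {
    return NaN; // not a number with an optional one-letter suffix
  }
  var value = parseFloat(match[1]);
  var suffix = (match[2] || "").toLowerCase();
  if (suffix === "") {
    return value;
  }
  var factor = SUFFIX_FACTORS[suffix];
  if (factor === undefined) {
    return NaN; // unknown suffix
  }
  // Math.round guards against floating-point noise after multiplying.
  return Math.round(value * factor);
}

var a = parsePrice("643.8k");
var b = parsePrice("1,250");
var c = parsePrice("2.5M");
var d = parsePrice("n/a");
```

Returning NaN for unrecognised input makes a bad scrape visible in the sheet instead of silently logging a wrong number.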



Answer 4:

Basically, I parse the HTML page as you did (with a corrected regex) and split the string into a numeric part and a multiplier (k = 1000). Finally, I return the extracted number. This function can be used in a Google spreadsheet.

function pullRuneScape() {
  var pageContent = UrlFetchApp.fetch("http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463").getContentText();
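  // Escape the slash in </th>, allow any whitespace between the tags,
  // and capture the digits and an optional "k" suffix separately.
  var matched = pageContent.match(/Current guide price:<\/th>\s*<td>(\d+\.?\d*)(k?)/);
  if (!matched) {
    throw new Error("Current guide price not found");
  }

  var numberAsString = matched[1];
  var multiplier = matched[2]; // empty string when there is no suffix

  var number = convertNumber(numberAsString, multiplier);
  return number;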
}

function convertNumber(numberAsString, multiplier) {
  var number = Number(numberAsString);

  if (multiplier == 'k') {
    number *= 1000; 
  }

  return number;
}