I have a large sheet with around 30 importxml
functions that obtain data from a website that updates usually twice a day.
I would like to run the importxml
function on a timely basis (every 8 hours) for my Google Spreadsheet to save the data in another sheet. The saving already works, however the updating does not!
I read in Google Spreadsheet row update that it might run every 2 hours, however I do not believe that this is true, because since I added it to my sheet nothing has changed or updated, when the spreadsheet is NOT opened.
How can I "trigger" the importxml
function in my Google Spreadsheet in an easy way, as I have a lot of importxml
functions in it?
The Google Spreadsheet row update question and its answers refer to the "Old Sheets", which had different behaviour than the 2015 version of Google Sheets does. There is no automatic refresh of content with "New Sheets"; changes are only evaluated now in response to edits.
While Sheets no longer provides this capability natively, we can use a script to refresh the "import" formulas (IMPORTXML
, IMPORTDATA
, IMPORTHTML
and IMPORTANGE
).
Utility script
For periodic refresh of IMPORT formulas, set this function up as a time-driven trigger.
Caveats:
- Import function Formula changes made to the spreadsheet by other scripts or users during the refresh period COULD BE OVERWRITTEN.
- Overlapping refreshes might make your spreadsheet unstable. To mitigate that, the utility script uses a
ScriptLock
. This may conflict with other uses of that lock in your script.
/**
* Go through all sheets in a spreadsheet, identify and remove all spreadsheet
* import functions, then replace them a while later. This causes a "refresh"
* of the "import" functions. For periodic refresh of these formulas, set this
* function up as a time-based trigger.
*
* Caution: Formula changes made to the spreadsheet by other scripts or users
* during the refresh period COULD BE OVERWRITTEN.
*
* From: https://stackoverflow.com/a/33875957/1677912
*/
function RefreshImports() {
var lock = LockService.getScriptLock();
if (!lock.tryLock(5000)) return; // Wait up to 5s for previous refresh to end.
// At this point, we are holding the lock.
var id = "YOUR-SHEET-ID";
var ss = SpreadsheetApp.openById(id);
var sheets = ss.getSheets();
for (var sheetNum=0; sheetNum<sheets.length; sheetNum++) {
var sheet = sheets[sheetNum];
var dataRange = sheet.getDataRange();
var formulas = dataRange.getFormulas();
var tempFormulas = [];
for (var row=0; row<formulas.length; row++) {
for (col=0; col<formulas[0].length; col++) {
// Blank all formulas containing any "import" function
// See https://regex101.com/r/bE7fJ6/2
var re = /.*[^a-z0-9]import(?:xml|data|feed|html|range)\(.*/gi;
if (formulas[row][col].search(re) !== -1 ) {
tempFormulas.push({row:row+1,
col:col+1,
formula:formulas[row][col]});
sheet.getRange(row+1, col+1).setFormula("");
}
}
}
// After a pause, replace the import functions
Utilities.sleep(5000);
for (var i=0; i<tempFormulas.length; i++) {
var cell = tempFormulas[i];
sheet.getRange( cell.row, cell.col ).setFormula(cell.formula)
}
// Done refresh; release the lock.
lock.releaseLock();
}
}
I made a couple of adjustments to Mogsdad's answer:
- Fixed the
releaseLock()
call placement
- Updates (or adds) a querystring parameter to the url in the import function (as opposed to storing, removing, waiting 5 seconds, and then restoring all relevant formulas)
- Works on a specific sheet in your spreadsheet
- Shows time of last update
...
function RefreshImports() {
var lock = LockService.getScriptLock();
if (!lock.tryLock(5000)) return; // Wait up to 5s for previous refresh to end.
var id = "[YOUR SPREADSHEET ID]";
var ss = SpreadsheetApp.openById(id);
var sheet = ss.getSheetByName("[SHEET NAME]");
var dataRange = sheet.getDataRange();
var formulas = dataRange.getFormulas();
var content = "";
var now = new Date();
var time = now.getTime();
var re = /.*[^a-z0-9]import(?:xml|data|feed|html|range)\(.*/gi;
var re2 = /((\?|&)(update=[0-9]*))/gi;
var re3 = /(",)/gi;
for (var row=0; row<formulas.length; row++) {
for (var col=0; col<formulas[0].length; col++) {
content = formulas[row][col];
if (content != "") {
var match = content.search(re);
if (match !== -1 ) {
// import function is used in this cell
var updatedContent = content.toString().replace(re2,"$2update=" + time);
if (updatedContent == content) {
// No querystring exists yet in url
updatedContent = content.toString().replace(re3,"?update=" + time + "$1");
}
// Update url in formula with querystring param
sheet.getRange(row+1, col+1).setFormula(updatedContent);
}
}
}
}
// Done refresh; release the lock.
lock.releaseLock();
// Show last updated time on sheet somewhere
sheet.getRange(7,2).setValue("Rates were last updated at " + now.toLocaleTimeString())
}
To answer your question for an easy "trigger" to force the function to reload:
add an additional not used parameter to the url you are loading, while referencing a cell for the value of that parameter.
Once you alter the content of that cell, the function reloads.
example:
importxml("http://www.example.com/?noop=" & $A$1,"...")
unfortunately you cannot put a date calculating function into the referenced cell, that throws an error that this is not allowed.