We have an automated FTP process set up which imports a data file into Google Cloud Storage daily.
I would like to set up a daily automated job that loads this CSV into a BigQuery table.
What is the best way to do this? My first thought is to set up an App Engine instance with a cron job that runs a Python script every day. Is there a better solution?
A Background Cloud Function with a Cloud Storage trigger is your best choice!
You can set it to monitor a specific bucket for new files and execute a load script whenever the trigger is fired.
Forgot to mention - Cloud Functions (as of now) support only Node.js for scripting - which is usually not a problem, but I just wanted to mention it :o)
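To make that concrete, here is a minimal Node.js sketch of such a function, written against the current @google-cloud/bigquery and @google-cloud/storage clients; the handler signature is the one used by the Node.js 8+ background-function runtimes, and the dataset/table names (daily_imports, ftp_data) and CSV options are placeholders to adjust:

const { BigQuery } = require('@google-cloud/bigquery');
const { Storage } = require('@google-cloud/storage');

const bigquery = new BigQuery();
const storage = new Storage();

// Fired by the Cloud Storage trigger whenever a new object is finalized
// in the watched bucket; "file" carries the bucket and object name.
exports.loadNewFile = async (file, context) => {
  const [job] = await bigquery
    .dataset('daily_imports')           // placeholder dataset
    .table('ftp_data')                  // placeholder table
    .load(storage.bucket(file.bucket).file(file.name), {
      sourceFormat: 'CSV',
      skipLeadingRows: 1,               // assumes a header row
      autodetect: true,                 // let BigQuery infer the schema
      writeDisposition: 'WRITE_APPEND'  // append each daily file
    });
  console.log(`Load job ${job.id} finished for gs://${file.bucket}/${file.name}`);
};

Deploying that with a Cloud Storage trigger on the FTP bucket gives you the daily load without any cron job or App Engine instance.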
Similar approach to Mikhail's answer - but with more code:
With Google Cloud Functions you can automate BigQuery loads each time you receive a new file:
- Create a new function at https://console.cloud.google.com/functions/add
- Point "bucket" to the one receiving files.
Code-wise, add the BigQuery dependency to package.json:
{
  "dependencies": {
    "@google-cloud/bigquery": "^0.9.6"
  }
}
And in index.js you can act on the new file in any appropriate way:
const BigQuery = require('@google-cloud/bigquery');
const bigQuery = BigQuery({ projectId: 'your-project-id' });

// Entry point, fired for every new file in the bucket.
exports.processFile = (event, callback) => {
  console.log('Processing: ' + JSON.stringify(event.data));
  query(event.data);
  callback();
};

// The event data identifies the file that triggered the function.
function query(data) {
  const filename = data.name.split('/').pop();
  const full_filename = `gs://${data.bucket}/${data.name}`;

  // if you want to run a query:
  const sqlQuery = '...';
  bigQuery.query({
    query: sqlQuery,
    useLegacySql: false
  });
}
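The function above only fires a query. Since the goal in the question is to load the arriving CSV into a table, you can instead hand the file that triggered the event to a load job. Here is a rough sketch, reusing the bigQuery client from above, with my_dataset and my_table as placeholder names; note the method is called import in the 0.9.x client pinned in package.json (it is load in current releases), and you would also add @google-cloud/storage to the dependencies:

const Storage = require('@google-cloud/storage');
const storage = Storage();

// Appends the new CSV into an existing table; the metadata object follows
// BigQuery's load job configuration.
function loadFile(data) {
  const file = storage.bucket(data.bucket).file(data.name);
  return bigQuery
    .dataset('my_dataset')
    .table('my_table')
    .import(file, { sourceFormat: 'CSV', skipLeadingRows: 1 });
}

If you go this route, call callback() only after the returned promise resolves, so the function does not finish before the load job is submitted.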