Automate file upload from Google Cloud Storage to BigQuery

Posted on 2019-02-18 01:11

Question:

We have an automated FTP process set up which imports a data file into Google Cloud Storage daily.

I would like to set up a daily automated job that loads this CSV into a BigQuery table.

What is the best way to do this? My first thought is to set up an App Engine instance with a cron job that runs a Python script every day. Is there a better solution?

Answer 1:

A background Cloud Function with a Cloud Storage trigger is your best choice!

You can set it to monitor a specific bucket for new files and execute a load script whenever the trigger is fired.
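For reference, what the trigger hands to the function is the metadata of the new object, so the script knows exactly which file in which bucket to load. Abridged, and with placeholder values, that payload looks roughly like this:

{
  "bucket": "my-data-bucket",
  "name": "daily/2019-02-18.csv",
  "contentType": "text/csv",
  "size": "10485760",
  "timeCreated": "2019-02-18T01:11:00.000Z"
}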

Forgot to mention - Cloud Functions (as of now) support only Node.js for scripting - which is usually not a problem, but just wanted to mention it :o)



Answer 2:

Similar approach to Mikhail's answer - but with more code:

With Google Cloud Functions you can run a BigQuery job automatically each time you receive a new file:

  • Create a new function at https://console.cloud.google.com/functions/add
  • Point its "bucket" trigger to the one receiving the files.

Code-wise, declare the BigQuery client library as a dependency in package.json:

{
  "dependencies": {
    "@google-cloud/bigquery": "^0.9.6"
  } 
}

And in index.js you can act on the new file in any appropriate way:

// index.js - runs whenever a new object lands in the monitored bucket
const BigQuery = require('@google-cloud/bigquery');

const bigQuery = BigQuery({ projectId: 'your-project-id' });

exports.processFile = (event, callback) => {
  console.log('Processing: ' + JSON.stringify(event.data));
  query(event.data);  // fire the BigQuery job; we don't wait for it here
  callback();
};

function query(data) {
  // data.name is the object path inside the bucket that triggered the function
  const filename = data.name.split('/').pop();
  const full_filename = `gs://${data.bucket}/${data.name}`;

  // if you want to run a query:
  const sqlQuery = '...';
  bigQuery.query({
    query: sqlQuery,
    useLegacySql: false
  });
}
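
Since the goal is to get the CSV into a table rather than just to query it, the same trigger can instead start a BigQuery load job on the file it receives. The following is only a rough sketch, assuming a recent version of the @google-cloud/bigquery and @google-cloud/storage clients (newer than the 0.9.6 pinned above) and the newer (data, context) function signature; my_dataset and daily_import are placeholder names:

// Sketch: load the newly uploaded CSV straight into a BigQuery table.
// Assumes recent client libraries; dataset/table names are placeholders.
const { BigQuery } = require('@google-cloud/bigquery');
const { Storage } = require('@google-cloud/storage');

const bigquery = new BigQuery();
const storage = new Storage();

exports.loadFile = async (data, context) => {
  // 'data' describes the Cloud Storage object that fired the trigger
  const [job] = await bigquery
    .dataset('my_dataset')
    .table('daily_import')
    .load(storage.bucket(data.bucket).file(data.name), {
      sourceFormat: 'CSV',
      skipLeadingRows: 1,               // skip the CSV header row
      autodetect: true,                 // let BigQuery infer the schema
      writeDisposition: 'WRITE_APPEND'  // append each daily file
    });
  console.log(`Load job ${job.id} completed for gs://${data.bucket}/${data.name}`);
};

WRITE_APPEND adds each day's rows to the same table; use WRITE_TRUNCATE instead if the new file should replace the previous day's data.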