How do I load a file from Cloud Storage into memory?

Posted 2019-08-01 11:58

Question:

I have end users who will be uploading a CSV file into a bucket, which will then be loaded into BigQuery. The issue is that the content of the data is unreliable, i.e. it contains free-text fields that may include linefeeds, extra commas, invalid date formats, etc.

I have a python script that will pre-process the file and write out a new one with all errors corrected.

I need to automate this in the cloud. Since the file is only small, I was thinking I could load its contents into memory, process the records, and then write it back out to the bucket. I do not want to process the file locally.

Despite extensive searching, I can't find how to load a file from a bucket into memory and then write it back out again.

Can anyone help?

Answer 1:

I believe what you’re looking for is Google Cloud Functions. You can set a Cloud Function to be triggered by an upload to the GCS bucket, and use your Python code in that same function to process the .csv and load it into BigQuery. Bear in mind, however, that Python 3.7.1 support for Cloud Functions is currently in a beta state of development.
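
For illustration only, here is a minimal sketch of what such a function might look like, using the google-cloud-storage and google-cloud-bigquery client libraries. The bucket names, table ID, and the clean-up step are all placeholders standing in for your own names and your existing pre-processing logic:

```python
# Hedged sketch: a background Cloud Function that runs when a CSV is
# finalized in the upload bucket. CLEAN_BUCKET and TABLE_ID are placeholders.
from google.cloud import bigquery, storage

CLEAN_BUCKET = "my-clean-bucket"                 # placeholder output bucket
TABLE_ID = "my-project.my_dataset.my_table"      # placeholder BigQuery table


def process_upload(event, context):
    """Triggered by a google.storage.object.finalize event on the upload bucket."""
    storage_client = storage.Client()

    # Pull the whole uploaded object into memory (fine for small files).
    src_blob = storage_client.bucket(event["bucket"]).blob(event["name"])
    raw = src_blob.download_as_bytes().decode("utf-8")

    # Stand-in for the real clean-up (linefeeds, stray commas, date formats, ...).
    cleaned = raw.replace("\r\n", "\n")

    # Write the corrected file back out to a separate bucket.
    clean_blob = storage_client.bucket(CLEAN_BUCKET).blob(event["name"])
    clean_blob.upload_from_string(cleaned, content_type="text/csv")

    # Load the corrected file from GCS into BigQuery.
    bq_client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )
    load_job = bq_client.load_table_from_uri(
        f"gs://{CLEAN_BUCKET}/{event['name']}", TABLE_ID, job_config=job_config
    )
    load_job.result()  # block until the load job completes
```

A function along these lines would be deployed with the upload bucket as the trigger resource and google.storage.object.finalize as the trigger event; check the current Cloud Functions documentation for the exact deployment invocation and supported runtimes.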