Is there a way to continuously pipe data from Azure Blob storage to BigQuery?

Posted 2019-06-13 20:09

I have a bunch of files in Azure Blob storage and it's constantly getting new ones. I was wondering if there is a way for me to first move all the data I currently have in Blob storage over to BigQuery, and then keep a script or some job running so that any new data that lands there also gets sent over to BigQuery?

2 Answers
forever°为你锁心
Answered 2019-06-13 20:35

BigQuery supports querying data directly from these external data sources: Google Cloud Bigtable, Google Cloud Storage, and Google Drive. Azure Blob Storage is not among them. As Adam Lydick mentioned, as a workaround you could copy your data/files from Azure Blob Storage to Google Cloud Storage (or another external data source that BigQuery supports).
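For example, once the files are in Google Cloud Storage they can be exposed to BigQuery as an external table without loading them. Here is a minimal sketch using the Python BigQuery client, assuming CSV files; the bucket, dataset, and table names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Describe an external table backed by CSV files in a GCS bucket
# (bucket/dataset/table names below are placeholders).
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-gcs-landing-bucket/data/*.csv"]
external_config.autodetect = True  # let BigQuery infer the schema

table = bigquery.Table("my_project.my_dataset.my_external_table")
table.external_data_configuration = external_config
client.create_table(table)  # the table can now be queried like any other
```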

To copy data from Azure Blob Storage to Google Cloud Storage, you can run WebJobs (or Azure Functions). A blob-triggered WebJob or Function fires whenever a blob is created or updated, and inside that function you can read the blob content and write/upload it to Google Cloud Storage.

Note: you can install the Google.Cloud.Storage library to perform common operations in client code, and this blog explains how to use the Google.Cloud.Storage SDK in Azure Functions.
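The library above is the .NET package; as a rough illustration of the same idea, here is a sketch using the Azure Functions Python v2 programming model together with the google-cloud-storage client. The container path, connection setting, and bucket name are placeholders:

```python
import azure.functions as func
from google.cloud import storage

app = func.FunctionApp()

# Fires whenever a blob is created or updated in the "incoming" container;
# "AzureWebJobsStorage" must point at the storage account that holds the blobs.
@app.blob_trigger(arg_name="myblob",
                  path="incoming/{name}",
                  connection="AzureWebJobsStorage")
def copy_blob_to_gcs(myblob: func.InputStream):
    # Read the Azure blob content and upload it to a GCS bucket
    # ("my-gcs-landing-bucket" is a placeholder).
    gcs_bucket = storage.Client().bucket("my-gcs-landing-bucket")
    # myblob.name includes the container prefix, e.g. "incoming/file.json".
    destination = gcs_bucket.blob(myblob.name.split("/", 1)[-1])
    destination.upload_from_string(myblob.read())
```

The Function also needs Google Cloud credentials available to it (for example a service account key exposed through GOOGLE_APPLICATION_CREDENTIALS) so that the storage client can authenticate.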

祖国的老花朵
Answered 2019-06-13 20:36

I'm not aware of anything out-of-the-box (on Google's infrastructure) that can accomplish this.

I'd probably set up a tiny VM to:

  • Scan your Azure blob storage looking for new content.
  • Copy new content into GCS (or local disk).
  • Kick off a LOAD job periodically to add the new data to BigQuery (a rough sketch follows this list).
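
Here is a minimal sketch of such a polling script, assuming Python on the VM, newline-delimited JSON files, and placeholder connection string, container, bucket, and table names:

```python
import time

from azure.storage.blob import ContainerClient
from google.cloud import bigquery, storage

# Placeholder configuration; adjust to your environment.
AZURE_CONN = "<azure-storage-connection-string>"
AZURE_CONTAINER = "incoming"
GCS_BUCKET = "my-gcs-landing-bucket"
BQ_TABLE = "my_project.my_dataset.my_table"

azure_container = ContainerClient.from_connection_string(AZURE_CONN, AZURE_CONTAINER)
gcs_bucket = storage.Client().bucket(GCS_BUCKET)
bq = bigquery.Client()
seen = set()  # blobs already copied (kept in memory for simplicity)

while True:
    new_uris = []

    # 1. Scan Azure Blob Storage for content we haven't copied yet.
    for blob in azure_container.list_blobs():
        if blob.name in seen:
            continue
        data = azure_container.download_blob(blob.name).readall()

        # 2. Copy the new content into GCS.
        gcs_bucket.blob(blob.name).upload_from_string(data)
        new_uris.append(f"gs://{GCS_BUCKET}/{blob.name}")
        seen.add(blob.name)

    # 3. Kick off a LOAD job to append the new files to BigQuery.
    if new_uris:
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON)
        bq.load_table_from_uri(new_uris, BQ_TABLE, job_config=job_config).result()

    time.sleep(300)  # poll every 5 minutes
```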

If you used GCS instead of Azure Blob Storage, you could eliminate the VM and just have a Cloud Function that is triggered on new items being added to your GCS bucket (assuming your blob is in a form that BigQuery knows how to read). I presume this is part of an existing solution that you'd prefer not to modify though.
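If you did go that route, a sketch of such a Cloud Function (a 1st-gen background function in Python, triggered on object finalize; the destination table is a placeholder and the files are assumed to be newline-delimited JSON) might look like:

```python
from google.cloud import bigquery

bq = bigquery.Client()

def load_new_object(event, context):
    # Triggered when an object is finalized (created/overwritten) in the GCS bucket.
    uri = f"gs://{event['bucket']}/{event['name']}"
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    # "my_dataset.my_table" is a placeholder destination table.
    bq.load_table_from_uri(uri, "my_dataset.my_table", job_config=job_config).result()
```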
