I would like to set up a Dataflow pipeline that reads from a file in a GCS bucket and writes to a BigQuery table. The caveat is that the table to write to should be decided based on the content of each line read from the GCS file.
My question is: is this possible? If so, can someone give me any hints on how to accomplish it?
Also, the GCS files to read from are dynamic. I'm using the Object Change Notification service, which calls my App Engine app's registered endpoint whenever a file is added to or removed from the bucket, along with the details of the added/removed file. It is this file whose contents have to be streamed to BigQuery.
Is it possible to integrate a Dataflow pipeline with App Engine?
Lastly, is this whole setup even the best way to do this?
Thanks...
On your first question: yes, this is possible; see Writing different values to different BigQuery tables in Apache Beam.
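For concreteness, here is a minimal batch sketch using the Beam Java SDK. It assumes comma-separated lines whose first field names the target table; the bucket path, project, dataset, table prefix, and field names are all placeholders, not anything fixed by your question:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class RouteLinesToTables {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(TextIO.read().from("gs://my-bucket/input.csv")) // placeholder path
        .apply(MapElements.into(TypeDescriptor.of(TableRow.class))
            .via((String line) -> {
              // Assumed line format: "<record_type>,<payload>"
              String[] f = line.split(",", 2);
              return new TableRow().set("type", f[0]).set("payload", f[1]);
            }))
        .setCoder(TableRowJsonCoder.of())
        .apply(BigQueryIO.writeTableRows()
            // Choose the destination table per element from the row's content.
            .to((ValueInSingleWindow<TableRow> row) -> new TableDestination(
                "my-project:my_dataset.events_" + row.getValue().get("type"), null))
            // Tables are assumed to exist already; if they may not, supply a
            // schema and use CREATE_IF_NEEDED instead.
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```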
On your second question: one way to accomplish that would be to have your App Engine app publish every change notification to Cloud Pub/Sub, and to have a constantly running streaming Dataflow pipeline watching the Pub/Sub topic and writing to BigQuery.
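On the App Engine side, a sketch of forwarding each notification with the google-cloud-pubsub Java client; the project and topic names are placeholders, and you would publish only for added objects, since a removed file can't be read:

```java
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

public class ChangeNotificationForwarder {
  // Reuse one Publisher across requests; project and topic are placeholders.
  private static final Publisher publisher = createPublisher();

  private static Publisher createPublisher() {
    try {
      return Publisher.newBuilder(TopicName.of("my-project", "gcs-changes")).build();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  // Call this from the handler that receives the Object Change Notification,
  // only when the notification reports an added object.
  public static void forward(String bucket, String objectName) {
    PubsubMessage msg = PubsubMessage.newBuilder()
        .setData(ByteString.copyFromUtf8("gs://" + bucket + "/" + objectName))
        .build();
    publisher.publish(msg); // asynchronous; returns an ApiFuture<String>
  }
}
```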
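And a sketch of the streaming pipeline that consumes those paths, reads each file as it arrives, and routes rows per element as in the first sketch; again, every name and the line format are assumptions:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class StreamGcsChangesToBigQuery {
  public static void main(String[] args) {
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).as(StreamingOptions.class);
    options.setStreaming(true);
    Pipeline p = Pipeline.create(options);

    // Each Pub/Sub message carries a "gs://bucket/object" path published
    // by the App Engine app above.
    p.apply(PubsubIO.readStrings().fromTopic("projects/my-project/topics/gcs-changes"))
        .apply(FileIO.matchAll())    // resolve each path to file metadata
        .apply(FileIO.readMatches()) // open the matched files
        .apply(TextIO.readFiles())   // emit one element per line
        .apply(MapElements.into(TypeDescriptor.of(TableRow.class))
            .via((String line) -> {
              String[] f = line.split(",", 2); // assumed line format
              return new TableRow().set("type", f[0]).set("payload", f[1]);
            }))
        .setCoder(TableRowJsonCoder.of())
        .apply(BigQueryIO.writeTableRows()
            .to((ValueInSingleWindow<TableRow> row) -> new TableDestination(
                "my-project:my_dataset.events_" + row.getValue().get("type"), null))
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```

Since the input is unbounded, BigQueryIO defaults to streaming inserts here, which matches your requirement that file contents be streamed to BigQuery.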
On your third question: yes, assuming your data representation on GCS is fixed, the rest seems like a reasonable ingestion architecture to me :)