Specifying a dynamically generated BigQuery table name based on line content

Posted 2019-02-21 05:29

I would like to set up a Dataflow pipeline that reads from a file in a GCS bucket and writes to a BigQuery table. The caveat: the table to write to should be decided based on the content of the line being read from the GCS file.

My question is: is this possible? If yes, can someone give me any hints on how to accomplish it?

Also, the GCS files to be read are dynamic. I'm using the Object Change Notification service, which calls my App Engine app's registered endpoint whenever a file is added to or removed from the bucket, along with the details of the added/removed file. That file's contents then have to be streamed to BigQuery.
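For context, a minimal sketch of what such a notification endpoint might parse, assuming the JSON body carries `bucket` and `name` fields and the resource state arrives in the `X-Goog-Resource-State` header (the helper name and exact payload shape here are illustrative, not the asker's actual code):

```python
import json

def parse_notification(headers, body):
    """Return (state, gcs_path) for a change-notification request.

    `state` is "exists" for added objects and "not_exists" for removed
    ones; `gcs_path` is the full gs:// URI of the affected object.
    """
    state = headers.get("X-Goog-Resource-State", "unknown")
    obj = json.loads(body)
    gcs_path = "gs://{}/{}".format(obj["bucket"], obj["name"])
    return state, gcs_path

state, path = parse_notification(
    {"X-Goog-Resource-State": "exists"},
    '{"bucket": "my-bucket", "name": "data/events.csv"}',
)
# state == "exists", path == "gs://my-bucket/data/events.csv"
```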

Is it possible to integrate a Dataflow pipeline with App Engine?

Lastly, is this whole setup even the best way to do this?

Thanks...

1 Answer

我命由我不由天
Answered 2019-02-21 06:21

On your first question: see Writing different values to different BigQuery tables in Apache Beam
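In the Beam Python SDK, `beam.io.WriteToBigQuery` accepts a callable for `table` that is invoked per element, which is one way to route records dynamically. A sketch of such a routing function, assuming comma-separated lines whose first field names an event type (project and dataset names are placeholders):

```python
def table_for(line):
    """Pick the destination table from the content of a record."""
    event_type = line.split(",", 1)[0]
    return "my-project:my_dataset.{}_events".format(event_type)

# In the pipeline, the callable is passed directly, e.g.:
#
#   lines | beam.io.WriteToBigQuery(
#       table=table_for,
#       write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)

table_for("click,user123,2019-02-21")  # hypothetical input line
```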

On your second question: one way to accomplish that would be to have your App Engine app publish every change notification to Cloud Pub/Sub, and have a constantly running streaming Dataflow pipeline watching the Pub/Sub topic and writing to BigQuery.
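A sketch of the publishing side, assuming the change event is serialized as JSON bytes (the field names and topic are assumptions for illustration):

```python
import json

def make_message(bucket, name, state):
    """Serialize a GCS change event as UTF-8 JSON bytes for Pub/Sub."""
    return json.dumps(
        {"bucket": bucket, "name": name, "state": state},
        sort_keys=True,
    ).encode("utf-8")

# The App Engine handler would then publish it, e.g. with the
# google-cloud-pubsub client library:
#
#   publisher = pubsub_v1.PublisherClient()
#   topic = publisher.topic_path("my-project", "gcs-changes")
#   publisher.publish(topic, make_message(bucket, name, state))

data = make_message("my-bucket", "data/events.csv", "exists")
```

The streaming pipeline would read the same topic with `beam.io.ReadFromPubSub` and fan out from there.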

On your third question: yes, assuming your data representation on GCS is fixed, the rest seems like a reasonable ingestion architecture to me :)
