I would like to set up a Dataflow pipeline that reads from a file in a GCS bucket and writes to a BigQuery table. The caveat is that the table to write to should be decided based on the content of each line read from the GCS file.
My question is: is this possible? If so, can someone give me any hints on how to accomplish it?
Also, the GCS files to read from are dynamic. I'm using the Object Change Notification service, which calls my App Engine app's registered endpoint whenever a file is added to or removed from the bucket, along with the details of the added/removed file. It is this file whose contents have to be streamed to BigQuery.
Is it possible to integrate a Dataflow pipeline with App Engine?
Lastly, is this whole setup even the best way to do this?
Thanks...
On your first question: yes, this is possible; see Writing different values to different BigQuery tables in Apache Beam.
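For concreteness, here is a minimal batch sketch using the Beam Java SDK. It assumes comma-separated lines whose first field names the target table; the bucket path, project, dataset, table prefix, and field names are all placeholders, not anything fixed by your question:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class RouteLinesToTables {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(TextIO.read().from("gs://my-bucket/input.csv")) // placeholder path
        .apply(MapElements.into(TypeDescriptor.of(TableRow.class))
            .via((String line) -> {
              // Assumed line format: "<record_type>,<payload>"
              String[] f = line.split(",", 2);
              return new TableRow().set("type", f[0]).set("payload", f[1]);
            }))
        .setCoder(TableRowJsonCoder.of())
        .apply(BigQueryIO.writeTableRows()
            // Choose the destination table per element from the row's content.
            .to((ValueInSingleWindow<TableRow> row) -> new TableDestination(
                "my-project:my_dataset.events_" + row.getValue().get("type"), null))
            // Tables are assumed to exist already; if they may not, supply a
            // schema and use CREATE_IF_NEEDED instead.
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```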
On your second question: one way to accomplish that would be to have your App Engine app publish every change notification to Cloud Pub/Sub, and to have a constantly running streaming Dataflow pipeline watching the Pub/Sub topic and writing to BigQuery.
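On the App Engine side, a sketch of forwarding each notification with the google-cloud-pubsub Java client; the project and topic names are placeholders, and you would publish only for added objects, since a removed file can't be read:

```java
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

public class ChangeNotificationForwarder {
  // Reuse one Publisher across requests; project and topic are placeholders.
  private static final Publisher publisher = createPublisher();

  private static Publisher createPublisher() {
    try {
      return Publisher.newBuilder(TopicName.of("my-project", "gcs-changes")).build();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  // Call this from the handler that receives the Object Change Notification,
  // only when the notification reports an added object.
  public static void forward(String bucket, String objectName) {
    PubsubMessage msg = PubsubMessage.newBuilder()
        .setData(ByteString.copyFromUtf8("gs://" + bucket + "/" + objectName))
        .build();
    publisher.publish(msg); // asynchronous; returns an ApiFuture<String>
  }
}
```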
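And a sketch of the streaming pipeline that consumes those paths, reads each file as it arrives, and routes rows per element as in the first sketch; again, every name and the line format are assumptions:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class StreamGcsChangesToBigQuery {
  public static void main(String[] args) {
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).as(StreamingOptions.class);
    options.setStreaming(true);
    Pipeline p = Pipeline.create(options);

    // Each Pub/Sub message carries a "gs://bucket/object" path published
    // by the App Engine app above.
    p.apply(PubsubIO.readStrings().fromTopic("projects/my-project/topics/gcs-changes"))
        .apply(FileIO.matchAll())    // resolve each path to file metadata
        .apply(FileIO.readMatches()) // open the matched files
        .apply(TextIO.readFiles())   // emit one element per line
        .apply(MapElements.into(TypeDescriptor.of(TableRow.class))
            .via((String line) -> {
              String[] f = line.split(",", 2); // assumed line format
              return new TableRow().set("type", f[0]).set("payload", f[1]);
            }))
        .setCoder(TableRowJsonCoder.of())
        .apply(BigQueryIO.writeTableRows()
            .to((ValueInSingleWindow<TableRow> row) -> new TableDestination(
                "my-project:my_dataset.events_" + row.getValue().get("type"), null))
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```

Since the input is unbounded, BigQueryIO defaults to streaming inserts here, which matches your requirement that file contents be streamed to BigQuery.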
On your third question: yes, assuming your data representation on GCS is fixed, the rest seems like a reasonable ingestion architecture to me :)