I would like to write my Google Cloud Dataflow pipeline results to multiple sinks.
For example, I want to write the results to Google Cloud Storage using TextIO, and also write them as a table in BigQuery. How can I do that?
A Cloud Dataflow pipeline is structured as a DAG (directed acyclic graph), so you can apply multiple transforms to the same PCollection; write transforms are no exception. To fan results out to several sinks, simply apply multiple write transforms to the PCollection holding your results, for example:
// Build the main pipeline and keep a handle on the results PCollection.
PCollection<Foo> results = p
    .apply(TextIO.Read.named("ReadFromGCS").from("gs://..."))
    .apply(...the rest of your pipeline...);

// Fan out: each write transform below consumes the same PCollection independently.
results.apply(TextIO.Write.named("WriteToGCS").to("gs://..."));
results.apply(BigQueryIO.Write.named("WriteToBigQuery").to(...)...);
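To make the fan-out concrete, here is a fuller sketch in the same (pre-Beam) Dataflow SDK 1.x style. The bucket, project, dataset, and table names are placeholders, and the single-column schema and the conversion DoFns are hypothetical; adapt them to your element type and table layout:

```java
// Hypothetical end-to-end sketch (Dataflow SDK 1.x API); names are placeholders.
Pipeline p = Pipeline.create(options);

// Read lines from GCS and convert each one into a BigQuery TableRow.
PCollection<TableRow> results = p
    .apply(TextIO.Read.named("ReadFromGCS").from("gs://my-bucket/input/*"))
    .apply(ParDo.named("ToTableRow").of(new DoFn<String, TableRow>() {
      @Override
      public void processElement(ProcessContext c) {
        c.output(new TableRow().set("line", c.element()));
      }
    }));

// Sink 1: write a text rendering of each row to Google Cloud Storage.
results
    .apply(ParDo.named("ToString").of(new DoFn<TableRow, String>() {
      @Override
      public void processElement(ProcessContext c) {
        c.output(c.element().toString());
      }
    }))
    .apply(TextIO.Write.named("WriteToGCS").to("gs://my-bucket/output/results"));

// Sink 2: write the same rows to a BigQuery table (one STRING column here).
TableSchema schema = new TableSchema().setFields(Collections.singletonList(
    new TableFieldSchema().setName("line").setType("STRING")));
results.apply(BigQueryIO.Write.named("WriteToBigQuery")
    .to("my-project:my_dataset.my_table")
    .withSchema(schema)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));

p.run();
```

Because both writes branch off the same `results` collection, the upstream transforms run once and each sink receives every element; the two writes are independent steps in the DAG.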