I have a PCollection of objects that I get from Pub/Sub, let's say:
PCollection<Student> pStudent;
Each Student has an attribute, let's say studentID. I want to read another attribute (class_code) from BigQuery using this student ID, and set the class_code that I get from BQ on the Student object in the PCollection.
Does anyone know how to implement this?
I know that Beam has BigQueryIO, but how can I use it when the query criteria come from an element of the PCollection (the studentID of each Student)? And how can I set the value from the BigQuery result back on the elements of the PCollection?
I considered two options to do this. One would be to use BigQueryIO to read the whole table and then either materialize it as a side input or use CoGroupByKey to join all the data. Another possibility, the one I implemented here, is to use the BigQuery Java Client Library directly.
I created some dummy data with:
which looks like this:
and then, within the pipeline, I generate some input dummy data:
For each one of these "students" I fetch the corresponding grade from the BigQuery table, following the approach in this example. Depending on your data volume, take rate limits (quotas) and cost into account, as noted in the previous comment. Full example:
And the output is:
(Tested with BigQuery client library 1.22.0 and the 2.5.0 Java SDK for Dataflow.)
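Regarding the quota and cost concern with one query per element: a possible mitigation is to cache lookup results per worker so that repeated student IDs do not trigger a new query. The following is only a minimal sketch in plain Java, not the code from the answer. The class CachedClassCodeLookup, the Student fields and the remoteLookup function are all hypothetical names. remoteLookup stands in for whatever actually queries BigQuery (for example a call made through the client library inside a DoFn), and here it is backed by an in-memory map so the sketch runs on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class CachedClassCodeLookup {

    /** Hypothetical stand-in for the Student element type from the question. */
    public static final class Student {
        public final String studentId;
        public String classCode; // stays null when no matching row is found

        public Student(String studentId) {
            this.studentId = studentId;
        }
    }

    // Assumption: this function wraps the real BigQuery query for one student ID.
    private final Function<String, Optional<String>> remoteLookup;

    // Remembers both hits and misses so each ID is looked up remotely at most once.
    private final Map<String, Optional<String>> cache = new HashMap<>();

    private int remoteCalls = 0;

    public CachedClassCodeLookup(Function<String, Optional<String>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Sets classCode on the student, calling the remote lookup only on a cache miss. */
    public Student enrich(Student student) {
        Optional<String> code = cache.get(student.studentId);
        if (code == null) {
            remoteCalls++;
            code = remoteLookup.apply(student.studentId);
            cache.put(student.studentId, code);
        }
        student.classCode = code.orElse(null);
        return student;
    }

    public int remoteCalls() {
        return remoteCalls;
    }

    public static void main(String[] args) {
        // Fake "table" used instead of BigQuery for this sketch.
        Map<String, String> fakeTable = new HashMap<>();
        fakeTable.put("s1", "MATH-101");
        fakeTable.put("s2", "PHYS-202");

        CachedClassCodeLookup lookup =
            new CachedClassCodeLookup(id -> Optional.ofNullable(fakeTable.get(id)));

        String[] incomingIds = {"s1", "s2", "s1", "s3", "s3"};
        for (String id : incomingIds) {
            Student enriched = lookup.enrich(new Student(id));
            System.out.println(enriched.studentId + " -> " + enriched.classCode);
        }
        System.out.println("remote lookups: " + lookup.remoteCalls());
    }
}
```

In a real pipeline an object like this would probably be created once per DoFn instance (for example in a method annotated with @Setup) and enrich would be called from the @ProcessElement method. Note that the cache in this sketch is unbounded and never expires, so for a long-running streaming job you would likely want a size limit and a time-to-live, otherwise memory grows and changes in the BigQuery table are never picked up.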