I need to set up a data pipeline that loads data from source databases such as Oracle and MySQL into BigQuery.
How can I use google-cloud-dataflow to read data from a database (JDBC connection) and write it to BigQuery tables using Python?
I also have some Hive tables in an on-premise Hadoop cluster; how do I transfer this data to BigQuery?
I couldn't find the right documentation or examples to achieve this. Can you please point me in the right direction?
I applied a solution in my project that does exactly this; you need to follow these steps:

1. Export the data from Google Cloud SQL to Google Cloud Storage in CSV format by following this link.
2. Load the CSV data from Google Cloud Storage directly into BigQuery by following this link.
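The two steps above can also be scripted with the gcloud and bq command-line tools. Here is a minimal sketch that just builds the two commands; all resource names (my-instance, my-bucket, my_dataset.orders) are hypothetical placeholders for your own project:

```python
# A sketch of the two answer steps using the gcloud / bq CLIs.
# All names (my-instance, my-bucket, my_dataset.orders) are hypothetical.

def export_csv_command(instance, bucket, table):
    # Step 1: export a Cloud SQL table to Cloud Storage as CSV.
    return [
        "gcloud", "sql", "export", "csv", instance,
        f"gs://{bucket}/{table}.csv",
        "--query", f"SELECT * FROM {table}",
    ]

def bq_load_command(bucket, dataset_table, table):
    # Step 2: load that CSV from Cloud Storage into a BigQuery table,
    # letting BigQuery infer the schema.
    return [
        "bq", "load", "--source_format=CSV", "--autodetect",
        dataset_table, f"gs://{bucket}/{table}.csv",
    ]

print(" ".join(export_csv_command("my-instance", "my-bucket", "orders")))
print(" ".join(bq_load_command("my-bucket", "my_dataset.orders", "orders")))
```

In practice each list can be passed to `subprocess.run(cmd, check=True)`, and for recurring loads the same two commands can be scheduled from cron or an orchestrator.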