I was benchmarking Dataflow batch loads and found them far slower than the same loads run with the BigQuery command-line tool.
The input file was around 20 MB and contained millions of records. I tried several machine types and got the best performance on n1-highmem-4,
which took approximately 8 minutes to load the target BigQuery table.
When I ran the same table load with the bq command-line utility, it took barely 2 minutes to process and load the same volume of data. Are there any insights into why the Dataflow job performs so poorly here? How can I improve its performance so that it is comparable to the bq command-line utility?