Google Dataflow write to BigQuery table performance

Posted 2019-03-01 06:13

I compared the performance of processing data and writing the output to BigQuery tables versus files, and the difference is significant:

Input: 1.5M records from about 600 files.
Transform: construct/convert a few fields in each record, build a key, and emit key/value pairs; eventually the records for each key go to one target, either a file or a table.

It took 7 minutes to write to 13 files, but over 60 minutes to write to 13 BigQuery tables.

I'm trying to understand whether this is the expected outcome or whether I did something wrong. What factors should be considered when writing to BigQuery tables?

Please help; this could be a show stopper for what I'm trying to do.
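
For context, here is a minimal sketch of the pipeline shape described above, assuming the Beam Python SDK (the actual job may well be in Java). All names here are hypothetical placeholders, not the real pipeline:

  # Minimal sketch, assuming a Beam Python SDK version that supports
  # dynamic table destinations. "my-project", "my-bucket", "my_dataset",
  # the 'category' field, and the schema are all hypothetical.
  import json

  import apache_beam as beam
  from apache_beam.options.pipeline_options import PipelineOptions


  def parse_record(line):
      # Hypothetical convert step: parse a line and keep a couple of fields,
      # including the field used to route the record to its target.
      record = json.loads(line)
      return {'category': record['category'], 'value': record.get('value')}


  def table_for(row):
      # Route each row to its own table, e.g. my_dataset.out_<category>.
      return 'my-project:my_dataset.out_%s' % row['category']


  options = PipelineOptions()  # e.g. --runner=DataflowRunner --project=my-project

  with beam.Pipeline(options=options) as p:
      rows = (
          p
          | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input/*.json')
          | 'Convert' >> beam.Map(parse_record)
      )

      # Batch writes to BigQuery stage files on GCS and run load jobs
      # (FILE_LOADS), which is part of why this path can be much slower
      # than simply writing the same records to files.
      _ = rows | 'WriteBQ' >> beam.io.WriteToBigQuery(
          table=table_for,
          schema='category:STRING,value:FLOAT',
          create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
          write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
          method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
          custom_gcs_temp_location='gs://my-bucket/tmp',
      )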

1 Answer

倾城 Initia · 2019-03-01 06:31

For batch jobs, Dataflow imports data into BigQuery by writing it to GCS and then running BigQuery load jobs to import that data. If you want to know how long those BigQuery jobs take, I think you can look at the BigQuery jobs run in your project.

You can try the following commands to get information about your BigQuery import jobs.

  bq ls -j <PROJECT ID>:

The above command should show you a list of jobs along with details such as their duration. (Note the colon at the end of the project ID; I believe it is required.)

You can then try

  bq show -j <JOB ID>

to get additional information about a particular job.

Note that you must be an owner of the project in order to see jobs run by other users. This applies to the BigQuery jobs run by Dataflow, because Dataflow runs them using a service account.
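
If you would rather inspect those jobs programmatically than through the bq CLI, here is a small sketch using the google-cloud-bigquery Python client; the project ID is a placeholder, and as noted above, seeing jobs started by other identities (such as the Dataflow service account) requires sufficient permissions on the project:

  # Sketch: list recent BigQuery load jobs and their durations with the
  # google-cloud-bigquery client library. "my-project" is a placeholder.
  from google.cloud import bigquery

  client = bigquery.Client(project='my-project')

  # all_users=True includes jobs started by other identities, such as the
  # Dataflow service account, provided your permissions allow it.
  for job in client.list_jobs(max_results=50, all_users=True):
      if job.job_type != 'load':
          continue
      duration = (job.ended - job.created).total_seconds() if job.ended else None
      print(job.job_id, job.state, duration, getattr(job, 'destination', None))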
