Has anyone run into the same problem as me, where Google Cloud Dataflow BigQueryIO.Write fails with an unknown error (HTTP code 500)?
I use Dataflow to process data for April, May, and June. The same code processed the April data (400 MB) and wrote it to BigQuery successfully, but when I process the May (60 MB) or June (90 MB) data, the job fails.
- The data format is the same for April, May, and June.
- If I change the writer from BigQuery to TextIO, the job succeeds, so I think the data format is fine.
- The log dashboard shows no error logs at all.
- The system reports only the same unknown error.
The code I wrote is here: http://pastie.org/10907947
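For reference, the pipeline shape looks roughly like the sketch below. This is only a reconstruction from the step names in the error trace; the input path, schema, and the parsing logic inside the anonymous ParDo are placeholders, and the real code is in the pastie.

    import com.google.api.services.bigquery.model.TableFieldSchema;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;
    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.BigQueryIO;
    import com.google.cloud.dataflow.sdk.io.TextIO;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.transforms.DoFn;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;
    import com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows;
    import com.google.cloud.dataflow.sdk.transforms.windowing.Window;
    import org.joda.time.Duration;

    import java.util.Arrays;

    public class EventImport {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Placeholder schema; the real table has more fields.
        TableSchema schema = new TableSchema().setFields(Arrays.asList(
            new TableFieldSchema().setName("line").setType("STRING")));

        p.apply(TextIO.Read.named("Read Files").from("gs://my-bucket/events/2016-06/*"))  // hypothetical path
         .apply(Window.<String>into(FixedWindows.of(Duration.standardHours(1))))
         .apply(ParDo.of(new DoFn<String, TableRow>() {
           @Override
           public void processElement(ProcessContext c) {
             // Parse one input line into a TableRow (the real parsing is in the pastie).
             c.output(new TableRow().set("line", c.element()));
           }
         }))
         .apply(BigQueryIO.Write
             .to("lib-ro-123:TESTSET.hi_event_m6")
             .withSchema(schema));

        p.run();
      }
    }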
Error Message after "Executing BigQuery import job":
Workflow failed. Causes:
(cc846): S01:Read Files/Read+Window.Into()+AnonymousParDo+BigQueryIO.Write/DataflowPipelineRunner.BatchBigQueryIOWrite/DataflowPipelineRunner.BatchBigQueryIONativeWrite failed.,
(e19a27451b49ae8d): BigQuery import job "dataflow_job_631261" failed., (e19a745a666): BigQuery creation of import job for table "hi_event_m6" in dataset "TESTSET" in project "lib-ro-123" failed.,
(e19a2749ae3f): BigQuery execution failed.,
(e19a2745a618): Error: Message: An internal error occurred and the request could not be completed. HTTP Code: 500
Dataflow SDK for Java 1.x: as a workaround, you can enable this experiment:
--experiments=enable_custom_bigquery_sink
In Dataflow SDK for Java 2.x, this behavior is the default and no experiment is necessary.
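In case it's useful, here is a minimal sketch of both ways to enable the experiment in SDK 1.x: passing the flag on the command line, or setting it programmatically. I'm assuming here that the experiments list is exposed via DataflowPipelineDebugOptions; if not, the command-line flag alone is enough.

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineDebugOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

    import java.util.Arrays;

    public class EnableCustomSink {
      public static void main(String[] args) {
        // Passing --experiments=enable_custom_bigquery_sink on the command line is picked up here.
        DataflowPipelineDebugOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineDebugOptions.class);

        // Alternatively, set it programmatically before creating the pipeline.
        options.setExperiments(Arrays.asList("enable_custom_bigquery_sink"));

        Pipeline p = Pipeline.create(options);
        // ... build the pipeline as usual, including the BigQueryIO.Write step ...
        p.run();
      }
    }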
Note that in both versions, temporary files in GCS may be left over if your job fails.
Hope that helps!
Sorry for the frustration. It looks like you are hitting a limit on the number of files being written to BigQuery. This is a known issue that we're in the process of fixing.
In the meantime, you can work around this issue by either decreasing the number of input files or resharding the data (do a GroupByKey and then ungroup the data -- semantically it's a no-op, but it forces the data to be materialized so that the parallelism of the write operation isn't constrained by the parallelism of the read).
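Here is a rough sketch of that reshard as a reusable transform. It assumes the elements are TableRows and uses a hypothetical shard count of 100; adjust both to your pipeline.

    import com.google.api.services.bigquery.model.TableRow;
    import com.google.cloud.dataflow.sdk.transforms.DoFn;
    import com.google.cloud.dataflow.sdk.transforms.GroupByKey;
    import com.google.cloud.dataflow.sdk.transforms.PTransform;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;
    import com.google.cloud.dataflow.sdk.values.KV;
    import com.google.cloud.dataflow.sdk.values.PCollection;

    import java.util.concurrent.ThreadLocalRandom;

    /** Semantically a no-op: assigns random keys, groups, then ungroups, forcing materialization. */
    public class Reshard extends PTransform<PCollection<TableRow>, PCollection<TableRow>> {
      private static final int NUM_SHARDS = 100;  // hypothetical; tune to the write parallelism you want

      @Override
      public PCollection<TableRow> apply(PCollection<TableRow> input) {
        return input
            .apply(ParDo.named("AssignRandomKeys").of(new DoFn<TableRow, KV<Integer, TableRow>>() {
              @Override
              public void processElement(ProcessContext c) {
                // Attach a random shard key to each element.
                c.output(KV.of(ThreadLocalRandom.current().nextInt(NUM_SHARDS), c.element()));
              }
            }))
            .apply(GroupByKey.<Integer, TableRow>create())
            .apply(ParDo.named("Ungroup").of(new DoFn<KV<Integer, Iterable<TableRow>>, TableRow>() {
              @Override
              public void processElement(ProcessContext c) {
                // Emit the grouped values again, dropping the keys.
                for (TableRow row : c.element().getValue()) {
                  c.output(row);
                }
              }
            }));
      }
    }

You would apply it right before the write, e.g. rows.apply(new Reshard()).apply(BigQueryIO.Write.to(...)).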