“The Dataflow appears to be stuck”

2019-07-19 22:50发布

I am running a dataflow that I last ran a few months ago. From the same client, with same dataflow version (0.7.0dev0). Unfortunately it fails in mysterious ways that it did not do before.

I am starting the job, and the first stage is:

(8733429d016bc2fb): Executing operation read from datastore/Split Query+read from datastore/GroupByKey/Reify+read from datastore/GroupByKey/Write

But it gives the following error after 1 hour:

(e88cb3c076926976): Workflow failed. Causes: (e88cb3c07692626f): The Dataflow appears to be stuck. Please reach out to the Dataflow team at http://stackoverflow.com/questions/tagged/google-cloud-dataflow.

if would help, JobID is 2017-08-21_00_30_03-3588685705436948852. I would upgrade to a newer version of the library, but that involves a bunch more API changes and figuring out how to get all the pieces working again. So I'm working at it now. I was hoping that "a simple use case that previously worked and currently fails" might be easier to debug than changing even-more-things.

I'm not sure how to debug or investigate further. It worked a few months ago with the same code, but doesn't work now (with a 4-5x larger dataset, of 200-300K records, nothing crazy...)

1条回答
再贱就再见
2楼-- · 2019-07-19 23:29

This was fixed by upgrading to 2.0.0 (thanks Ben Chambers!) It seems that 0.7.0 no longer worked well with cloud dataflow.

查看更多
登录 后发表回答