I am running a Dataflow job that I last ran a few months ago, from the same client and with the same Dataflow SDK version (0.7.0dev0). Unfortunately, it now fails in mysterious ways that it did not before.
When I start the job, the first stage is:
(8733429d016bc2fb): Executing operation read from datastore/Split Query+read from datastore/GroupByKey/Reify+read from datastore/GroupByKey/Write
But after 1 hour it fails with the following error:
(e88cb3c076926976): Workflow failed. Causes: (e88cb3c07692626f): The Dataflow appears to be stuck. Please reach out to the Dataflow team at http://stackoverflow.com/questions/tagged/google-cloud-dataflow.
If it would help, the job ID is 2017-08-21_00_30_03-3588685705436948852. I would upgrade to a newer version of the library, but that involves a bunch more API changes and figuring out how to get all the pieces working again, so I'm working on that now. I was hoping that "a simple use case that previously worked and currently fails" might be easier to debug than changing even more things.
I'm not sure how to debug or investigate further. It worked a few months ago with the same code, but it doesn't work now (with a 4-5x larger dataset of 200-300K records, nothing crazy).
This was fixed by upgrading to 2.0.0 (thanks, Ben Chambers!). It seems that 0.7.0 no longer works well with Cloud Dataflow.
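For anyone hitting the same thing, here is a minimal sketch of a guard that refuses to launch a job from an SDK older than 2.0.0. The helper names (`parse_version`, `sdk_is_recent_enough`) and the `MIN_SDK_VERSION` constant are hypothetical and not part of any Dataflow or Beam API; the threshold is just the version that happened to work in my case.

```python
import re

# Assumption: 2.0.0 is simply the release that resolved the stuck job in this
# post, not an officially documented minimum.
MIN_SDK_VERSION = (2, 0, 0)


def parse_version(version):
    """Return the leading MAJOR.MINOR.PATCH of a version string as a tuple.

    Any suffix such as "dev0" is ignored, so a pre-release of 2.0.0 compares
    equal to 2.0.0 itself.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def sdk_is_recent_enough(version, minimum=MIN_SDK_VERSION):
    """Return True if the given SDK version is at least the minimum."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    for candidate in ("0.7.0dev0", "2.0.0"):
        print(candidate, sdk_is_recent_enough(candidate))
```

In a real launcher script, the version string would come from the installed SDK (I believe it is exposed as `apache_beam.__version__`, but treat that as an assumption and check your install).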