Google Dataflow hangs with no logs

Published 2019-06-24 01:13

Question:

When I run the example WordCount job from the Dataflow docs with DataflowPipelineRunner, it launches workers and then just hangs in the Running state.

Last two status messages:

Jan 29, 2016, 22:05:50
S02: (b959a12901787f4d): Executing operation ReadLines+WordCount.CountWords/ParDo(ExtractWords)+WordCount.CountWords/Count.PerElement/Init+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey+WordCount.CountWords/Count.PerElement/Count.PerKey/Combine.GroupedValues/Partial+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey/Reify+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey/Write

Jan 29, 2016, 22:06:42
(c3fc1276c0229a41): Workers have started successfully.

and that's it. When I click "Worker logs", it's completely empty. It stays like this for at least 20 minutes.

It works fine with DirectPipelineRunner (it completes within seconds and creates the output file at my gs://... location).

What should I look at?

Command-line parameters:

--project=my-project
--stagingLocation=gs://my-project/dataflow/staging
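
(For reference, the docs launch this example roughly as follows; this is a sketch based on the old DataflowJavaSDK examples, so the exact main class and flags may differ:)

mvn compile exec:java \
    -Dexec.mainClass=com.google.cloud.dataflow.examples.WordCount \
    -Dexec.args="--project=my-project --stagingLocation=gs://my-project/dataflow/staging --runner=DataflowPipelineRunner"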

Answer 1:

A common cause of no logs showing up is that the Cloud Logging API hasn't been enabled. If not all of the APIs listed in the getting started guide are enabled, that could lead to both problems you describe (no logging and hanging workers).

Try walking through the getting started guide again and enabling all of the relevant APIs.
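
If you prefer the command line, the current gcloud CLI can enable them directly (the API names below are the usual Dataflow dependencies; check the guide for the full list):

gcloud services enable dataflow.googleapis.com compute.googleapis.com logging.googleapis.com storage-component.googleapis.com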



Answer 2:

If all the APIs are enabled, next check your user authentication:

gcloud auth login

and

gcloud auth application-default login

Also, make sure you run those commands as a user with project owner or editor access.
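
You can verify which roles your account holds with the standard gcloud command (substitute your project ID):

gcloud projects get-iam-policy my-project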

Else, you can use a service account with your job, as below:

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '<creds.json>'
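
A self-contained sketch of how to sanity-check that the key is picked up (this assumes the google-auth package is installed, and the key path is a placeholder you must replace):

import os

# Point at a downloaded service-account key. The variable must be set
# before any credentials are loaded (i.e., before google.auth.default()
# or any Google Cloud client call).
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '<creds.json>'

import google.auth

# google.auth.default() reads GOOGLE_APPLICATION_CREDENTIALS and returns
# the service account's credentials plus its default project.
credentials, project = google.auth.default()
print('Authenticated against project:', project)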