When I submit a Dataflow job from Airflow, Airflow is unable to fetch the success status of the Dataflow job and keeps displaying the error below.
{gcp_dataflow_hook.py:77} INFO - Google Cloud DataFlow job not available yet..
Airflow DAG:
t2 = DataFlowPythonOperator(
    task_id='google_dataflow',
    py_file='/Users/abc/sample.py',
    gcp_conn_id='connection_id',
    dataflow_default_options={
        "project": 'Project_id',
        "runner": "DataflowRunner",
        "staging_location": 'gs://Project_id/staging',
        "temp_location": 'gs://Project_id/staging'
    }
)
sample.py:
import logging

import apache_beam as beam

# PROJECT and BUCKET are assumed to be defined elsewhere in the script (not shown).


def run():
    argv = [
        '--project={0}'.format(PROJECT),
        '--staging_location=gs://{0}/staging/'.format(BUCKET),
        '--temp_location=gs://{0}/staging/'.format(BUCKET),
        '--runner=DataflowRunner'
    ]
    with beam.Pipeline(argv=argv) as p:
        (p | 'read_bq_table' >> beam.io.Read(beam.io.BigQuerySource(
            query='Select * from `ds.table` limit 10',
            use_standard_sql=True)))


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()
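One thing worth checking in the script above: the hard-coded argv list is the only thing handed to beam.Pipeline, so any flags Airflow appends on the command line (such as --job_name) may never reach Beam. A minimal sketch, assuming that is what is happening, merges the script's defaults with whatever arrives on the command line. build_argv is a hypothetical helper for illustration, not part of Airflow or Beam.

```python
def build_argv(defaults, cli_args):
    """Return the default flags plus the command-line flags.

    Hypothetical helper: when the same flag appears in both lists,
    the command-line value replaces the default.
    """
    def flag_name(arg):
        # '--job_name=foo' -> '--job_name'
        return arg.split('=', 1)[0]

    # Names of every flag supplied on the command line.
    overridden = {flag_name(a) for a in cli_args if a.startswith('--')}
    # Keep only the defaults that the command line did not override.
    kept = [a for a in defaults if flag_name(a) not in overridden]
    return kept + list(cli_args)
```

In sample.py this would amount to something like beam.Pipeline(argv=build_argv(argv, sys.argv[1:])), so that the job name chosen by Airflow is passed through to Dataflow instead of being dropped.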
I have read other answers on the forum and, as suggested, removed the job names from both sample.py and the Airflow DAG, but Airflow is still unable to fetch the success return code.
From the Airflow log when the job gets submitted to Dataflow:
{gcp_dataflow_hook.py:116} INFO - Running command: python /Users/abc/sample.py --runner=DataflowRunner --project=project_id --region=region_name --labels=airflow-version=v1-10-0 --job_name=google_dataflow-f8a478ae
After the Dataflow job completes:
{gcp_dataflow_hook.py:128} WARNING - INFO:root:Job 2018-10-26_06_07_04-17336980599969256162 is in state JOB_STATE_DONE
{gcp_api_base_hook.py:90} INFO - Getting connection using a JSON key file.
{discovery.py:866} INFO - URL being requested: GET https://dataflow.googleapis.com/v1b3/projects/project_id/locations/us-central1/jobs?alt=json
{gcp_dataflow_hook.py:77} INFO - Google Cloud DataFlow job not available yet..
I'm not sure how to sort this out. Can someone help?
Dataflow job summary from the console:
Job name beamapp-user-1026130638-681570
Job ID 2018-10-26_06_07_04-17336980599969256162
Region us-central1
Job status Succeeded
SDK version Apache Beam SDK for Python 2.7.0
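
Comparing the log with the console summary, the job name Airflow passed on the command line (google_dataflow-f8a478ae) differs from the name shown in the console (beamapp-user-1026130638-681570). If the hook looks the job up by name, which is an assumption here and not something the logs confirm, that mismatch could explain why the job is reported as "not available yet" even though it succeeded. A small sketch of the comparison, using values copied from this post (the command string is abbreviated, and extract_job_name is a hypothetical helper):

```python
import re


def extract_job_name(command):
    """Return the value of --job_name in a command string, or None."""
    match = re.search(r'--job_name=(\S+)', command)
    return match.group(1) if match else None


# Abbreviated copy of the command from the Airflow log above.
airflow_command = (
    'python /Users/abc/sample.py --runner=DataflowRunner '
    '--job_name=google_dataflow-f8a478ae'
)
# Job name as shown in the Dataflow console summary above.
console_job_name = 'beamapp-user-1026130638-681570'

requested = extract_job_name(airflow_command)
names_match = requested == console_job_name
print(requested, console_job_name, names_match)
```

If the two names differ, as they do here, the script is probably not honoring the --job_name flag it was started with.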