I have been using the Apache Beam Python SDK with the Google Cloud Dataflow service for quite some time now.
I was setting up Dataflow for a new project.
The Dataflow pipeline (roughly as sketched below):
- Reads data from Google Datastore
- Processes it
- Writes the results to Google BigQuery
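For context, here is a minimal sketch of what the pipeline does. The step labels match the ones in the error below; the Datastore kind, bucket, region, field names, and schema are placeholders, not the real values:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore
from apache_beam.io.gcp.datastore.v1new.types import Query


def to_table_row(entity):
    # Placeholder mapping from a Datastore entity to a BigQuery row dict.
    return {
        'id': str(entity.key.path_elements[-1]),
        'value': entity.properties.get('value'),
    }


options = PipelineOptions(
    runner='DataflowRunner',
    project='devel-project-abc',
    region='us-central1',                  # placeholder region
    temp_location='gs://my-bucket/temp',   # placeholder bucket
)

with beam.Pipeline(options=options) as p:
    (p
     | 'read from datastore' >> ReadFromDatastore(
         Query(kind='KindABC', project='devel-project-abc'))  # placeholder kind
     | 'convert to table rows' >> beam.Map(to_table_row)
     | 'write to bq' >> beam.io.WriteToBigQuery(
         table='devel-project-abc:DatasetABC.TableABC',
         schema='id:STRING,value:STRING',  # placeholder schema
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```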
I have similar pipelines on other projects that are running perfectly fine.
Today, when I started a Dataflow job, the pipeline started, read data from Datastore, and processed it, but when it was about to write to BigQuery, it failed with:
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException:
Dataflow pipeline failed. State: FAILED, Error:
Workflow failed. Causes: S04:read from datastore/GroupByKey/Read+read from datastore/GroupByKey/GroupByWindow+read from datastore/Values+read from datastore/Flatten+read from datastore/Read+convert to table rows+write to bq/NativeWrite failed.,
BigQuery import job "dataflow_job_8287310405217525944" failed.,
BigQuery creation of import job for table "TableABC" in dataset "DatasetABC" in project "devel-project-abc" failed.,
BigQuery execution failed., Error:
Message: Access Denied: Dataset devel-project-abc:DatasetABC: The user service-account-number-compute@developer.gserviceaccount.com does not have bigquery.tables.create permission for dataset devel-project-abc:DatasetABC: HTTP Code: 403
I made sure all the required APIs are enabled. As far as I can tell, the service account has the necessary permissions.
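One way to double-check the dataset-level ACL is a sketch like the following (using the google-cloud-bigquery client; the project and dataset names are the ones from the error above), looking for the compute default service account with WRITER or OWNER access:

```python
from google.cloud import bigquery

# List the dataset-level access entries for DatasetABC and check whether
# the Compute Engine default service account appears with write access.
client = bigquery.Client(project="devel-project-abc")
dataset = client.get_dataset("devel-project-abc.DatasetABC")

for entry in dataset.access_entries:
    print(entry.role, entry.entity_type, entry.entity_id)
```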
My question is: where might this be going wrong?
Update
From what I remember of previous projects (three different projects, to be precise), I did not give the Dataflow service agent any specific permissions. The Compute Engine service account had roles like Dataflow Admin, Editor, and Dataflow Viewer. So before giving the service account any BigQuery-related permissions, I would like to know why this environment behaves differently from the previous projects.
Have there been any permission or policy changes in the last few months that would now require an explicit BigQuery writer permission?
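To compare, I plan to dump the project-level IAM bindings for the compute default service account in both an old project and this one, roughly like this (a sketch using the Cloud Resource Manager API; "old-working-project" is a placeholder for one of the previous project ids):

```python
from googleapiclient import discovery

# Compare which project-level roles the compute default service account
# holds in an old, working project versus the new one.
crm = discovery.build("cloudresourcemanager", "v1")
member = "serviceAccount:service-account-number-compute@developer.gserviceaccount.com"

for project_id in ("old-working-project", "devel-project-abc"):
    policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()
    roles = [b["role"] for b in policy.get("bindings", []) if member in b.get("members", [])]
    print(project_id, roles)
```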