Dataflow can't read from BigQuery dataset in asia-northeast1 region

Posted 2019-07-27 11:43

I have a BigQuery dataset located in the new "asia-northeast1" region. I'm trying to run a Dataflow templated pipeline (running in the Australia region) to read a table from it. It throws the following error, even though the dataset/table does indeed exist:

Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo"
}

Am I doing something wrong here?

import static org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.THROUGHPUT_BASED;
import static org.apache.beam.sdk.io.FileBasedSink.CompressionType.GZIP;

import java.util.stream.Collectors;

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

import com.google.api.services.bigquery.model.TableRow;

/**
 * BigQuery -> ParDo -> GCS (one file)
 */
public class BigQueryTableToOneFile {
    public static void main(String[] args) throws Exception {
        PipelineOptionsFactory.register(TemplateOptions.class);
        TemplateOptions options = PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(TemplateOptions.class);
        options.setAutoscalingAlgorithm(THROUGHPUT_BASED);
        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply(BigQueryIO.read().from(options.getBigQueryTableName()).withoutValidation())
                .apply(ParDo.of(new DoFn<TableRow, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) throws Exception {
                        // Quote each cell and join with commas to form one CSV line
                        String commaSep = c.element().values()
                                .stream()
                                .map(cell -> cell.toString().trim())
                                .collect(Collectors.joining("\",\"", "\"", "\""));
                        c.output(commaSep);
                    }
                }))
                .apply(TextIO.write().to(options.getOutputFile())
                        .withoutSharding()
                        .withWritableByteChannelFactory(GZIP)
                );
        pipeline.run();
    }

    public interface TemplateOptions extends DataflowPipelineOptions {
        @Description("The BigQuery table to read from in the format project:dataset.table")
        @Default.String("bigquery-samples:wikipedia_benchmark.Wiki1k")
        ValueProvider<String> getBigQueryTableName();

        void setBigQueryTableName(ValueProvider<String> value);

        @Description("The name of the output file to produce in the format gs://bucket_name/filename.csv")
        @Default.String("gs://bigquery-table-to-one-file/output/bar.csv.gz")
        ValueProvider<String> getOutputFile();

        void setOutputFile(ValueProvider<String> value);
    }
}

Args:

--project=grey-sort-challenge
--runner=DataflowRunner
--jobName=bigquery-table-to-one-file
--maxNumWorkers=1
--zone=australia-southeast1-a
--stagingLocation=gs://bigquery-table-to-one-file/jars
--tempLocation=gs://bigquery-table-to-one-file/tmp
--templateLocation=gs://bigquery-table-to-one-file/template

Job id: 2018-05-05_05_37_08-8260293482986343692
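
For context, a template staged at the `--templateLocation` above would typically be launched with something like the following (the table and output parameter values here are placeholders, not from the original post):

```shell
# Hypothetical launch command for the staged template; bigQueryTableName and
# outputFile values are illustrative placeholders
gcloud dataflow jobs run bigquery-table-to-one-file \
  --gcs-location=gs://bigquery-table-to-one-file/template \
  --zone=australia-southeast1-a \
  --parameters=bigQueryTableName=grey-sort-challenge:Konnichiwa_Tokyo.some_table,outputFile=gs://bigquery-table-to-one-file/output/bar.csv.gz
```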

[screenshots of the BigQuery dataset and the failed Dataflow job]

1 Answer

唯我独甜
Answered 2019-07-27 12:05

Sorry about that issue. It will be addressed in the upcoming Beam SDK 2.5.0 (you can try using the current HEAD snapshots from the Beam repo).
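
If you want to try the snapshot builds the answer mentions before 2.5.0 is released, one way (assuming a Maven build) is to point the Beam version at the Apache snapshots repository, roughly:

```xml
<!-- Hypothetical pom.xml fragment: resolves Beam 2.5.0-SNAPSHOT from the
     Apache snapshots repository instead of a released version -->
<repositories>
  <repository>
    <id>apache.snapshots</id>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

<properties>
  <beam.version>2.5.0-SNAPSHOT</beam.version>
</properties>
```

Alternatively, you can build the SDK from the head of the Beam repository and install it to your local Maven repository.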
