How do I use the Cloud Dataflow Regional Endpoint

Posted 2019-05-28 22:28

Question:

Is it possible to change the region of a Google Cloud Platform Dataflow job to Europe? I have set the zone of the pipeline to europe-west1-d, but I am unable to change the region of the job itself. I tried setting the region in the pipeline options, but that results in the error below; only the default region works.

pipeline_options.view_as(GoogleCloudOptions).region = 'europe-west1'

"error": {
  "code": 400,
  "message": "(ff50231266257fc7): The workflow could not be created, since it was sent to an invalid or unreleased region. Please resubmit with a valid region.",
  "status": "INVALID_ARGUMENT"
}

europe-west1 is listed in the output of gcloud compute regions list.

Answer 1:

Yes, Cloud Dataflow Regional Endpoints allow you to change the region of a Dataflow job to Europe.

Regional Endpoints are a brand new Cloud Dataflow feature. Prior to the release of Regional Endpoints, the experimental region option could be specified but was not used. This error message appeared because the region option was being specified before the feature was released.

Examples for your case (Europe):

  • You can submit a job with only the Regional Endpoint specified (e.g. region = europe-west1). That job will be managed and run in the europe-west1 region, and because no zone is given, Cloud Dataflow will automatically select a zone in that region for the Dataflow workers.

  • You can also submit a job with both a Regional Endpoint and a zone specified (e.g. region = europe-west1 and zone = europe-west1-d). That job will be managed in the europe-west1 region, with Dataflow workers running in the europe-west1-d zone.
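
As a rough illustration of the two cases above, here is a minimal Python sketch that builds the command-line flags for each. The helper name dataflow_args and the project ID are hypothetical. The check that a zone name starts with its region name is only a sanity check added for this example, not something the Dataflow service documents as its own validation. The flag names (--runner, --project, --region, --zone) are Beam Python pipeline options, but option names have changed between SDK releases, so check them against the version you use.

```python
# Hypothetical helper (not part of any SDK): builds Beam pipeline flags
# for a Dataflow job with a regional endpoint and an optional worker zone.
def dataflow_args(project, region, zone=None):
    """Return a list of pipeline flags for the given region and zone."""
    # Sanity check (an assumption of this sketch): a zone such as
    # "europe-west1-d" should belong to the region "europe-west1".
    if zone is not None and not zone.startswith(region + "-"):
        raise ValueError(f"zone {zone!r} is not in region {region!r}")
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
    ]
    # Omitting the zone leaves the choice of zone to Cloud Dataflow.
    if zone is not None:
        args.append(f"--zone={zone}")
    return args


if __name__ == "__main__":
    # Case 1: region only.
    print(dataflow_args("my-project", "europe-west1"))
    # Case 2: region and zone.
    print(dataflow_args("my-project", "europe-west1", "europe-west1-d"))
```

The resulting list can be passed to PipelineOptions from apache_beam.options.pipeline_options when constructing the pipeline.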



Answer 2:

You can do this with Dataflow SDK 2.1.0.

You can use:

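// Assumes each value is read from the same properties object as the machine type.
pipelineOptions.setWorkerMachineType(pipelineConfigProperties.get("worker.machine.type"));
pipelineOptions.setNetwork(pipelineConfigProperties.get("dataflow.network"));
pipelineOptions.setSubnetwork(pipelineConfigProperties.get("dataflow.subnetwork"));
pipelineOptions.setUsePublicIps(false);
// The zone should belong to the region set on the next line.
pipelineOptions.setZone(pipelineConfigProperties.get("dataflow.zone"));
pipelineOptions.setRegion(pipelineConfigProperties.get("dataflow.region"));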

I have tested this, and it definitely works in 2.1.0.