Custom Machine Types in Python Dataflow SDK

Published 2019-07-27 01:57

Question:

According to Is it possible to use a Custom machine for Dataflow instances?, you can set a custom machine type for a Dataflow job by specifying the name as custom-<number of CPUs>-<memory in MB>.

But that answer is for the Java API and the old Dataflow SDK, not the new Apache Beam implementation and Python.

If I supply --worker_machine_type custom-8-5376 in the 2.0.0 google-cloud-dataflow Python API, I get the following error:

"(4092fe7df5a10577): The workflow could not be created. Please try again in a few minutes. If you are still unable to create a job please contact customer support. Causes: (4092fe7df5a10596): Unable to get machine type information for machine type custom-8-5376 in zone us-central1-f. Please check that machine type and zone are correct."

I also tried defining a new instance template in the compute engine and supplying the name of that template in the --worker_machine_type parameter, but that doesn't work, either.

How can you run a workflow on Dataflow 2.0.0 with a custom machine type?
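The naming convention from the question (custom-<number of CPUs>-<memory in MB>) can be sketched as a tiny helper; the function name is mine, purely illustrative:

```python
def build_machine_type(vcpus: int, memory_mb: int) -> str:
    """Format a GCE custom machine type name: custom-<vCPUs>-<memory in MB>."""
    return f"custom-{vcpus}-{memory_mb}"

# The machine type from the question, as it would be passed to
# --worker_machine_type:
print(build_machine_type(8, 5376))  # custom-8-5376
```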

Answer 1:

Per the custom machine type documentation: https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type

The memory per vCPU of a custom machine type must be between 0.9 GB and 6.5 GB, inclusive.

So for 8 vCPUs, 7424 MiB is the minimum. Your custom-8-5376 works out to 5376 / 8 = 672 MB per vCPU, which is below the 0.9 GB floor, hence the rejection.

Could you please try again with a valid size, for example custom-8-7424?
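The sizing rule above can be checked with a short calculation. This sketch also assumes the additional GCE constraint that total memory must be a multiple of 256 MB; the helper name is mine:

```python
import math

GB = 1024                   # MB per GB, as used in GCE custom machine sizing
MIN_MB_PER_VCPU = 0.9 * GB  # 921.6 MB: the per-vCPU floor from the docs
MEM_STEP_MB = 256           # total memory must be a multiple of 256 MB

def min_custom_memory_mb(vcpus: int) -> int:
    """Smallest valid total memory (MB) for a custom machine type."""
    floor_mb = vcpus * MIN_MB_PER_VCPU          # 8 vCPUs -> 7372.8 MB
    return math.ceil(floor_mb / MEM_STEP_MB) * MEM_STEP_MB

print(min_custom_memory_mb(8))  # 7424 -> custom-8-7424 is the smallest valid
print(5376 / 8)                 # 672.0 MB per vCPU, below the 921.6 MB floor
```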