Can a Cloud Dataflow streaming job scale to zero?

Posted 2019-04-12 02:06

Question:

I'm using Cloud Dataflow streaming pipelines to insert events received from Pub/Sub into a BigQuery dataset. I need several such pipelines so that each job stays simple and easy to maintain.

My concern is the overall cost. The volume of data is not very high, and during a few periods of the day there is no data at all (no messages on Pub/Sub).

I would like Dataflow to scale to 0 workers until a new message is received, but it seems that the minimum number of workers is 1.

So the minimum price for each job would be 24 vCPU hours per day, which comes to at least $50 per month per job (without the discount for monthly usage).

I plan to run and drain my jobs via the API a few times per day to avoid paying for one full-time worker. But this does not seem to be the right approach for a managed service like Dataflow.
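To make the arithmetic behind that estimate explicit, here is a minimal sketch. It assumes a single 1-vCPU worker and a 30-day month, and the rate of $0.07 per vCPU hour is a hypothetical value picked only because it lands near the $50 figure; it is not taken from a price list, and it ignores memory and disk charges.

```python
# Hypothetical rate, chosen only so the result lands near the $50/month
# estimate; it is NOT an official Dataflow price.
ASSUMED_USD_PER_VCPU_HOUR = 0.07


def monthly_worker_cost(vcpus=1, hours_per_day=24.0, days=30,
                        usd_per_vcpu_hour=ASSUMED_USD_PER_VCPU_HOUR):
    """Rough vCPU-only cost of keeping workers up for part of each day."""
    return vcpus * hours_per_day * days * usd_per_vcpu_hour


# One worker running around the clock: 24 * 30 = 720 vCPU hours per month.
always_on = monthly_worker_cost()

# Same worker running only 6 hours per day (e.g. started and drained on a schedule).
part_time = monthly_worker_cost(hours_per_day=6)

print(f"always on: ${always_on:.2f}/month, 6 h/day: ${part_time:.2f}/month")
```

Under these assumptions the cost scales linearly with the hours the worker is up, which is why starting and draining the job on a schedule looks attractive for low-volume topics.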

Is there something I missed?

Answer 1:

Dataflow can't scale to 0 workers. As an alternative, you could use cron or Cloud Functions to create a Dataflow streaming job whenever an event triggers it. To have the Dataflow job stop by itself, you can read the answers to this question.

You can find an example here for both cases (cron and Cloud Functions). Note that Cloud Functions is no longer in Alpha; it has been in General Availability since July.
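For the stopping side, a scheduled script could ask Dataflow to drain the job through the REST API. The following is a minimal sketch using only the Python standard library, not the method from the linked answers. It assumes the `projects.locations.jobs.update` endpoint of the `v1b3` API and that you already have an OAuth access token; the function names and the project, region and job values are hypothetical placeholders.

```python
import json
import urllib.request

DATAFLOW_API = "https://dataflow.googleapis.com/v1b3"


def build_drain_request(project_id, region, job_id, access_token):
    """Build (but do not send) a request asking Dataflow to drain a job."""
    url = f"{DATAFLOW_API}/projects/{project_id}/locations/{region}/jobs/{job_id}"
    # Setting requestedState to JOB_STATE_DRAINED asks the service to stop
    # reading new input and finish processing what is already in flight.
    body = json.dumps({"requestedState": "JOB_STATE_DRAINED"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def drain_job(project_id, region, job_id, access_token):
    """Send the drain request and return the decoded JSON response."""
    request = build_drain_request(project_id, region, job_id, access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A cron entry or a scheduled Cloud Function could call `drain_job("my-project", "us-central1", "<job id>", token)` once the topic has gone quiet; obtaining the token (for example from a service account) is left out of this sketch.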



Answer 2:

A streaming Dataflow job must always have at least one worker. If the volume of data is very low, batch jobs may fit the use case better. Using a scheduler or cron, you can periodically start a batch job to drain the topic, which will save on cost.