Dataflow streaming job not scaling past 1 worker

My streaming Dataflow job (2017-09-08_03_55_43-9675407418829265662), which uses the Apache Beam SDK for Java 2.1.0, will not scale past 1 worker even though its Pub/Sub backlog keeps growing (now 100k undelivered messages). Do you have any ideas why?

It's currently running with autoscalingAlgorithm=THROUGHPUT_BASED and maxNumWorkers=10.

Tags: java google-cloud-dataflow apache-beam

2 answers

chillily

Post #2 · 2019-07-22 15:07

Dataflow Engineer here. I looked up the job in our backend and I can see that it is not scaling up because CPU utilization is low, meaning something else is limiting the performance of the pipeline, such as external throttling. Upscaling rarely helps in these cases.
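To make the reasoning concrete, here is a toy model in plain Java. It is purely illustrative and does not reproduce Dataflow's actual autoscaling policy; the class name `ThrottledThroughputModel` and the numbers are hypothetical. It assumes each worker could process `perWorkerCapacity` messages per second if CPU-bound, while an external service caps the whole pipeline at `externalQuota` messages per second. Under that assumption, adding workers leaves throughput unchanged and only lowers CPU utilization per worker.

```java
// Toy model (hypothetical, not Dataflow's real autoscaler): each worker can
// handle up to perWorkerCapacity msgs/s when CPU-bound, but an external
// service throttles the whole pipeline to externalQuota msgs/s.
public class ThrottledThroughputModel {

    /** Messages per second the pipeline actually achieves. */
    static double throughput(int workers, double perWorkerCapacity, double externalQuota) {
        return Math.min(workers * perWorkerCapacity, externalQuota);
    }

    /** Fraction (0.0 to 1.0) of the workers' CPU capacity that is actually used. */
    static double cpuUtilization(int workers, double perWorkerCapacity, double externalQuota) {
        return throughput(workers, perWorkerCapacity, externalQuota) / (workers * perWorkerCapacity);
    }

    public static void main(String[] args) {
        double capacity = 1000.0; // hypothetical msgs/s one worker could process
        double quota = 200.0;     // hypothetical external rate limit, msgs/s
        for (int workers : new int[] {1, 2, 5, 10}) {
            System.out.printf("workers=%2d throughput=%6.1f cpu=%5.1f%%%n",
                    workers,
                    throughput(workers, capacity, quota),
                    100 * cpuUtilization(workers, capacity, quota));
        }
    }
}
```

With these numbers, one worker already hits the 200 msgs/s cap at 20% CPU, and ten workers still deliver 200 msgs/s at 2% CPU each. That is why a CPU-driven scaling decision sees no reason to add workers.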

I see that some bundles are taking hours to process. I recommend investigating your pipeline logic to see whether other parts can be optimized.
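One way to find out where those hours go is to time the suspect work per element and log outliers. The sketch below is plain Java and not part of Beam; `SlowCallLogger`, `timed`, and the threshold are hypothetical names and values. It assumes you would call `timed(...)` inside a DoFn's `@ProcessElement` method around an expensive step, such as an external lookup. In a real pipeline, a Beam `Metrics.counter(...)` would arguably be a better home for the count than the static `AtomicLong` used here.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper (not a Beam API): times one unit of per-element work
// and logs a warning when it takes longer than a threshold.
public final class SlowCallLogger {

    private static final Logger LOG = Logger.getLogger(SlowCallLogger.class.getName());

    // Process-wide count of calls that exceeded their threshold.
    static final AtomicLong SLOW_CALLS = new AtomicLong();

    private SlowCallLogger() {}

    /** Runs work, warns if it took longer than thresholdMillis, and returns its result. */
    static <T> T timed(String label, long thresholdMillis, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // Measured even if work throws, so failing slow calls are reported too.
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsedMillis > thresholdMillis) {
                SLOW_CALLS.incrementAndGet();
                LOG.warning(label + " took " + elapsedMillis + " ms (threshold " + thresholdMillis + " ms)");
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a slow external call: sleeps 200 ms against a 100 ms threshold.
        String result = timed("lookup", 100, () -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
            return "done";
        });
        System.out.println(result + ", slow calls: " + SLOW_CALLS.get());
    }
}
```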

你好瞎i

Post #3 · 2019-07-22 15:10

This is what I ended up with:

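```java
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

import java.util.concurrent.ThreadLocalRandom;

/**
 * Redistributes elements by pairing each one with a random integer key in
 * [0, size), reshuffling on that key, and then dropping the key again.
 */
public class ReshuffleWithRandomKey<T>
        extends PTransform<PCollection<T>, PCollection<T>> {

    // Number of distinct random keys to spread elements over.
    private final int size;

    public ReshuffleWithRandomKey(int size) {
        this.size = size;
    }

    @Override
    public PCollection<T> expand(PCollection<T> input) {
        return input
                .apply("Random key", ParDo.of(new AssignRandomKeyFn<T>(size)))
                // Set the KV coder explicitly; the generic T may defeat coder inference.
                .setCoder(KvCoder.of(VarIntCoder.of(), input.getCoder()))
                .apply("Reshuffle", Reshuffle.<Integer, T>of())
                .apply("Values", Values.<T>create());
    }

    // Pairs each element with a key drawn uniformly from [0, size).
    private static class AssignRandomKeyFn<T> extends DoFn<T, KV<Integer, T>> {

        private final int size;

        AssignRandomKeyFn(int size) {
            this.size = size;
        }

        @ProcessElement
        public void process(ProcessContext c) {
            c.output(KV.of(ThreadLocalRandom.current().nextInt(0, size), c.element()));
        }
    }
}
```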

What do you think @raghu-angadi and @scott-wegner?
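The effect of the `size` parameter can be checked without Beam. The plain-Java simulation below mimics the key assignment in `AssignRandomKeyFn` and counts how many elements land on each key. `RandomKeySpread` and its methods are hypothetical names. The point is that the reshuffle can never produce more than `size` distinct keys, so a very small `size` (for example 1) would presumably leave all downstream work on a single key no matter how many workers are available. Choosing `size` comfortably above `maxNumWorkers` seems a reasonable assumption, not a documented rule.

```java
import java.util.concurrent.ThreadLocalRandom;

// Plain-Java simulation (no Beam) of the random-key assignment used by
// AssignRandomKeyFn: every element gets a key drawn uniformly from [0, size).
public class RandomKeySpread {

    /** Returns how many of numElements elements land on each key in [0, size). */
    static long[] keyHistogram(int numElements, int size) {
        long[] counts = new long[size];
        for (int i = 0; i < numElements; i++) {
            counts[ThreadLocalRandom.current().nextInt(0, size)]++;
        }
        return counts;
    }

    /** Number of keys that received at least one element. */
    static int distinctKeys(long[] counts) {
        int used = 0;
        for (long c : counts) {
            if (c > 0) {
                used++;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 10, 1000}) {
            long[] counts = keyHistogram(100_000, size);
            System.out.printf("size=%4d distinct keys used=%4d%n", size, distinctKeys(counts));
        }
    }
}
```

Later Beam Java SDK releases include a built-in `Reshuffle.viaRandomKey()` that applies the same random-key idea, so it is worth checking whether your SDK version provides it before maintaining a custom transform.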
