阿帕奇梁 - 跳步管道(Apache Beam - skip pipeline step)

2019-11-05 08:53发布

我使用的Apache梁建立一个管道组成的2个主要步骤：

使用光束变换变换数据
加载变换数据至BigQuery

该管道设置是这样的：

myPCollection = (org.apache.beam.sdk.values.PCollection<myCollectionObjectType>)myInputPCollection
                .apply("do a parallel transform"),
                     ParDo.of(new MyTransformClassName.MyTransformFn()));

 myPCollection
    .apply("Load BigQuery data for PCollection",
            BigQueryIO.<myCollectionObjectType>write()
            .to(new MyDataLoadClass.MyFactTableDestination(myDestination))
            .withFormatFunction(new MyDataLoadClass.MySerializationFn())

我看这个问题：

阿帕奇梁：跳过一个已经构建流水线步骤

这表明可能是我能够以某种方式动态地改变其输出I可以将数据传递给中，在步骤1中的平行变换。

我该怎么做呢？我不知道该怎么选择是否要通过myPCollection从步骤1到步骤2。我需要跳过步骤2如果对象myPCollection从第1步是null 。

Answer 1:

你只是不发出从你的元素MyTransformClassName.MyTransformFn当你不希望它在下一步，例如是这样的：

class MyTransformClassName.MyTransformFn extends...
  @ProcessElement
  public void processElement(ProcessContext c, ...) {
    ...
    result = ...
    if (result != null) {
       c.output(result);   //only output something that's not null
    }
  }

这样，空值没有达到下一个步骤。

见ParDo引导部分获取更多详细信息： https://beam.apache.org/documentation/programming-guide/#pardo

文章来源: Apache Beam - skip pipeline step