Does SortValues transform Java SDK extension in Be

2019-01-29 06:19发布

问题:

I have tried the example code of SortValues transform using DirectRunner on local machine (Windows)

PCollection<KV<String, KV<String, Integer>>> input = ...

PCollection<KV<String, Iterable<KV<String, Integer>>>> grouped =
input.apply(GroupByKey.<String, KV<String, Integer>>create());

PCollection<KV<String, Iterable<KV<String, Integer>>>> groupedAndSorted =
grouped.apply(SortValues.<String, String, Integer>create(BufferedExternalSorter.options()));

but I got the error PipelineExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable. Does this mean this transform function only works in Hadoop environment?

回答1:

As of today, if you use Beam with release version below 2.0.0, you will have to add two hadoop dependencies in your maven pom file for this SortValues module to work.

  1. add hadoop-common version 2.7.3 or later
  2. add hadoop-mapreduce-client-core version 2.7.3 or later.

Otherwise, you will just need to use Beam with release version >= 2.0.0.