I'm trying to extract data from 2 tables in BigQuery and then join them with CoGroupByKey. However, the output of BigQuery is PCollection<TableRow>, while CoGroupByKey requires PCollection<KV<K, V>>. How can I convert a PCollection<TableRow> to a PCollection<KV<K, V>>?
CoGroupByKey needs to know which key to CoGroup by - this is the K in KV<K, V>, and the V is the value associated with this key in this collection. The result of co-grouping several collections will give you, for each key, all of the values with this key in each collection.

So, you need to convert both of your PCollection<TableRow> to PCollection<KV<YourKey, TableRow>>, where YourKey is the type of key on which you want to join them - in your case it might be String, or Integer, or something else.

The best transform to do the conversion is probably WithKeys. E.g. here's a code sample converting a PCollection<TableRow> to a PCollection<KV<String, TableRow>> keyed by a hypothetical userId field of type String:
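(A minimal sketch, assuming the Apache Beam Java SDK; the table names, step names, and the userId column are hypothetical placeholders, and the same idea works for any other key field or type.)

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.WithKeys;
    import org.apache.beam.sdk.transforms.join.CoGbkResult;
    import org.apache.beam.sdk.transforms.join.CoGroupByKey;
    import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class JoinTables {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Read both tables (table references are placeholders).
        PCollection<TableRow> table1 =
            p.apply("ReadTable1", BigQueryIO.readTableRows().from("project:dataset.table1"));
        PCollection<TableRow> table2 =
            p.apply("ReadTable2", BigQueryIO.readTableRows().from("project:dataset.table2"));

        // Key each TableRow by the hypothetical "userId" field. withKeyType is
        // needed so Beam can infer a coder for the key when a lambda is used.
        PCollection<KV<String, TableRow>> keyed1 =
            table1.apply("KeyTable1",
                WithKeys.of((TableRow row) -> (String) row.get("userId"))
                    .withKeyType(TypeDescriptors.strings()));
        PCollection<KV<String, TableRow>> keyed2 =
            table2.apply("KeyTable2",
                WithKeys.of((TableRow row) -> (String) row.get("userId"))
                    .withKeyType(TypeDescriptors.strings()));

        // Co-group the two keyed collections on userId: for each key, the
        // CoGbkResult holds all matching rows from each input, per tag.
        TupleTag<TableRow> tag1 = new TupleTag<>();
        TupleTag<TableRow> tag2 = new TupleTag<>();
        PCollection<KV<String, CoGbkResult>> joined =
            KeyedPCollectionTuple.of(tag1, keyed1)
                .and(tag2, keyed2)
                .apply(CoGroupByKey.create());

        p.run();
      }
    }

Downstream, a ParDo over the joined collection can pull each side's rows out of the CoGbkResult with result.getAll(tag1) and result.getAll(tag2).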