Skipping header rows - is it possible with Cloud D

Posted 2019-02-22 04:04

Question:

I've created a Pipeline, which reads from a file in GCS, transforms it, and finally writes to a BQ table. The file contains a header row (fields).

Is there any way to programmatically set the "number of header rows to skip", as you can in BQ when loading data?

Answer:

This is not currently possible. It sounds like there are two potential requests here:

Future work on this is tracked in https://issues.apache.org/jira/browse/BEAM-123.

Also, in the meantime, you could add a simple filter to your ParDo code to skip headers. Something like this:

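// rows: the elements read from the file, header included.
PCollection<X> rows = ...;
// MatchIfNonHeader is a predicate you write yourself; it returns true for every non-header element.
PCollection<X> nonHeaders =
    rows.apply(Filter.by(new MatchIfNonHeader()));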
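
The snippet above leaves MatchIfNonHeader undefined, so here is a minimal sketch of what such a predicate might look like. It is not the answerer's implementation. It assumes each element is one line of text (a String), and the HEADER constant is a hypothetical header line invented for illustration. To stay runnable without the SDK on the classpath, it uses java.util.function.Predicate and a plain list; in a real pipeline the same logic would go in a SerializableFunction<String, Boolean>, which is what Filter.by accepts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class HeaderFilterSketch {
    // Hypothetical header line (an assumption); replace with your file's actual header.
    static final String HEADER = "id,name,amount";

    // Returns true for every line that is NOT the header row.
    // In a pipeline this would implement SerializableFunction<String, Boolean>
    // and override apply(String) instead of test(String).
    static class MatchIfNonHeader implements Predicate<String> {
        @Override
        public boolean test(String line) {
            return !HEADER.equals(line.trim());
        }
    }

    // Stand-in for rows.apply(Filter.by(new MatchIfNonHeader())) on an in-memory list.
    static List<String> dropHeaders(List<String> lines) {
        return lines.stream()
                .filter(new MatchIfNonHeader())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("id,name,amount", "1,alice,10", "2,bob,20");
        System.out.println(dropHeaders(lines));
    }
}
```

The predicate matches on content rather than position because the elements of a PCollection have no guaranteed order, so "skip the first N rows" cannot be expressed as a per-element filter. If a data row could ever be identical to the header, this approach would drop it too.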

Tags: google-cloud-dataflow
