Skipping header rows - is it possible with Cloud D

Posted 2019-02-22 03:16

I've created a Pipeline, which reads from a file in GCS, transforms it, and finally writes to a BQ table. The file contains a header row (fields).

Is there any way to programmatically set the "number of header rows to skip", as you can in BQ when loading a file?


Tags: google-cloud-dataflow

1 answer

#2 · 2019-02-22 04:03

This is not currently possible. It sounds like there are two potential requests here:

Also, in the meantime, you could add a simple filter to your ParDo code to skip headers. Something like this:

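// X is the element type of the rows read from the file.
PCollection<X> rows = ...;
// MatchIfNonHeader is a predicate you write yourself; it returns true for non-header rows.
PCollection<X> nonHeaders =
    rows.apply(Filter.by(new MatchIfNonHeader()));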
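
The predicate itself is left undefined above, so here is a minimal sketch of what the header check might look like, written in plain Java so it runs without the Dataflow SDK. The class name HeaderFilter, the HEADER constant, and the sample lines are all hypothetical and not part of the original answer. It assumes the rows are raw text lines and that the exact header text is known in advance. In a pipeline, the same logic would go inside a SerializableFunction<String, Boolean> passed to Filter.by.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: a plain-Java version of the header check.
class HeaderFilter {
    // Assumed header text; replace with the actual first line of your file.
    static final String HEADER = "id,name,amount";

    // Returns true for data lines and false for the header line.
    // Surrounding whitespace is ignored when comparing.
    static boolean isNonHeader(String line) {
        return !HEADER.equals(line.trim());
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the lines of a GCS file.
        List<String> lines =
            Arrays.asList("id,name,amount", "1,alice,10", "2,bob,20");

        // Keep only the lines that are not the header.
        List<String> data = lines.stream()
            .filter(HeaderFilter::isNonHeader)
            .collect(Collectors.toList());

        System.out.println(data);
    }
}
```

Matching on content rather than position is deliberate: elements of a PCollection have no guaranteed order, so "skip the first row" is not something a filter can express. The trade-off is that any data row identical to the header text is dropped as well.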
