Pick elements in processElement() - Apache Beam

Posted 2019-08-18 20:05

I know that when we implement a ParDo transform, we pick up individual elements from our data (basically separated by "\n"). But what if I have an element that spans two lines in my file? Can I apply my own condition to decide how elements are picked? Or must every element always be on a single line?

1 answer
Viruses.
#2 · 2019-08-18 20:52

Reading text files is handled by TextIO, not by ParDo; I suppose that's what you meant. Right now TextIO does split files into one element per line, but there is work in progress to change that. You can follow it at https://issues.apache.org/jira/browse/BEAM-2802.

It would help that work if you could tell us more about your file format, so we can make sure it is in scope.
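
In the meantime, one possible workaround is to get a file's contents to your DoFn as a single element and regroup the lines yourself. Below is a minimal sketch of just that regrouping step in plain Python. It assumes a hypothetical format in which continuation lines are indented; `group_records` and `is_record_start` are illustrative names, not part of Beam, and the pipeline wiring is left out.

```python
from typing import Callable, Iterable, Iterator, List


def group_records(
    lines: Iterable[str],
    starts_new_record: Callable[[str], bool],
) -> Iterator[str]:
    """Join physical lines into logical records.

    A new record begins at every line for which starts_new_record(line)
    is true; every other line is appended to the current record.
    """
    current: List[str] = []
    for line in lines:
        # Emit the record collected so far when a new one starts.
        if starts_new_record(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    # Emit the last record, if any.
    if current:
        yield "\n".join(current)


def is_record_start(line: str) -> bool:
    # Assumption for this sketch: indented lines continue the previous record.
    return not line.startswith((" ", "\t"))


# Hypothetical sample data: records 1 and 3 span two lines each.
sample = "id=1 name=a\n  note=first\nid=2 name=b\nid=3 name=c\n  note=third"
records = list(group_records(sample.splitlines(), is_record_start))
```

Swapping in a different `is_record_start` is where your own condition would go, so the same loop could serve other multi-line layouts.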
