Can someone post a simple example of subclassing FileBasedSource? I'm new to Google Dataflow and very inexperienced with Java. My goal is to read files while including line numbers as a key, or to skip lines based on the line number.
Answer 1:
The implementation of XMLSource is a good starting point for understanding how FileBasedSource works. You'll likely want something like this for your reader (where readNextLine() reads to the end of a line and updates the offset):
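protected void startReading(ReadableByteChannel channel) throws IOException {
  // Only a single file or a subrange of one can begin in the middle of a line.
  if (getCurrentSource().getMode() == FileBasedSource.Mode.SINGLE_FILE_OR_SUBRANGE) {
    // If we are not at the beginning of a line, we should ignore the current line.
    if (getCurrentSource().getStartOffset() > 0) {
      // This cast assumes the channel is seekable when reading a subrange.
      SeekableByteChannel seekChannel = (SeekableByteChannel) channel;
      // Start from one character back and read till we find a new line.
      // Backing up keeps a line that begins exactly at the start offset.
      seekChannel.position(seekChannel.position() - 1);
      // nextOffset is a field of the reader: the offset of the next unread line.
      nextOffset = seekChannel.position() + readNextLine(new ByteArrayOutputStream());
    }
  }
}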
I've created a gist with the complete LineIO example, which may be simpler than XMLSource.
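To see why startReading backs up one byte before scanning for a newline, the same offset arithmetic can be tried outside Dataflow with plain java.nio. This is only a sketch under assumptions: the class LineSkipSketch and the method firstLineOffset are hypothetical names, readNextLine here is a stand-in for the reader's method of the same name (its real signature in the gist may differ), and lines are assumed to end in '\n'.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

public class LineSkipSketch {

    /**
     * Hypothetical stand-in for the reader's readNextLine(): copies bytes into
     * 'out' up to the next '\n' (not included in 'out') or end of file, and
     * returns how many bytes were consumed, counting the '\n' itself.
     */
    public static int readNextLine(SeekableByteChannel channel, ByteArrayOutputStream out)
            throws IOException {
        ByteBuffer one = ByteBuffer.allocate(1);
        int consumed = 0;
        while (true) {
            one.clear();
            int n = channel.read(one);
            if (n < 0) {
                break; // end of file
            }
            if (n == 0) {
                continue; // nothing read this time, try again
            }
            consumed++;
            byte b = one.get(0);
            if (b == '\n') {
                break; // the line terminator is consumed but not stored
            }
            out.write(b);
        }
        return consumed;
    }

    /**
     * Returns the offset of the first line that starts at or after startOffset.
     * Mirrors the idea in startReading: step back one byte, then read through
     * the next newline. If startOffset is already a line start, the byte before
     * it is '\n', so exactly one byte is consumed and no line is lost.
     */
    public static long firstLineOffset(SeekableByteChannel channel, long startOffset)
            throws IOException {
        if (startOffset == 0) {
            channel.position(0);
            return 0;
        }
        channel.position(startOffset - 1);
        return (startOffset - 1) + readNextLine(channel, new ByteArrayOutputStream());
    }
}
```

For a file containing "alpha\nbeta\ngamma\n", a subrange starting at offset 3 (inside "alpha") begins reading at offset 6, and a subrange starting exactly at offset 6 also begins at 6, so "beta" is read by exactly one subrange. Note that a reader working on a subrange cannot know how many lines precede its start offset, so attaching true line numbers as keys probably requires reading each file as a single unsplit range.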