Custom InputFormat to process protobufs in Hadoop

Published 2019-08-07 10:43

Question:

I'd like to process protobufs with Hadoop, but I'm unsure where to start. I don't care about splitting large files. The protobufs are stored as binary data. Which class should I extend to make this easier?

Answer 1:

elephant-bird (Twitter's open-source library) can process protobufs in Hadoop. It generates Hadoop I/O classes, such as InputFormat, OutputFormat, and Writable implementations, alongside the regular protobuf classes. It uses LZO compression.
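
If you would rather write the reader yourself, the core problem is finding record boundaries in the binary file. Below is a minimal sketch in plain Java (no Hadoop dependency), assuming each message was written with a varint length prefix, which is the framing that protobuf's `writeDelimitedTo` produces. The class name `DelimitedRecordReader` and its methods are hypothetical, and your files may use a different framing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads varint-length-prefixed records from a stream. */
class DelimitedRecordReader {

    /** Writes one record as an unsigned varint length followed by the payload. */
    static void writeRecord(ByteArrayOutputStream out, byte[] payload) {
        int len = payload.length;
        while ((len & ~0x7F) != 0) {
            out.write((len & 0x7F) | 0x80); // low 7 bits, continuation bit set
            len >>>= 7;
        }
        out.write(len);
        out.write(payload, 0, payload.length);
    }

    /** Returns the next record's bytes, or null at a clean end of stream. */
    static byte[] readRecord(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null; // no more records
        }
        int len = b & 0x7F;
        int shift = 7;
        while ((b & 0x80) != 0) {
            b = in.read();
            if (b == -1) {
                throw new EOFException("truncated length prefix");
            }
            if (shift > 28) {
                throw new IOException("length prefix too long");
            }
            len |= (b & 0x7F) << shift;
            shift += 7;
        }
        if (len < 0) {
            throw new IOException("negative record length");
        }
        byte[] payload = new byte[len];
        new DataInputStream(in).readFully(payload); // throws EOFException if truncated
        return payload;
    }

    /** Reads every record until the stream is exhausted. */
    static List<byte[]> readAll(InputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        byte[] record;
        while ((record = readRecord(in)) != null) {
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeRecord(out, "first".getBytes(StandardCharsets.UTF_8));
        writeRecord(out, new byte[300]); // needs a two-byte length prefix
        List<byte[]> records = readAll(new ByteArrayInputStream(out.toByteArray()));
        System.out.println(records.size() + " records");
    }
}
```

In a Hadoop job, this loop would typically live in the `nextKeyValue()` method of a `RecordReader`, returned by a `FileInputFormat` subclass whose `isSplitable` returns `false` (which matches not caring about splits). Each payload could then be handed to the generated message class's `parseFrom(byte[])`.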