Question:

It seems to me that an org.apache.hadoop.io.serializer.Serialization could be written to serialize the Java types directly, in the same format that the wrapper classes serialize them into. That way the Mappers and Reducers wouldn't have to deal with the wrapper classes.
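For illustration, a minimal sketch of what such a Serialization might look like for java.lang.Long (the class name LongSerialization is hypothetical); it writes the same eight bytes that LongWritable writes via DataOutput.writeLong:

    import java.io.*;
    import org.apache.hadoop.io.serializer.Deserializer;
    import org.apache.hadoop.io.serializer.Serialization;
    import org.apache.hadoop.io.serializer.Serializer;

    // Hypothetical example: serializes plain java.lang.Long values in the
    // same 8-byte big-endian format that LongWritable uses on the wire.
    public class LongSerialization implements Serialization<Long> {
        public boolean accept(Class<?> c) {
            return Long.class.equals(c);
        }

        public Serializer<Long> getSerializer(Class<Long> c) {
            return new Serializer<Long>() {
                private DataOutputStream out;
                public void open(OutputStream os) { out = new DataOutputStream(os); }
                public void serialize(Long value) throws IOException { out.writeLong(value); }
                public void close() throws IOException { out.close(); }
            };
        }

        public Deserializer<Long> getDeserializer(Class<Long> c) {
            return new Deserializer<Long>() {
                private DataInputStream in;
                public void open(InputStream is) { in = new DataInputStream(is); }
                // Long is immutable, so the passed-in instance cannot be reused.
                public Long deserialize(Long ignored) throws IOException { return in.readLong(); }
                public void close() throws IOException { in.close(); }
            };
        }
    }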
Answer 1:
There is nothing stopping you from changing the serialization mechanism to something else, such as the java.io.Serializable interface, Thrift, Protocol Buffers, etc.
In fact, Hadoop comes with an (experimental) Serialization implementation for Java Serializable objects: just configure the serialization factory to use it. The default serialization mechanism is WritableSerialization, but this can be changed by setting the following configuration property:

    io.serializations=org.apache.hadoop.io.serializer.JavaSerialization
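As a minimal sketch, the property can also be set programmatically on the job's Configuration (the class and job names here are illustrative). Registering JavaSerialization alongside the default WritableSerialization, rather than replacing it, keeps Hadoop's own Writable types working:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JavaSerializationSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Register both serializations; Hadoop uses the first one
            // whose accept() method matches the class being serialized.
            conf.setStrings("io.serializations",
                "org.apache.hadoop.io.serializer.WritableSerialization",
                "org.apache.hadoop.io.serializer.JavaSerialization");
            Job job = Job.getInstance(conf, "java-serialization-example");
            // ... configure mapper, reducer, input/output as usual
        }
    }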
Bear in mind, however, that anything that expects a Writable (input/output formats, partitioners, comparators, etc.) will need to be replaced by a version that can be passed a Serializable instance rather than a Writable instance.
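For the sort comparator in particular, Hadoop ships a JavaSerializationComparator that deserializes keys and compares them via compareTo(). A sketch of the extra job wiring, assuming plain String keys and Long values (both Serializable):

    import org.apache.hadoop.io.serializer.JavaSerializationComparator;

    // Continuing from the job configured above:
    // plain Java types replace Text and LongWritable.
    job.setMapOutputKeyClass(String.class);   // Serializable and Comparable
    job.setMapOutputValueClass(Long.class);   // Serializable
    // Writables sort via raw byte comparators; Serializable keys must be
    // deserialized first, which is what this comparator does.
    job.setSortComparatorClass(JavaSerializationComparator.class);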
Some more links for the curious reader:
- http://www.tom-e-white.com/2008/07/rpc-and-serialization-with-hadoop.html
- What are the connections and differences between Hadoop Writable and java.io.serialization? - a similar question to the one you're asking, in which Tariq posted a good link to a thread where Doug Cutting explains the rationale behind using Writables over Serializables