Question:
Say I publish and consume different types of Java objects. For each type I have to define my own serializer implementation. How can I provide all of these implementations in the Kafka consumer/producer properties file under the "serializer.class" property?
Answer 1:
We have a similar setup with different objects in different topics, but always the same object type within one topic. We use the ByteArrayDeserializer that comes with the Java API 0.9.0.1, which means our message consumers only ever get a byte[] as the value part of the message (we consistently use String for the keys). The first thing the topic-specific message consumer does is call the right deserializer to convert the byte[]. You could use an Apache Commons helper class for that. Simple enough.
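For illustration, here is a minimal sketch of that first approach, assuming the payloads were written with plain Java serialization: the consumer is configured with the stock String and ByteArray deserializers, and the handler converts the byte[] itself using SerializationUtils from Apache Commons Lang 3. The topic name "orders" and the Order class are hypothetical placeholders.

import java.util.Collections;
import java.util.Properties;
import org.apache.commons.lang3.SerializationUtils;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ByteArrayConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(1000);
                for (ConsumerRecord<String, byte[]> record : records) {
                    // Plain Java deserialization; the hypothetical Order class
                    // must implement java.io.Serializable.
                    Order order = SerializationUtils.deserialize(record.value());
                    // ... handle the order ...
                }
            }
        }
    }
}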
If you prefer to let the KafkaConsumer do the deserialization for you, you can of course write your own Deserializer. The deserialize method you need to implement has the topic as its first argument. Use it as a key into a map that provides the necessary deserializer and off you go. My hunch is that in most cases you will just do a normal Java deserialization anyway.
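A sketch of such a Deserializer, again assuming plain Java serialization for each payload; the topic names and the Order/Customer message classes are made up for the example:

import java.util.HashMap;
import java.util.Map;
import org.apache.commons.lang3.SerializationUtils;
import org.apache.kafka.common.serialization.Deserializer;

public class TopicDispatchingDeserializer implements Deserializer<Object> {

    // The topic is the key into the map of expected payload types.
    private static final Map<String, Class<?>> TOPIC_TYPES = new HashMap<>();
    static {
        TOPIC_TYPES.put("orders", Order.class);       // hypothetical message classes
        TOPIC_TYPES.put("customers", Customer.class);
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // nothing to configure
    }

    @Override
    public Object deserialize(String topic, byte[] data) {
        Class<?> expected = TOPIC_TYPES.get(topic);
        if (expected == null) {
            throw new IllegalArgumentException("No deserializer for topic " + topic);
        }
        // Normal Java deserialization; the cast just sanity-checks the payload type.
        return expected.cast(SerializationUtils.deserialize(data));
    }

    @Override
    public void close() {
        // nothing to close
    }
}

You would then register it in the consumer properties, e.g. props.put("value.deserializer", TopicDispatchingDeserializer.class.getName()).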
The downside of the 2nd approach is that you need a common super class for all your message objects to be able to parameterize the ConsumerRecord<K,V> properly. With the first approach, however, it is ConsumerRecord<String, byte[]> anyway. But then you convert the byte[] to the object you need just at the right place and need only one cast right there.
Answer 2:
One option is Avro. Avro lets you define record types that you can then easily serialize and deserialize.
Here's an example schema adapted from the documentation:
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "default": null, "type": ["null","int"]},
{"name": "favorite_color", "default": null, "type": ["null","string"]}
]
}
Avro distinguishes between so-called SpecificData and GenericData. With SpecificData readers and writers, you can easily serialize and deserialize known Java objects. The downside is that SpecificData requires compile-time knowledge of the class-to-schema mapping.
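For instance, assuming a User class has been generated from the schema above by the Avro compiler (avro-tools or the Maven plugin), a serialize/deserialize round trip might look like this:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import example.avro.User; // generated by the Avro compiler

public class SpecificDataRoundTrip {
    public static void main(String[] args) throws IOException {
        User user = User.newBuilder()
                .setName("Alyssa")
                .setFavoriteNumber(256)
                .setFavoriteColor(null)
                .build();

        // Serialize to a byte[], e.g. for use as a Kafka message value.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new SpecificDatumWriter<User>(User.class).write(user, encoder);
        encoder.flush();
        byte[] bytes = out.toByteArray();

        // Deserialize straight back into the generated class.
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        User decoded = new SpecificDatumReader<User>(User.class).read(null, decoder);
        System.out.println(decoded.getName());
    }
}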
On the other hand, GenericData readers and writers let you deal with record types you didn't know about at compile time. While obviously very powerful, this can get kind of clumsy -- you will have to invest time coding around the rough edges.
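A minimal round trip with GenericData, parsing the schema above at runtime with no generated classes involved (the schema string is inlined here to keep the sketch self-contained):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class GenericDataRoundTrip {
    // The User schema shown above, as a JSON string.
    private static final String SCHEMA_JSON =
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\", "
      + "\"fields\": [{\"name\": \"name\", \"type\": \"string\"}, "
      + "{\"name\": \"favorite_number\", \"default\": null, \"type\": [\"null\",\"int\"]}, "
      + "{\"name\": \"favorite_color\", \"default\": null, \"type\": [\"null\",\"string\"]}]}";

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Build a record without any compile-time knowledge of a User class.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Ben");
        user.put("favorite_number", 7);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(user, encoder);
        encoder.flush();

        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord decoded = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        System.out.println(decoded.get("name")); // an Avro Utf8, not a java.lang.String
    }
}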
There are other options out there -- Thrift comes to mind -- but from what I understand, one of the major differences is Avro's ability to work with GenericData.
Another benefit is multi-language compatibility. I know Avro has native support for a lot of languages, on a lot of platforms. The other options do too, I am sure -- probably any off-the-shelf option is going to be better than rolling your own in terms of multi-language support; it's just a matter of degrees.