What are the key differences between existing approaches?

Published 2019-05-10 23:30

Question:

Kafka MirrorMaker is a basic approach to mirroring Kafka topics from source to target brokers. Unfortunately, it isn't configurable enough for my requirements.

My requirements are very simple:

  • the solution should be JVM application
  • if the destination topic doesn't exist, the solution should create it
  • solution should have the ability to add prefixes/suffixes to destination topic names
  • it should reload and apply configurations on the fly if they're changed

According to this answer, there are several alternative solutions for this:

  • MirrorTool-for-Kafka-Connect
  • Salesforce Mirus (based on Kafka Connect API)
  • Confluent's Replicator
  • Build my own application (based on Kafka Streams functionality)

Moreover, KIP-382 was created to make Mirror Maker more flexible and configurable.

So, my question is: what are the killer features of each of these solutions (compared to the others), and which one is the best fit for the requirements above?

Answer 1:

I see you are referring to my comment there...

As for your bullets

the solution should be JVM application

All of the listed ones are Java-based.

if destination topic doesn't exist, creates it

This depends on the Kafka broker version supporting the AdminClient API. Otherwise, as the MirrorMaker documentation says, you should create the destination topic before mirroring; if you don't, you either (1) get denied when producing because auto topic creation is disabled, or (2) see "inconsistent" data because a topic with default configuration was created.
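As a sketch of the create-if-missing step: the pure part below decides which destination topics still need to be created, and the comments show where the kafka-clients `AdminClient` calls (`listTopics`, `createTopics`) would go on a broker that supports them. Class and topic names here are illustrative, not from any of the listed tools.

```java
import java.util.*;

// Sketch: decide which destination topics must be created before mirroring.
public class TopicCreation {

    // Given the topics that already exist on the destination cluster and
    // the topics we want to mirror, return the ones that are missing.
    static List<String> topicsToCreate(Set<String> existing, Collection<String> wanted) {
        List<String> missing = new ArrayList<>();
        for (String topic : wanted) {
            if (!existing.contains(topic)) {
                missing.add(topic);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Against a real cluster you would do roughly (kafka-clients on the classpath):
        //   AdminClient admin = AdminClient.create(props);
        //   Set<String> existing = admin.listTopics().names().get();
        //   admin.createTopics(missing topics mapped to new NewTopic(name, partitions, replication));
        Set<String> existing = new HashSet<>(Arrays.asList("orders", "users"));
        List<String> missing = topicsToCreate(existing, Arrays.asList("orders", "payments"));
        System.out.println(missing); // prints [payments]
    }
}
```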

That being said, by default, MirrorMaker doesn't "propagate" topic configurations on its own. When I looked, MirrorTool similarly did not. I have not looked thoroughly at Mirus, but it seems only partition counts are preserved.

Confluent Replicator does copy configurations and partitions, and it will use the AdminClient.

Replicator, MirrorTool, and Mirus are all based on the Kafka Connect API. And KIP-382 will be as well.
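For a sense of what the Connect-based options look like in practice, here is a minimal Replicator connector config sketched from Confluent's docs (the `topic.rename.format` property is what covers the prefix/suffix requirement; hostnames and topic names are made up, and the exact keys should be checked against the Replicator documentation for your version):

```properties
name=replicator-source
connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
src.kafka.bootstrap.servers=source-kafka:9092
dest.kafka.bootstrap.servers=dest-kafka:9092
topic.whitelist=orders,payments
# ${topic} is substituted with the source topic name
topic.rename.format=dc2.${topic}
```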

Build my own application (based on Kafka Streams functionality)

Kafka Streams can only communicate from() and to() a single cluster.

You might as well just use MirrorMaker because it's a wrapper around Producer/Consumer already, and supports one cluster to another. If you need custom features, that's what the MessageHandler interface is for.

At a higher level, the Connect API is also fairly configurable, and I find the MirrorTool source code really easy to understand.

solution should have the ability to add prefixes/suffixes to destination topic names

Each one can do that, but MirrorMaker requires extra/custom code. See the example by @gwenshap.
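The custom-code approach boils down to rewriting the topic name on every record before producing it. A minimal sketch of that renaming logic, the kind of thing a MirrorMaker `MessageHandler` (or a Connect transform) would apply per record; the class and method names here are illustrative, not a real API:

```java
// Sketch of prefix/suffix topic renaming for a mirroring pipeline.
public class TopicRenamer {
    private final String prefix;
    private final String suffix;

    public TopicRenamer(String prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix;
    }

    // In a MirrorMaker MessageHandler you would call this with the source
    // record's topic and produce to the returned name instead.
    public String rename(String sourceTopic) {
        return prefix + sourceTopic + suffix;
    }

    public static void main(String[] args) {
        TopicRenamer renamer = new TopicRenamer("dc1.", "");
        System.out.println(renamer.rename("orders")); // prints dc1.orders
    }
}
```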

reload and apply configurations on the fly if they're changed

That's the tricky one... Usually, you just bounce the Java process because most configurations are only loaded at startup. The exception being whitelist or topics.regex for finding new topics to consume.
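If you build your own application, one way to avoid the bounce is to watch the properties file and re-read it on change, using only the JDK. This is a hedged sketch of what that could look like; none of the listed tools do this out of the box, and everything here is illustrative:

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.*;
import java.util.Arrays;
import java.util.Properties;

// Sketch: a config holder that can be re-read on the fly instead of
// requiring a process restart.
public class ReloadableConfig {
    private volatile Properties current = new Properties();

    // Re-read the properties file; readers see the fresh snapshot.
    void reload(Path file) throws IOException {
        Properties next = new Properties();
        try (Reader reader = Files.newBufferedReader(file)) {
            next.load(reader);
        }
        current = next;
    }

    String get(String key, String fallback) {
        return current.getProperty(key, fallback);
    }

    // Block forever, reloading whenever the file is modified.
    void watch(Path file) throws IOException, InterruptedException {
        WatchService watcher = FileSystems.getDefault().newWatchService();
        file.toAbsolutePath().getParent().register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
        while (true) {
            WatchKey key = watcher.take();
            for (WatchEvent<?> event : key.pollEvents()) {
                if (file.getFileName().equals(event.context())) {
                    reload(file);
                }
            }
            key.reset();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("mirror", ".properties");
        Files.write(file, Arrays.asList("topic.prefix=dc1."));
        ReloadableConfig cfg = new ReloadableConfig();
        cfg.reload(file);
        System.out.println(cfg.get("topic.prefix", "")); // prints dc1.
        // cfg.watch(file); // would block here, reloading on each change
    }
}
```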

KIP-382

Hard to say whether it'll be accepted. While it is thoroughly written, and I personally think it's reasonably scoped, it somewhat defeats the purpose of having Replicator for Confluent. With the large majority of Kafka commits and support coming out of Confluent, it's something of a conflict of interest.

Having used Replicator, I can say it has a few extra features that allow for consumer failover in the case of data center failure, so it's still valuable until someone reverse-engineers those Kafka API calls into other solutions.

MirrorTool had a KIP too, but it was seemingly rejected on the mailing list with the explanation of "Kafka Connect is a pluggable ecosystem, and anyone can go ahead and install this mirroring extension, but it shouldn't be part of the core Kafka Connect project", or at least that's how I read it.


What's "better" is a matter of opinion, and there are still other options (Apache Nifi or Streamsets come to mind). Even using kafkacat and netcat you can hack together cluster syncing.

If you are paying for an enterprise license, mostly for support, then you might as well use Replicator.

One thing I discovered with MirrorMaker that might be important to point out: if you are mirroring a topic whose producer does not use the DefaultPartitioner, the data will be reshuffled by the DefaultPartitioner on the destination cluster, unless you configure the destination producer to use the same partition value or partitioner class as the source producer.
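A tiny illustration of that reshuffling, with made-up partitioners (this is not Kafka code; Kafka's real default uses murmur2, and the hashCode-based stand-in below is only for demonstration):

```java
// Illustration: a source producer using a custom partitioner vs. a
// destination producer falling back to hash-based default partitioning.
// The same key can land on a different partition after mirroring.
public class PartitionerDemo {
    // Hypothetical custom partitioner: route everything to partition 0.
    static int custom(String key, int numPartitions) {
        return 0;
    }

    // Stand-in for a default hash partitioner (Kafka's actual
    // DefaultPartitioner uses murmur2, not hashCode).
    static int defaultLike(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 6;
        String key = "order-42";
        int source = custom(key, partitions);
        int dest = defaultLike(key, partitions);
        System.out.println("source partition=" + source + ", destination partition=" + dest);
        // Unless the mirroring producer is configured with the same
        // partitioner (or carries the source partition over), source
        // and destination placements generally differ.
    }
}
```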