Is Apache Kafka appropriate for use as an unordere

Posted 2020-02-16 06:55

Question:

Kafka splits incoming messages up into partitions, according to the partition assigned by the producer. Messages from partitions then get consumed by consumers in different consumer groups.

This architecture makes me wary of using Kafka as a work/task queue, because I have to specify the partition at time of production, which indirectly limits which consumers can work on it because a partition is sent to only one consumer in a consumer group. I would rather not specify the partition ahead of time, so that whichever consumer is available to take that task can do so. Is there a way to structure partitions/producers in a Kafka architecture where tasks can be pulled by the next available consumer, without having to split up work ahead of time by choosing a partition when the work is produced?

Using only one partition for this topic would put all the tasks in the same queue, but then the number of consumers is limited to 1 per consumer group, so each consumer would have to be in a different group. In that case, though, every task gets delivered to every consumer group, which is not the kind of work queue I'm looking for.

Is Apache Kafka appropriate for use as a task queue?

Answer 1:

Using Kafka for a task queue is a bad idea. Use RabbitMQ instead; it does this much better and more elegantly.

Although you can use Kafka as a task queue, you will run into some issues. By design, Kafka does not allow a single partition to be consumed by more than one consumer in the same consumer group, so if, for example, a single partition fills up with many tasks and the consumer that owns it is busy, the tasks in that partition will starve. It also means that the order in which tasks in the topic are consumed will not match the order in which they were produced, which can cause serious problems if the tasks need to be consumed in a specific order. To fully achieve that in Kafka you must have only one consumer and one partition, which means serial consumption by a single node. With multiple consumers and multiple partitions, the order of task consumption is not guaranteed at the topic level.
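
To make the starvation point concrete, here is a minimal sketch in plain Python. It is a toy simulation, not Kafka client code, and the function names, the two-worker setup and the task durations are all made up for illustration. It compares how long a batch takes when each partition is drained serially by its single owner with how long it takes when any free worker pulls the next task from one shared queue.

```python
import heapq

def makespan_partitioned(partitions):
    """Each partition is drained serially by exactly one consumer."""
    return max(sum(durations) for durations in partitions)

def makespan_shared(tasks, num_workers):
    """One shared queue: whichever worker is free first takes the next task."""
    free_at = [0] * num_workers  # min-heap of the time each worker becomes free
    heapq.heapify(free_at)
    for duration in tasks:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)

# Hypothetical workload: one slow task (10 time units) followed by quick ones.
tasks = [10, 1, 1, 1, 1, 1]

# Assumption: tasks were produced round-robin onto two partitions.
partitions = [tasks[i::2] for i in range(2)]  # [[10, 1, 1], [1, 1, 1]]

print(makespan_partitioned(partitions))  # 12
print(makespan_shared(tasks, 2))         # 10
```

With this workload the partitioned layout needs 12 time units and the shared queue needs 10, because the two quick tasks sitting behind the slow one cannot be picked up by the consumer that is already idle.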

In fact, Kafka topics are not queues in the computer science sense. A queue means first in, first out, and that is not what you get from Kafka at the topic level.

Another issue is that it is difficult to change the number of partitions dynamically, whereas adding or removing workers should be dynamic. If you want to ensure that new workers will get tasks in Kafka, you have to set the number of partitions to the maximum possible number of workers. This is not elegant.
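
A small sketch of why the partition count caps the number of useful workers. The assignment function below is a simplified stand-in that I wrote for illustration; it is not Kafka's actual assignor, and the worker names are hypothetical. The only property it shares with Kafka is the one that matters here: within a consumer group, each partition has exactly one owner.

```python
def assign_partitions(num_partitions, consumers):
    """Hand out partitions one by one; every partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment

# Hypothetical group of five workers reading a topic with three partitions.
workers = ["w0", "w1", "w2", "w3", "w4"]
assignment = assign_partitions(3, workers)

# Workers that received no partition have nothing to consume.
idle = [w for w, parts in assignment.items() if not parts]
print(assignment)
print(idle)  # ['w3', 'w4']
```

Two of the five workers end up with no partition, so adding them did not add any capacity. That is the reason for sizing the topic to the largest worker count you expect.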

So the bottom line: use RabbitMQ or another queue instead.

Having said all of that, Samza (by LinkedIn) uses Kafka as a sort of streaming-based task queue.

Edit, on scale considerations: I forgot to mention that Kafka is a big data / large-scale tool. If your job rate is huge, then Kafka might be a good option for you despite what I wrote earlier, since dealing with huge scale is very challenging and Kafka is very good at that. At smaller scales (say, up to a few dozen or a few hundred jobs per second), Kafka is again a poor choice compared to RabbitMQ.



Answer 2:

I would say that this depends on the scale. How many tasks do you anticipate in a unit of time?

What you describe as your end goal is basically how Kafka works by default. When you produce messages, the default (and most widely used) option is to let the partitioner pick the partition, which for messages without a key distributes them in a round-robin fashion and keeps the partitions evenly used (so you do not have to specify a partition yourself).
The main purpose of partitions is to parallelize the processing of messages, so you should use them that way.
The other common use of partitions is to ensure that certain messages get consumed in the same order in which they were produced. In that case you specify a partitioning key such that all of those messages end up in the same partition. For example, using userId as the key ensures that all messages for a given user are processed in order.
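
Here is a rough sketch of the two modes described above, written as plain Python rather than with a Kafka client. Assumptions to note: `zlib.crc32` is only a stand-in hash (Kafka's own partitioner uses a different hash function), and the partition count, function names and the `user-42` key are made up for illustration.

```python
import itertools
import zlib

NUM_PARTITIONS = 4  # hypothetical topic size

def partition_for_key(key, num_partitions=NUM_PARTITIONS):
    """Keyed message: the same key always maps to the same partition."""
    # Stand-in hash for illustration only, not the hash Kafka uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

_round_robin = itertools.cycle(range(NUM_PARTITIONS))

def partition_without_key():
    """Keyless message: spread evenly over the partitions in turn."""
    return next(_round_robin)

keyless = [partition_without_key() for _ in range(8)]
keyed = [partition_for_key("user-42") for _ in range(3)]

print(keyless)  # [0, 1, 2, 3, 0, 1, 2, 3]
print(keyed)    # three identical partition numbers
```

Keyless messages are spread evenly, which is what you want for independent tasks. Keyed messages trade that even spread for per-key ordering, since everything for one key lands in one partition and therefore on one consumer.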



Answer 3:

There is a lot of discussion in this topic revolving around order of execution of tasks in a work or task queue. I would put forth the notion that order of execution should not be a feature of a work queue.

A work queue is a means of controlling resource usage by applying a controllable number of worker threads towards completion of distinct tasks. Enforcing a processing order on tasks in a queue means you are also enforcing a completion order on tasks in the queue which effectively means that tasks in the queue would always be processed sequentially with the next task being processed only after the END of the preceding task. This effectively means you have a single threaded task queue.

If order of execution is important for some of your tasks, then each such task should add the next task in the sequence to the work queue when it completes. Alternatively, you can support a Sequential Job type which, when processed, runs a list of jobs sequentially on one worker.

In no way should the work queue actually order any of its work - the next available processor should always take the next task with no regard to what has occurred prior to or after the task completes.
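
A minimal sketch of the Sequential Job idea, using only Python's standard library. The names `sequential_job` and `run_pool` are hypothetical, and this is an in-process thread pool meant to show the shape of the idea, not a distributed queue.

```python
import queue
import threading

def sequential_job(steps):
    """Wrap several steps into one task so a single worker runs them in order."""
    def run():
        for step in steps:
            step()
    return run

def run_pool(tasks, num_workers):
    """Unordered work queue: each free worker takes whatever task is next."""
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    def worker():
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return  # all tasks were enqueued up front, so empty means done
            task()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

log = []
tasks = [
    sequential_job([
        lambda: log.append("a1"),
        lambda: log.append("a2"),
        lambda: log.append("a3"),
    ]),
    lambda: log.append("b"),  # independent task, may run at any point
]
run_pool(tasks, num_workers=3)
print(log)
```

The queue itself makes no ordering promise: `b` can show up anywhere in the log. The steps `a1`, `a2`, `a3` still come out in order because they were submitted as one unit of work.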

I was also looking at Kafka as a basis for a work queue, but the more I research it, the less it looks like the desired platform.

I see it mainly being used as a means of synchronizing disparate resources and not so much as a means of executing disparate job requests.

Another area that I think is important in a work queue is the support of a prioritization of tasks. For example, if I have 20 tasks in the queue, and a new task arrives with a higher priority, I want that task to jump to the start of the line to be picked up by the next available worker. Kafka would not allow this.
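
For comparison, this is roughly the behaviour I mean, sketched with Python's `heapq`. It is a plain in-process queue with a hypothetical class name and task names, shown only to illustrate the jump-the-line semantics that an append-only partition does not give you.

```python
import heapq
import itertools

class PriorityWorkQueue:
    """Highest priority first; FIFO among tasks of equal priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def put(self, task, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), task))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

work = PriorityWorkQueue()
for i in range(20):
    work.put(f"task-{i}")          # 20 ordinary tasks already waiting
work.put("urgent", priority=10)    # arrives last, with a higher priority

first = work.get()
second = work.get()
print(first, second)  # urgent task-0
```

The urgent task is handed to the next available worker even though it arrived last, and the ordinary tasks keep their arrival order behind it.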



Answer 4:

There are two main obstacles in trying to use Kafka as a message queue:

  1. as described in Ofer's answer, a single partition can only be consumed by a single consumer (within a consumer group), and order of processing is guaranteed only within a partition. So if you can't distribute the tasks fairly across partitions, this might be a problem.

  2. by default, you can only acknowledge processing of all messages up to a given point (offset). Unlike in traditional message queues, you can't do selective acknowledgment and, in case of failure, selective retries. This can be addressed by using kmq, which adds individual acks capability with the help of an additional topic (disclaimer: I'm the author of kmq).
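
The second point can be illustrated with a small sketch in plain Python. This is a toy model of offset-based acknowledgment, not Kafka client code and not how kmq works internally; the function name and the example offsets are made up. It relies on one fact about Kafka: the committed offset is the position of the next message to read, so everything before it counts as done.

```python
def committable_offset(processed, start=0):
    """Return the first offset from `start` that has not been processed.

    Committing this value acknowledges every message before it and nothing after.
    """
    offset = start
    while offset in processed:
        offset += 1
    return offset

# Hypothetical partition holding offsets 0..5 where only message 3 failed.
log_end = 6
processed = {0, 1, 2, 4, 5}

committed = committable_offset(processed)
redelivered = list(range(committed, log_end))

print(committed)    # 3
print(redelivered)  # [3, 4, 5]
```

Because the acknowledgment is a single position, the safe commit stops at the failed message. After a restart, messages 4 and 5 are read again even though they had already succeeded, which is the gap that per-message acks are meant to close.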

RabbitMQ is an alternative of course, but it also gives different (lower) performance and replication guarantees. In short, RabbitMQ docs state that the broker is not partition tolerant. See also our comparison of message queues with data replication, mqperf.