I need to implement a fair queuing system such that messages are processed in a round robin fashion, based on the value of some message header, for all values of that header on messages currently queued.
Messages in the system are naturally grouped by some property, of which there are many thousands of possible values and the set of values for messages currently queued changes over time. An analogy would be messages having a header which is the milliseconds part of the time, at the time of message creation. Thus, the header will have a value between 0 and 999, and there will be some distribution of the value across all messages currently queued.
I need to be able to consume messages in an order such that no specific value is prioritised over any other. If the header values of queued messages are distributed thus
value | count
------|-------
A | 3
B | 3
C | 2
Then the consumption order would be A,B,C,A,B,C,A,B
.
If messages with another value are added to the queue they should be automatically added to the round robin sequence.
This implies some knowledge of the currently queued messages, but doesn't require that knowledge to be held by the consumer; the broker might have mechanisms to order delivery in some way.
It's acceptable for there to be some threshold beyond which fair queuing starts. Which is to say that, if the threshold were 10, then it's acceptable to sequentially process 10 messages with the same value, but the 11th message processed should be of the next value in sequence. Next might be the same value, if the only queued messages have that value.
The number of possible values probably precludes simply creating a queue for each, and iterating the queues, though this has not yet been tested.
We're using HornetQ, but if there's alternatives that provide these semantics then I'd love to know.
The messages are jobs and the header values are user ids. What's being sought is that, within some limits, no jobs from any given user will unduly delay jobs from any other user; A user producing 1 million jobs doesn't cause later jobs from other users to wait for that million jobs to be processed.
Consumers on queues in HornetQ are evaluated in creation order, so adding a selective consumer to a queue won't stop any catch-all consumer from receiving messages matching the filter.
JMS Groups don't seem to help, as that ties a given group (user?) to a given consumer.
A potential solution is creating selective consumers on a topic based on demand (e.g.: 10 sequential messages from the same user), with something managing the lifecycle of all selective consumers to ensure the catch-all doesn't process the same message. While possible this does seem to have some onerous synchronisation requirements.
You are wanting the JMS broker to implement a message delivery algorithm (fair queuing) that is not, as far as I know, part of the JMS specification. It may be possible to find a way to make the broker do this, but I doubt it, and any solution you come up with is likely to be broker-specific.
Instead, why not put the desired queuing algorithm in your own application? For example: write a "fair queuing forwarder (FQF)" application that subscribes to all the messages, in whatever order they come from the broker. Have this FQF application consume messages as fast as possible, so that the JMS broker queue is always empty or near empty. The FQF application could then store the messages in a local queue, and republish them one at a time, in whatever order your desired queuing algorithm determines, to a queue or topic that the ultimate message processing application is subscribed to. On this end you would probably want to use transactions or some sort of flow control so that the FQF application only publishes messages at the rate they can be processed by the end system.
To put this in terms of your example, the messages represent jobs to be processed in a certain order based on a user id attribute in the message header. So I'm suggesting that you write a job scheduling algorithm that passes the jobs to the job processor using whatever queuing algorithm you want.
This way, you would have complete control over the message processing order, without having to somehow "trick" the broker into doing what you want. You would be using JMS simply as a message delivery mechanism, so you don't have to write your own message passing protocol.
Not sure I exactly understand this, but it's problematic that your property has thousands of possible values and that it changes over time. That sounds like a typical 'hash-value to the rescue' problem. So how about creating the hash value of that property and then doing a modulo of that value to derive a fair, but previously known value?
Let's assume it is practical to have 100 consumers processing from 100 queues, named Q0 to Q99. Then you could do this in your JMS producers:
And this is the queue name that producers send to. Also the user value is added as property. The same user will go to the same queue (and queue up, if there are too many), and users are almost fairly distributed across consuming applications.
Now you still have the problem of one bad user creating a million jobs. A second part of this solution could be a JMS selector that is empty on startup. Once the first message is consumed, you count then number of jobs for each user and once you reach a threshold, e.g. 10 jobs from the same user, you temporarily disallow this user by adding a selector such as: 'user NOT LIKE user123'. If there is more than once such user, you accumulate the selection with 'AND'. Once the consumer doesn't get any further messages, you set this consumer's select to be empty and start processing the queue again.
I would suggest using message priority for that + setting consumerWindowSize=0 and always have the client to pick a new message from the server.
You would have more latency on the messages but you would always have the messages coming from the server.
Notice that there are races that you will need to consider. what if you are consuming Message C and Message B arrived? you will then consume C for later consume B. But that's not much you could do there since C is under consumption already.
You may also consider Message Grouping but you would be binding a messagegroup to a single consumer from your load balancing.
The first option to consider would be to have a multi-threaded consuming application. Assuming a thread per session/consumer it would be possible to setup either sync or async receive with a selector. Each selector would be keyed to a specific user.
Working with the assumption the JVM werereasonable equitable in terms of dispatch of threads (which I would be happy to assume) AND there weren't any deadlocks in the application code I would assert the requirements would be fullfilled. One thread might get stuck with a users million jobs the rest won't be affected.
If however a single threaded application is desired, then nothing in JMS specification in and itself could help. Their may be vendor extensions of course that could help. However the other option would be a for an application to look at each messages and put it a specific queue for the user id. The final consuming application itself would then 'round-robin' between these queues to get the work. Needs another application but you have a very deterministic system.