We have a distributed application that does work on behalf of multiple clients. Right now, as work arrives, messages are put on a single queue. The messages are pulled off in FIFO order, and this works fine most of the time.
There are occasions where we get thousands of messages for a single client, which then occupy the first thousand-plus slots in the queue. This means that no (or very little) work can be done for any other client until those messages are processed - it starves the smaller clients.
My thinking was that we keep a list of clients and, each time before taking a message from the queue, we peek at the message to see which client submitted it. If it is from the same client we just processed, we push it back, repeating until we either find a message from a different client or we have iterated over all the clients and found no other work.
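To make the idea concrete, here is a minimal sketch of that peek-and-defer loop. It uses an in-memory deque to stand in for the broker, and represents messages as `(client_id, payload)` tuples - all names here are illustrative, not an actual JMS/ActiveMQ API:

```python
from collections import deque

def take_next(queue, last_client):
    """Pull the next message, preferring one from a different client than
    the one just processed. Deferred messages are pushed to the tail of
    the queue. If every queued message belongs to last_client, fall back
    to the head of the queue. Returns a (client, payload) tuple or None.
    (Illustrative sketch; in practice the broker would do the peeking.)"""
    for _ in range(len(queue)):
        client, payload = queue.popleft()
        if client != last_client:
            return client, payload
        queue.append((client, payload))  # push it back and keep scanning
    # everything remaining belongs to last_client (or the queue is empty)
    return queue.popleft() if queue else None
```

For example, with `deque([("a", 1), ("a", 2), ("b", 3)])` and `last_client="a"`, the loop skips past client `a`'s messages and returns `("b", 3)` first. One caveat this makes visible: pushing skipped messages to the tail reorders the heavy client's backlog, so strict per-client FIFO is lost unless you track order separately.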
Is this a reasonable solution? Is this a problem that has already been solved and if so how?
I have looked at this question (Is "fair queuing" possible with JMS), but I do not think message groups are the right answer, as I do not want all the messages going to a single server. I just want to give messages for other clients a fair chance to be processed without waiting in line behind a thousand messages for a single client.
Currently we are using ActiveMQ as our message queuing system.
Your proposal is reasonable, though for a distributed system that can process multiple messages at once on multiple machines, I'd suggest an alternative: with a thread pool that processes multiple messages concurrently, you can use a solution built around an "overflow queue".
- When reading from the main queue, check if more than X messages are being processed for a customer already (your fairness threshold). If so, put the message into the overflow queue instead of processing it.
- Reserve a small number of threads (Y) to process that queue, and the rest for the main queue.
- Y should be configurable so that overflow processing keeps up when your overflow queue grows very large relative to your main queue (or when your main queue is empty). Or, rather than dedicating Y threads to overflow processing, make it a percentage: say, 90% of reads come from the main queue and 10% from the overflow queue.
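The steps above can be sketched as follows. This is a simplified in-memory model, not ActiveMQ code: `FAIRNESS_THRESHOLD` is the "X" from the first bullet, `overflow_share` is the assumed 10% overflow read ratio, and messages are `(client, payload)` tuples:

```python
import random
from collections import Counter, deque

FAIRNESS_THRESHOLD = 5  # "X": max in-flight messages per customer (assumed value)

def dispatch(main_queue, overflow_queue, in_flight):
    """Take the next message from the main queue; if its sender already has
    FAIRNESS_THRESHOLD messages in flight, divert it to the overflow queue
    and keep looking. in_flight is a Counter of messages currently being
    processed per client (decremented elsewhere when processing finishes)."""
    while main_queue:
        client, payload = main_queue.popleft()
        if in_flight[client] >= FAIRNESS_THRESHOLD:
            overflow_queue.append((client, payload))  # defer the heavy user
            continue
        in_flight[client] += 1
        return client, payload
    return None  # main queue drained (or everything was diverted)

def pick_source(main_queue, overflow_queue, overflow_share=0.10):
    """Pick which queue the next read comes from: roughly 10% of reads hit
    the overflow queue, with a fallback when either queue is empty."""
    if not overflow_queue:
        return main_queue
    if not main_queue:
        return overflow_queue
    return overflow_queue if random.random() < overflow_share else main_queue
```

With this shape, a heavy customer's excess messages land in the overflow queue rather than blocking the main queue, and the probabilistic `pick_source` keeps draining overflow even when the main queue is busy.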
This results in fairer processing: most processing uses the main queue (which now only lets "fair" messages through), and your heavy users still get some throughput beyond X via the overflow processing.
It's not the "best" solution, but it's reasonably good and still relatively simple to implement.
Make sure to collect metrics on your queue sizes so you can see whether a big backlog is forming (or a queue is sitting empty), and ideally adjust the parameters automatically so the overflow queue gets higher priority during times of low traffic on the main queue.