Celery design help: how to prevent concurrently executing tasks

Posted 2019-04-30 14:36

Question:

I'm fairly new to Celery/AMQP and am trying to come up with a task/queue/worker design to meet the following requirements.

I have multiple types of "per-user" tasks: e.g., TaskA, TaskB, TaskC. Each of these "per-user" tasks reads/writes data for one particular user in the system. So at any given time, I might need to create tasks User1_TaskA, User1_TaskB, User1_TaskC, User2_TaskA, User2_TaskB, etc. I need to ensure that, for each user, no two tasks of any task type execute concurrently. I want a system in which no worker can execute User1_TaskA while any other worker is executing User1_TaskB or User1_TaskC, but while User1_TaskA is executing, other workers shouldn't be blocked from concurrently executing User2_TaskA, User3_TaskA, etc.

I realize this could be implemented using some sort of external locking mechanism (e.g., in the DB), but I'm hoping there's a more elegant task/queue/worker design that would work.

I suppose one possible solution is to implement queues as user buckets: when the workers are launched, configuration specifies how many buckets to create, and each "bucket worker" is bound to exactly one bucket. An "intermediate worker" would then pull tasks off the main task queue and assign them to the bucketed queues via, say, a hash/mod scheme. UserA's tasks would always end up in the same queue, and multiple tasks for UserA would back up behind each other. I don't love this approach, as it requires the number of buckets to be defined ahead of time and seems to prevent (easily) adding workers dynamically. Seems to me there's got to be a better way -- suggestions would be greatly appreciated.
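For concreteness, here is a minimal sketch of the hash/mod bucketing idea described above. It assumes a fixed bucket count and one single-threaded worker per bucket queue; `NUM_BUCKETS`, `bucket_queue`, and `task_a` are illustrative names, not part of Celery's API.

```python
from celery import Celery

app = Celery("tasks", broker="amqp://localhost")

# Fixed at deploy time; each bucket queue gets exactly one worker
# started with --concurrency=1, so tasks in a bucket run serially.
NUM_BUCKETS = 4

def bucket_queue(user_id: int) -> str:
    """Map a user to a stable bucket queue so all of that user's tasks serialize."""
    return f"user_bucket_{user_id % NUM_BUCKETS}"

@app.task
def task_a(user_id):
    ...  # per-user work goes here

# Enqueue into the user's bucket queue; all tasks for user 42 land in the same queue.
# task_a.apply_async(args=[42], queue=bucket_queue(42))
```

Each bucket would be drained by a dedicated worker, e.g. `celery -A tasks worker -Q user_bucket_0 --concurrency=1`, which is exactly why adding workers dynamically is awkward: the bucket count is baked into the routing function.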

Answer 1:

What's so bad about using an external locking mechanism? It's simple, straightforward, and efficient enough. You can find an example of distributed task locking in Celery here. Extend it by creating a lock per user, and you're done!
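A hedged sketch of that per-user lock idea, using a Redis lock rather than the cache-based lock from the linked example: the lock key includes the user id, so tasks for different users never block each other, while two tasks for the same user cannot overlap. The broker URL, lock TTL, and task body are placeholders.

```python
import redis
from celery import Celery

app = Celery("tasks", broker="amqp://localhost")
r = redis.Redis(host="localhost", port=6379)

LOCK_TTL = 600  # seconds; safety timeout in case a worker dies mid-task

@app.task(bind=True, max_retries=None)
def task_a(self, user_id):
    # One lock per user: User1's tasks serialize, User2's tasks are unaffected.
    lock = r.lock(f"user-lock:{user_id}", timeout=LOCK_TTL)
    if not lock.acquire(blocking=False):
        # Another task for this user is already running; retry later
        # instead of blocking the worker process.
        raise self.retry(countdown=5)
    try:
        ...  # do the per-user work here
    finally:
        lock.release()
```

The same pattern applies to TaskB and TaskC: as long as they acquire the same `user-lock:<user_id>` key, no two of them can run concurrently for a given user, regardless of which worker picks them up.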