Is it possible with RabbitMQ and Python to do content-based routing?
The AMQP standard and RabbitMQ claims to support content-based routing, but are there any libraries for Python which support specifying content-based bindings etc.?
The library I am currently using (py-amqplib http://barryp.org/software/py-amqplib/) seems to only support topic-based routing with simple pattern-matching (#, *).
The answer is "yes", but there's more to it... :)
Let's first agree on what content-based routing means. There are two possible meanings. Some people say that it is based on the header portion of a message. Others say it's based on the data portion of a message.
If we take the first definition, these are more or less the assumptions we make:
The data comes into existence somewhere, and it gets sent to the AMQP broker by some piece of software. We assume that this piece of software knows enough about the data to put key-value (KV) pairs in the header of the message that describe the content. Ideally, the sender is also the producer of the data, so it has as much information as we could ever want. Let's say the data is an image. We could then have the sender put KV pairs in the message header like this:
width=1024
height=768
mode=bw
photographer=John Doe
Now we can implement content-based routing by creating appropriate queues. Let's say we have a separate operation to perform on black-and-white images and a separate one on colour images. We can create two queues, one that receives messages with mode=bw
and another with mode=colour
. Then we have separate clients listening on those queues. The broker performs the routing, and there is nothing in our client that needs to be aware of the routing.
If we take the second definition, we go from different assumptions. We assume that the data comes into existence somewhere, and it gets sent to AMQP broker by some piece of software. But we assume that it's not sensible to demand that that software should populate the header with KV pairs. Instead, we want to make a routing decision based on the data itself.
There are two options for this in AMQP: you can decide to implement a new exchange for your particular data format, or you can delegate the routing to a client.
In RabbitMQ, there are direct (1-to-1), fanout (1-to-N), headers (header-filtered 1-to-N) and topic (topic-filtered 1-to-N) exchanges, but you can implement your own according to the AMQP standard. This would require reading a lot of RabbitMQ documentation and implementing the exchange in Erlang.
The other option is to make an AMQP client in Python that listens to a special "content routing queue". Whenever a message arrives at the queue, your router-client picks it up, does whatever is needed to make a routing decision, and sends the message back to the broker to a suitable queue. So to implement the scenario above, your Python program would detect whether an image is in black-and-white or colour, and would (re)send it to a "black-and-white" or a "colour" queue, where some suitable client would take over.
So on your second question, there's really nothing that you do in your client that does any content-based binding. Either your client(s) work as described above, or you create a new exchange type in RabbitMQ itself. Then, in your client setup code, you define the exchange type to be your new type.
Hope this answers your question!
In RabbitMQ, routing is the process by which an exchange decides which queues to place your message on. You publish all messages to an exchange, but you only receive messages from a queue. This means that the exchange is an active part of the process that makes some decisions about message forwarding or copying.
The topic exchange included with RabbitMQ looks at a string on the incoming messages (the routing_key) and matches that with the patterns (the binding_keys) supplied by all queues which declare their desire to receive messages from the exchange.
RabbitMQ source code is on the web so you can have a look at the topic exchange code here:
http://hg.rabbitmq.com/rabbitmq-server/file/9b22dde04c9f/src/rabbit_exchange_type_topic.erl
A lot of the complexity there is to handle a data structure called a trie which allows for very fast lookups. In fact the same data structure is used inside Internet routers.
The headers exchange found here http://hg.rabbitmq.com/rabbitmq-server/file/9b22dde04c9f/src/rabbit_exchange_type_headers.erl
is probably easier to understand. As you can see there is not a lot of code required to make a different type of exchange. If you wanted to examine the content (or maybe just peek at the first few bytes of messages, you should be able to quickly identify XML versus JSON versus something else. And if your JSON objects and XML documents maintain a specific sequence of elements then you should be able to distinguish between different JSON objects (or XML doc types) without parsing the entire message body.