I want to upload images to S3, but before uploading I want to generate thumbnails in 3 different sizes, and I want this done outside the request/response cycle, so I am using Celery. I have read the docs, and here is what I have understood. Please correct me if I am wrong.
- Celery helps you manage your task queues outside the request response cycle.
- Then there is something called carrot/kombu. It's Django middleware that packages the tasks that get created via Celery.
- Then the third layer, PyAMQP, facilitates communication between carrot and a broker, e.g. RabbitMQ, Amazon SQS, IronMQ, etc.
- Broker sits on a different server and does stuff for you.
Now my understanding is: if multiple users upload images at the same time, Celery will queue the resizing, and the resizing will actually happen on the IronMQ server, since it offers a cool add-on on Heroku.
Now the doubts:
But what happens after the image is resized? Will IronMQ push it to the S3 server, or will it notify me once the process is completed? I am not clear about it.
What is the difference between Celery and kombu/carrot? Could you explain it clearly?
IronMQ does not process your tasks for you; it simply serves as the backend for Celery to keep track of what jobs need to be performed.
So, here's what happens. Assume you have two servers: your web server and your Celery server. Your web server is responsible for handling requests; your Celery server creates the thumbnails and uploads them to S3. Here's what a typical request looks like:
- Your user uploads the image to your web server.
- You store that image somewhere--I'd recommend putting it on S3 right then, personally, but you could also store it in, for example, IronCache, base64-encoded. The point is to put it somewhere your Celery server can access it.
- You queue up a job on Celery, passing the location of the image to your Celery server.
- Your Celery server downloads the image, generates your thumbnails, and uploads them to S3. It then stores the S3 URLs in the job results.
- Your web server waits until the job finishes, then has access to the results. Alternatively, you could have your Celery server store the results in the database itself. The point is that the Celery server does the heavy lifting (generating the thumbnails) and does not hold up the request loop while it does.
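To make the division of labour concrete, here is a rough, standard-library-only simulation of the steps above. It is not Celery code: `queue.Queue` stands in for the broker (IronMQ, RabbitMQ, ...), a thread stands in for the Celery server, and a dict stands in for the result store. All names, paths, and thumbnail widths are made up for illustration, and the "resize" step is a placeholder that only builds file names.

```python
import queue
import threading

# Stand-in for the broker: it only holds messages, it never resizes anything.
broker = queue.Queue()
# Stand-in for the result backend or database the worker writes to.
results = {}

THUMBNAIL_WIDTHS = (64, 128, 256)  # hypothetical sizes


def handle_upload(job_id, image_location):
    """Web server side: enqueue a small message and return immediately."""
    broker.put({"job_id": job_id, "image_location": image_location})
    return job_id


def make_thumbnails(image_location):
    """Worker side: placeholder for download, resize, and upload to S3."""
    return [f"{image_location}.thumb{width}" for width in THUMBNAIL_WIDTHS]


def worker():
    """Worker loop: pull messages off the broker and do the heavy lifting."""
    while True:
        message = broker.get()
        if message is None:  # sentinel value: shut the worker down
            broker.task_done()
            break
        results[message["job_id"]] = make_thumbnails(message["image_location"])
        broker.task_done()


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

handle_upload("job-1", "uploads/cat.jpg")
handle_upload("job-2", "uploads/dog.jpg")

broker.join()        # wait for the queued jobs; done here only for the demo
broker.put(None)     # ask the worker to stop
worker_thread.join()
```

Note that the message on the queue carries only the image's location, never the image itself, and that the only code touching the image runs in the worker. With real Celery, `handle_upload` would queue a task and the worker process would run it; the broker's role stays the same.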
I wrote an example for using IronMQ on Heroku. You can see it here: http://iron-celery-demo.herokuapp.com. You can see the source for the example on GitHub and read the tutorial, which explains pretty thoroughly and step-by-step how to deploy Celery on Heroku.
To clear up the AMQP stuff:
- IronMQ is a cloud-based message queue service developed by Iron.io.
- AMQP is an open messaging specification.
- RabbitMQ is the most popular implementation (that I know of) of the AMQP specification.
- PyAMQP is a Python library that lets Python clients communicate with any implementation of AMQP, including RabbitMQ.
One of the biggest differences between IronMQ and RabbitMQ/AMQP is that IronMQ is hosted and managed, so you don't have to host the server yourself and worry about uptime. The spec offers a bunch more in terms of differentiation, and there are underlying differences, but Celery abstracts most of those away. Because you're using Celery, the only difference you're liable to notice is that IronMQ is hosted, so you don't have to stand up and manage your own server.
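As a small illustration of "Celery abstracts most of those away": in practice, switching brokers mostly comes down to changing the broker URL. The sketch below only reads and inspects such a URL using the standard library. The `BROKER_URL` environment variable name and the `broker_settings` helper are assumptions made for this example; the default is the usual local RabbitMQ URL, and the exact URL for a hosted service comes from that service's own documentation.

```python
import os
from urllib.parse import urlsplit

# The usual URL for a RabbitMQ broker running locally with default credentials.
DEFAULT_BROKER_URL = "amqp://guest:guest@localhost:5672//"


def broker_settings(environ=os.environ):
    """Read the broker URL from the environment (hypothetical BROKER_URL name).

    The URL scheme is what tells the messaging layer which transport to use.
    """
    url = environ.get("BROKER_URL", DEFAULT_BROKER_URL)
    return {"broker_url": url, "transport": urlsplit(url).scheme}


settings = broker_settings()
# With Celery installed, the URL would be handed over roughly like this:
#   app = Celery("tasks", broker=settings["broker_url"])
```

The task code itself would not need to change when the URL points at a different broker.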
Full disclosure: I am employed by Iron.io, the company behind IronMQ.
"One of the biggest differences between IronMQ and RabbitMQ/AMQP is that IronMQ is hosted and managed, so you don't have to host the server yourself and worry about uptime."
Currently there are at least two hosted managed RabbitMQ-as-a-service options: Bigwig and CloudAMQP. Celery should work well with both.