Celery is something I kept hearing about, but it took me a while to wrap my head around it. Why is it important? What is it good for?
With a couple of big Django websites under my belt it has become more obvious where something like Celery fits and why it’s useful.
The first thing you need to know is what problem Celery solves for you. It allows you to do things asynchronously so that you can perform longer running tasks in the background and quickly return a page to website visitors. It also provides a means for running and scheduling tasks that might previously have been managed by crontab.
It is good for things that could result in long wait times, like sending emails. If the code that sends an email sits in a view function and the mail server is down, the call can hang until it times out, leaving the user with a long, frustrating wait.
The biggest issue with cron is that in a multi-server environment it is difficult to manage what runs where and to balance your resources. Changing the schedule means pushing configuration files to your servers, which is a bit of a pain. You lose control over load balancing, and it becomes difficult to kick off processes manually.
Celery also provides some neat methods for delaying code to run in the future. For example, let’s say you want to send an email to users who haven’t logged in to your website in the last 7 days to remind them of your awesome service. To optimize it further you’d like the email to be sent at a time of day when there’s a good chance they’re sitting at their computer (the same time of day as their last login). Using a cron job you’d have a script that ran every few minutes, queried the database for all users who met the criteria, and sent out all those emails at once. The risk there is that an error in the DB query could result in spamming a lot of people with the wrong email.
Using Celery it’s possible to schedule the code to send the email 7 days into the future. The task code will be much simpler – check that the conditions are still valid and send just one email to the one user. It also scales better, since it only queries the DB when there is a good chance it will actually have some work to do.
Setting up Celery takes a bit of work. Getting it to work is actually relatively easy; deploying it to production is a bit more involved, since it requires a bunch of other tools you may not be using yet.
There are two Celery processes that need to be daemonized – the Celery workers and Celery beat (for scheduled tasks). Celery tasks are queued up either by the celerybeat program or by your app scheduling them. The queue that all these are persisted into is some sort of message queue service or database. It can be RabbitMQ, Amazon SQS, Redis, MongoDB, or a Django DB. Workers listen to the queue for things to do and pick up work when they can.
Daemonizing the workers and celerybeat can be done with Supervisord. It keeps the programs running in the background and restarts them if they crash.
Monitoring what tasks are in the queue, what is running, and the results of past tasks is done through an app called Flower.
Ensuring that everything stays up and running can be handled by Monit.
Once the infrastructure is in place, having Celery really does simplify a lot of the code you write. Simpler code makes me happy.