Python based asynchronous workflow modules : What

2019-03-14 08:15发布

问题:

I am using django as a web framework. I need a workflow engine that can do synchronous as well as asynchronous(batch tasks) chain of tasks. I found celery and luigi as batch processing workflow. My first question is what is the difference between these two modules.

Luigi allows us to rerun failed chain of task and only failed sub-tasks get re-executed. What about celery: if we rerun the chain (after fixing failed sub-task code), will it rerun the already succeed sub-tasks?

Suppose I have two sub-tasks. The first one creates some files and the second one reads those files. When I put these into chain in celery, the whole chain fails due to buggy code in second task. What happens when I rerun the chain after fixing the code in second task? Will the first task try to recreate those files?

回答1:

Update: As Erik pointed, Celery is better choice for this case.

Celery:

What is Celery?

Celery is a simple, flexible and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.

Why use Celery?

  • It is simple to use & has lots of features.
  • django-celery: provides good integration with Django.
  • flower: Real-time monitor and web admin for Celery distributed task queue.
  • Active & large community(based on Stackoverflow activity, Pyvideos, tutorials, blog posts).

Luigi

What is Luigi?

Luigi(Spotify's recently open sourced Python framework) is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.

Why use Luigi?

  • Builtin support for Hadoop.
  • Generic enough to be used for everything from simple task execution and monitoring on a local work station, to launching huge chains of processing tasks that can run in synchronization between many machines over the span of several days.
  • Lugi's visualiser: Gives a nice visual overview of dependency graph of workflow.

Conclusion: If you need a tool just to simply schedule tasks & run them you can use Celery. If you are dealing with big data & huge processing you can go for Luigi.



回答2:

(I'm the author of Luigi)

Luigi is not meant for synchronous low-latency framework. It's meant for large batch processes that run for hours or days. So I think for your use case, Celery might actually be slightly better