Globally accessible object across all Celery workers

Posted 2019-06-16 18:08

I have a pretty standard Django + RabbitMQ + Celery setup with 1 Celery task and 5 workers.

The task uploads the same (I'm simplifying a bit) big file (~100 MB) asynchronously to a number of remote PCs.

Everything works fine, but at the expense of a lot of memory, since every task/worker loads that big file into memory separately.

What I would like to do is to have some kind of cache, accessible to all tasks, i.e. load the file only once. Django caching based on locmem would be perfect, but as the documentation says, "each process will have its own private cache instance", and I need this cache accessible to all workers.

I tried playing with Celery signals as described in #2129820, but that's not what I need.

So the question is: is there a way to define something global in Celery (like a dict-based class where I could load the file once)? Or is there a Django trick I could use in this situation?

Thanks.

3 Answers
别忘想泡老子
#2 · 2019-06-16 18:57

Why not simply stream the upload(s) from disk instead of loading the whole file into memory?
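A minimal sketch of what that could look like, assuming the upload happens over HTTP via `requests` (the URL, file path, and task name are placeholders):

```python
import requests
from celery import shared_task

@shared_task
def upload_file(remote_url, file_path="/path/to/big_file.bin"):
    """Stream the file from disk; only a small read buffer stays in memory."""
    with open(file_path, "rb") as f:
        # Passing a file object as `data=` makes requests stream it in chunks
        # instead of buffering the whole ~100 MB payload in each worker.
        response = requests.post(remote_url, data=f, timeout=300)
    response.raise_for_status()
```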

可以哭但决不认输i
#3 · 2019-06-16 19:00

It seems to me that what you need is a memcached backend for Django. That way every Celery task will have access to it.
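A rough sketch of the configuration and task side, assuming a memcached instance on localhost (note that memcached's default 1 MB item size limit would need to be raised to hold a ~100 MB payload; the key name, file path, and `send_to_remote()` helper are hypothetical):

```python
# settings.py -- a shared memcached instance instead of per-process locmem,
# so every Celery worker process sees the same cache.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

# tasks.py -- the first task to run populates the cache, later ones reuse it.
from celery import shared_task
from django.core.cache import cache

@shared_task
def upload_to_remote(remote_url):
    payload = cache.get("big_file")
    if payload is None:
        with open("/path/to/big_file.bin", "rb") as f:
            payload = f.read()
        cache.set("big_file", payload, timeout=3600)
    send_to_remote(remote_url, payload)  # hypothetical upload helper
```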

狗以群分
#4 · 2019-06-16 19:08

Maybe you can use threads instead of processes for this particular task. Since threads all share the same memory, you only need one copy of the data, while still getting parallel execution. (This means not using Celery for this task.)
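A sketch of that thread-based approach, with the transport and helper names as assumptions (`requests` for the upload, a simple thread pool for parallelism):

```python
from concurrent.futures import ThreadPoolExecutor
import requests

def upload(remote_url, payload):
    # One HTTP POST per remote PC; all threads share the same bytes object.
    requests.post(remote_url, data=payload, timeout=300).raise_for_status()

def upload_to_all(remote_urls, file_path="/path/to/big_file.bin"):
    with open(file_path, "rb") as f:
        payload = f.read()  # single in-memory copy, shared by every thread
    with ThreadPoolExecutor(max_workers=5) as pool:
        futures = [pool.submit(upload, url, payload) for url in remote_urls]
        for future in futures:
            future.result()  # re-raise any upload errors
```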
