Django and parallel processing:

Posted 2019-04-14 18:07

Versions:

  • Python 3.5.1
  • Django 1.10
  • mysqlclient 1.3.10
  • mysql 5.7.18-0ubuntu0.16.04.1 (Ubuntu)
  • Linux Mint 18.1

I have a large Django project with a setup script that loads a lot of content into the database from some CSV files. Once in a while I need to reset everything and re-add the data from these files, and the data requires some post-processing once added. This takes a while because the files are long and the code contains some unavoidable nested loops as well as many database queries.

In many cases the tasks are independent, so it should be possible to run them in parallel. I looked around for parallel processing libraries and decided to use the standard library's multiprocessing module, which is very simple.

Thus, the setup is quite simple. We define a function to run in parallel, and then call it through a Pool. Simplified code:

from multiprocessing import Pool

def some_func(input):
    # code inserting data into Django here
    pass

with Pool(4) as p:
    p.map(some_func, [1, 2, 3, 4])

However, running the code results in database connection errors like the following, which have also been reported in several other questions:

_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')

It seems like the different worker processes are trying to share one connection, or maybe the connection is not passed on to the workers properly.

How do I get parallel processing to work with Django database actions?

1 answer
贼婆χ
#2 · 2019-04-14 18:19

After googling around, I found an old (2009) related question on the Django Google group:

Hi, I was recently debugging similar issue and came to a conclusion (which may be wrong of course :) that multiprocessing and Django DB connections don't play well together. I ended up closing Django DB connection first thing in the new process. It'll recreate a new connection when it needs one, but that one will have no references to the connection used by the parent.

So, my Process.start() calls a function which starts with:

from django.db import connection

connection.close()

This solved my problem.

Thus, to solve the issue, change the function to be something like this:

def some_func(input):
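    # close the database connection inherited from the parent process;
    # Django opens a new one automatically when it is next needed
    from django.db import connection
    connection.close()

    # code inserting data into Django here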
    pass

Then it worked fine.
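
To see why closing the connection helps, here is a minimal sketch that needs neither Django nor MySQL. LazyConnection is a hypothetical stand-in that I made up for illustration (it is not Django's class); it only mimics the "open on first use" behaviour described in the quote above. The sketch assumes the "fork" start method, which is the default on Linux and is not available on Windows. The helper names worker_shared, worker_fresh and run are likewise just for this example.

```python
import multiprocessing
import os


class LazyConnection:
    """Hypothetical stand-in for a lazily opened database connection."""

    def __init__(self):
        self.opened_in_pid = None  # None means "not connected"

    def ensure_open(self):
        # Open on first use and remember which process did the opening.
        if self.opened_in_pid is None:
            self.opened_in_pid = os.getpid()
        return self.opened_in_pid

    def close(self):
        self.opened_in_pid = None


connection = LazyConnection()


def worker_shared(_):
    # Uses whatever connection state was inherited from the parent.
    # Returns True only if the connection belongs to this process.
    return connection.ensure_open() == os.getpid()


def worker_fresh(_):
    # Drop the inherited connection first, then reopen in this process.
    connection.close()
    return connection.ensure_open() == os.getpid()


def run(worker, initializer=None):
    # "fork" copies the parent's memory, including the open connection.
    ctx = multiprocessing.get_context("fork")
    connection.ensure_open()  # the parent connects before the pool starts
    with ctx.Pool(2, initializer=initializer) as pool:
        return pool.map(worker, range(4))


if __name__ == "__main__":
    print(run(worker_shared))  # workers reuse the parent's connection
    print(run(worker_fresh))   # each worker opens its own connection
    print(run(worker_shared, initializer=connection.close))
```

In the first call every worker ends up with the connection that the parent opened, which is the situation that seems to cause the "Lost connection" errors. In the second call each worker closes the inherited connection and gets its own. The third call does the same thing once per worker through the Pool initializer argument instead of at the top of every task; in a real Django project the function passed there would call django.db.connection.close(), as in the answer above.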
