Versions:
- Python 3.5.1
- Django 1.10
- mysqlclient 1.3.10
- mysql 5.7.18-0ubuntu0.16.04.1 (Ubuntu)
- Linux Mint 18.1
I have a large Django project with a setup script that adds a bunch of content to the database from some csv files. Once in a while, I need to reset everything and re-add everything from these files. Furthermore, the data requires some post-processing once added. This takes a while, because the files are long and there are some unavoidable double loops in the code, as well as many database queries.
In many cases, the tasks are independent, so it should be possible to run them in parallel. I looked around for parallel processing libraries and decided to use Python's built-in multiprocessing module, since it is very simple.
Thus, the setup is quite simple: define a function to run in parallel, then map it over the inputs with a Pool. Simplified code:
from multiprocessing import Pool

def some_func(input):
    # code inserting data into Django here
    pass

with Pool(4) as p:
    p.map(some_func, [1, 2, 3, 4])
However, running the code results in database connection errors like the ones reported here, here, and here:
_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
It seems like the worker processes are trying to share a single connection, or perhaps the connection inherited from the parent process is not usable in the workers.
How do I get parallel processing to work with Django database actions?
After googling around, I found an old (2009) related thread on the Django Google groups:
Thus, to solve the issue, change the function to be something like this:
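The advice boils down to closing the inherited database connection at the start of each worker, so that Django lazily opens a fresh, per-process connection on the next query. A minimal sketch of the revised worker function, assuming Django >= 1.8 where `django.db.connections.close_all()` is available (the 2009 thread predates it and used `db.connection.close()` instead); the `Pool` code stays the same:

```python
def some_func(input):
    # Imported inside the function so the sketch stands alone.
    from django import db

    # Each forked worker inherits the parent's open MySQL connection,
    # and the workers cannot safely share it. Closing it up front makes
    # Django transparently open a fresh, private connection the next
    # time this process runs a query.
    db.connections.close_all()  # Django >= 1.8; older versions: db.connection.close()

    # code inserting data into Django here
    pass
```

An equivalent approach is to call `db.connections.close_all()` in the parent process just before creating the Pool, so that no worker inherits a live connection in the first place.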
Then it worked fine.