Iterating over a large QuerySet doesn't seem to be a viable option in Django 2.0 anymore. I've tried to speed it up using Django's own Paginator.
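```python
from django.core.paginator import Paginator

from myapp.models import DataSet  # hypothetical app/module path


def read_from_db_4():
    paginator = Paginator(DataSet.objects.filter(status_id=1).order_by('id'), 1000)
    results = []
    print("{} iterations!".format(paginator.num_pages))
    for page in paginator.page_range:
        # Append this page's objects to the accumulated list
        results.extend(paginator.page(page).object_list)
        print("{}, next page...".format(page))
    return results
```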
This little function is reasonably quick, but it stalls on the very last page.
I also cannot get the length of that page:
```python
len(paginator.page(LASTPAGE).object_list)
```
This just hangs forever, even though I can get the length of every page before it.
What's the reason for this odd behaviour?
Okay, I found that the problem was the Python MySQL driver.
I was not using `mysqlclient`, the driver Django recommends, but the one published by Oracle/MySQL (MySQL Connector/Python). That driver seems to have a bug that, in certain circumstances, makes the iterator get "stuck" on the last element of the QuerySet in a `for` loop, trapping it in an endless loop. I found ways of getting the iterator to behave normally and stop when it reached the last element of the QuerySet, for example by calling the `.iterator()` method on it. However, that wasn't an option for my use case.

Installing `mysqlclient` solved the issue for me. Note that on a Mac you also need to follow the "Note about bug of MySQL Connector/C on macOS" section of the `mysqlclient` installation guide. You can find `mysql_config` in `/usr/local/mysql/bin`. I was a bit put off at first because the guide makes you search for the file, and it took me quite some time to find it: a search for "mysql_config" returns dozens of results about where to find `my.cnf` instead.
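For reference, which driver Django talks to is decided by the `ENGINE` entry in `DATABASES`. Below is a minimal `settings.py` sketch, assuming a local MySQL server; the database name, credentials, host and port are placeholders, and the assumption that the Oracle driver had been wired in through its own `mysql.connector.django` backend is mine, not something stated above:

```python
# settings.py (excerpt): a sketch, not a complete settings module.
# Install the driver first:  pip install mysqlclient
# Django's built-in MySQL backend uses mysqlclient (imported as MySQLdb).
# Assumption: the Oracle driver was previously configured with
# ENGINE "mysql.connector.django"; pointing ENGINE at the built-in
# backend is what takes that driver out of the picture.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",        # hypothetical database name
        "USER": "myuser",      # hypothetical credentials
        "PASSWORD": "secret",
        "HOST": "127.0.0.1",
        "PORT": "3306",
    }
}
```

With this in place, the original paginated loop runs against `mysqlclient` without any code changes to the view or function itself.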