I need to upload a lot of data to a MySQL db. For most models I use django's ORM, but one of my models will have billions (!) of instances and I would like to optimize its insert operation.
I can't seem to find a way to make executemany() work, and after googling it seems there are almost no examples out there.
I'm looking for the correct sql syntax + correct command syntax + correct values data structure to support an executemany command for the following sql statement:
INSERT INTO `some_table` (`int_column1`, `float_column2`, `string_column3`, `datetime_column4`) VALUES (%d, %f, %s, %s)
Yes, I'm explicitly stating the id (int_column1) for efficiency.
A short example code would be great
If you need to modify the data, bulk load with
load data
into a temporary table as is. Then apply modifications with aninsert into select
command. IME, this is by far the fastest way to get a lot of data into a table.are you serisouly suggesting loading billions of rows (sorry instances) of data via some ORM data access layer - how long do you have ?
bulk load if possible - http://dev.mysql.com/doc/refman/5.1/en/load-data.html
I'm not sure how to use the executemany() command, but you can use a single SQL INSERT statement to insert multiple records
Here's a solution that actually uses executemany() !
Basically the idea in the example here will work.
But note that in Django, you need to use the %s placeholder rather than the question mark.
Also, you will want to manage your transactions. I'll not get into that here as there is plenty of documentation available.