I have a fairly big pandas dataframe - 50
or so headers and a few hundred thousand rows of data - and I'm looking to transfer this data to a database using the ceODBC
module. Previously I was using pyodbc
and using a simple execute statement in a for loop but this was taking ridiculously long (1000 records per 10 minutes)...
I'm now trying a new module and am trying to introduce executemany()
although I'm not quite sure what's meant by sequence of parameters in:
cursor.executemany("""insert into table.name(a, b, c, d, e, f)
values(?, ?, ?, ?, ?), sequence_of_parameters)
should it look like a constant list working through each header like
['asdas', '1', '2014-12-01', 'true', 'asdasd', 'asdas', '2',
'2014-12-02', 'true', 'asfasd', 'asdfs', '3', '2014-12-03', 'false', 'asdasd']
- where this is an example of three rows
or what is the format that's needed?
as another related question, how then can I go about converting a regular pandas dataframe to this format?
Thanks!
I managed to figure this out in the end. So if you have a Pandas Dataframe which you want to write to a database using
ceODBC
which is the module I used, the code is:(with
all_data
as the dataframe) map dataframe values to string and store each row as a tuple in a list of tuplesfor the list of tuples, change all null value signifiers - which have been captured as strings in conversion above - into a null type which can be passed to the end database. This was an issue for me, might not be for you.
create a connection to the database
define a function to turn the list of tuples into a
new_list
which is a further indexing on the list of tuples, into chunks of 1000. This was necessary for me to pass the data to the database whose SQL Query could not exceed 1MB.define your query.
Run through the the
new_list
containing the list of tuples in groups of 1000 and performexecutemany
. Follow this by committing and closing the connection and that's it :)Might be a little late to answer this question, but maybe it can still help someone.
executemany()
is not implemented by many ODBC. One of the ones that does have it isMySQL
. When they refer to sequence of parameters they mean:and for a query statement it would look something like:
Which looks like you got there. A couple things though I want to point out in case it helps: pandas has a to_sql function that inserts into a db if you provide it the connector object, and chunks the data as well.
To rapidly create a sequence of parameters from a pandas dataframe I found the following two methods helpful:
You can try this:
Hope it helps.