I am loading data from various sources (CSV, XLS, JSON, etc.) into pandas dataframes, and I would like to generate the SQL statements that create and fill a database table with this data. Does anyone know of a way to do this?
I know pandas has a to_sql function, but that only works on a database connection; it cannot generate a string.
Example
What I would like is to take a dataframe like so:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
And a function that would generate this (this example is PostgreSQL but any would be fine):
CREATE TABLE data
(
index timestamp with time zone,
"A" double precision,
"B" double precision,
"C" double precision,
"D" double precision
)
GENERATE SQL CREATE STATEMENT FROM DATAFRAME
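A minimal sketch of such a helper, assuming it simply wraps pandas' get_schema. The function name sql_create_statement_from_dataframe and its parameters are made up for illustration. With no connection given, pandas should fall back to SQLite-flavoured type names, so the result may need adjusting for PostgreSQL:

```python
import numpy as np
import pandas as pd


def sql_create_statement_from_dataframe(source: pd.DataFrame, target: str) -> str:
    """Return a CREATE TABLE statement for `source` as a string (hypothetical helper)."""
    # reset_index() turns the index into a regular column so it shows up in the schema.
    return pd.io.sql.get_schema(source.reset_index(), target)


# Same example frame as in the question.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

print(sql_create_statement_from_dataframe(df, "data"))
```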
Check the resulting SQL CREATE TABLE statement string.
GENERATE SQL INSERT STATEMENT FROM DATAFRAME
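One way to build the INSERT statements is to walk over the rows and render each value as an SQL literal. This is only a sketch: the helper names (quote_identifier, sql_literal, sql_insert_statements_from_dataframe) are hypothetical, the literal rendering is simplified (infinities, binary data and database-specific types are not handled), and for real inserts parameterized queries are usually the safer choice.

```python
import pandas as pd


def quote_identifier(name) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + str(name).replace('"', '""') + '"'


def sql_literal(value) -> str:
    """Render one scalar as an SQL literal (simplified, see assumptions above)."""
    if pd.isna(value):
        return "NULL"
    if pd.api.types.is_bool(value):
        return "TRUE" if value else "FALSE"
    if pd.api.types.is_integer(value):
        return str(int(value))
    if pd.api.types.is_float(value):
        return repr(float(value))
    # Everything else (strings, timestamps, ...) becomes a quoted string.
    return "'" + str(value).replace("'", "''") + "'"


def sql_insert_statements_from_dataframe(source: pd.DataFrame, target: str) -> list:
    """Return one INSERT INTO statement string per row of `source`."""
    columns = ", ".join(quote_identifier(c) for c in source.columns)
    table = quote_identifier(target)
    statements = []
    for row in source.itertuples(index=False, name=None):
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return statements


example = pd.DataFrame({"A": [1.5, None], "B": ["x", "o'k"]})
for statement in sql_insert_statements_from_dataframe(example, "data"):
    print(statement)
```

The index is not written by this sketch. For the dataframe in the question, passing df.reset_index() keeps the dates as a regular column.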
Check the resulting SQL INSERT INTO statement string.
If you only want the CREATE TABLE SQL code (and not the insert of the data), you can use the get_schema function of the pandas.io.sql module.
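A short sketch of calling get_schema directly on the dataframe from the question, with and without reset_index, to show the difference it makes. The table name "data" is taken from the question. No connection is passed here, so the type names are expected to follow pandas' SQLite fallback rather than PostgreSQL:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Without reset_index(), only the columns A..D end up in the schema.
without_index = pd.io.sql.get_schema(df, "data")

# With reset_index(), the former index becomes a regular column named "index".
with_index = pd.io.sql.get_schema(df.reset_index(), "data")

print(without_index)
print(with_index)
```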
Note: I had to use reset_index on the dataframe passed to get_schema, because otherwise the index was not included.
If you want to write the file yourself, you can also retrieve the column names and dtypes and build a dictionary that converts pandas data types to SQL data types.
As an example:
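The following is a sketch rather than a complete mapping. It assumes PostgreSQL type names and keys the dictionary on the NumPy dtype "kind" code instead of the full dtype name, so that variants such as int32/int64 share one entry. KIND_TO_SQL and create_table_statement are hypothetical names, and anything not in the dictionary falls back to text.

```python
import pandas as pd

# Hypothetical mapping from dtype kind codes to PostgreSQL type names.
KIND_TO_SQL = {
    "i": "bigint",            # signed integers
    "u": "bigint",            # unsigned integers
    "f": "double precision",  # floats
    "b": "boolean",
    "M": "timestamp",         # datetime64
    "m": "interval",          # timedelta64
    "O": "text",              # Python objects, typically strings
}


def create_table_statement(frame: pd.DataFrame, table_name: str) -> str:
    """Build a CREATE TABLE statement from the column names and dtypes."""
    lines = []
    for column, dtype in frame.dtypes.items():
        sql_type = KIND_TO_SQL.get(dtype.kind, "text")  # fall back to text
        lines.append(f'    "{column}" {sql_type}')
    body = ",\n".join(lines)
    return f'CREATE TABLE "{table_name}" (\n{body}\n);'


example = pd.DataFrame(
    {"A": [1.0, 2.0], "n": [1, 2], "when": pd.to_datetime(["2013-01-01", "2013-01-02"])}
)
print(create_table_statement(example, "data"))
```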
You can populate your table in the same way, by building INSERT INTO statements from the rows.