Insert Data to SQL Server Table using pymssql

2019-05-31 04:46发布

问题:

I am trying to write the data frame into the SQL Server Table. My code:

conn = pymssql.connect(host="Dev02", database="DEVDb")
cur = conn.cursor()
query = "INSERT INTO dbo.SCORE_TABLE VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
cur.executemany(query, df_sql)
conn.commit()
cur.close()
conn.close()

The dimension of the df_sql is (5860, 20) i.e. the number of columns in the data frame is same as the number of columns in the SQL Server Table. Still I am getting following error:

ValueError: more placeholders in sql than params available

UPDATED BELOW

As per one of the comments, I tried using turbodbc as below:

conn = turbodbc.connect(driver="{SQL Server}", server="Dev02", Database="DEVDb")
conn.use_async_io = True
cur = conn.cursor()
query = "INSERT INTO dbo.STG_CONTACTABILITY_SCORE VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
cur.executemany(query, df_sql.values)
cur.commit()
cur.close()
conn.close()

I am getting following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I don't get it. What is wrong here. I see df_sql.values and I don't find anything wrong.

The first row of ndarray is as below:

[nan 'DUSTIN HOPKINS' 'SOUTHEAST MISSOURI STATE UNIVERSITY' 13.0
  '5736512217' None None 'Monday' '8:00AM' '9:00AM' 'Summer' None None None
  None '2017-12-22 10:39:30.626331' 'Completed' None '1-11KUFFZ'
  'Central Time Zone']

回答1:

I think you just need to specify each column name and don't forget the table must have the id field to charge the data frame index:

conn = pymssql.connect(host="Dev02", database="DEVDb")
cur = conn.cursor()
query = """INSERT INTO dbo.SCORE_TABLE(index, column1, column2, ..., column20)
            VALUES (?, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, 
            %s, %s, %s, %s, %s, %s)"""
cur.executemany(query, df_sql)
conn.commit()
cur.close()
conn.close()

Ok I have been using pandas and I exported the last data frame to csv like:

df.to_csv('new_file_name.csv', sep=',', encoding='utf-8')

Then I just used pyobdc and BULK INSERT Transact-SQL like:

import pyodbc

conn = pyodbc.connect(DRIVER='{SQL Server}', Server='server_name', Database='Database_name', trusted_connection='yes')
cur = conn.cursor()

cur.execute("""BULK INSERT table_name
               FROM 'C:\\Users\\folders path\\new_file_name.csv'
               WITH
               (
                   CODEPAGE = 'ACP',
                   FIRSTROW = 2,
                   FIELDTERMINATOR = ',',
                   ROWTERMINATOR = '\n'
               )""")
conn.commit()

cur.close()
conn.close()

It was a second to charge 15314 rows into SQL Server. I hope this gives you an idea.



回答2:

If i understand correctly you want to use DataFrame.to_sql() method:

df_sql.to_sql('dbo.SCORE_TABLE', conn, index=False, if_exists='append')


回答3:

Possibly executemany treats each row in the ndarray from your df.values call as one item since there are no comma separators between values. Hence, the placeholders outnumber actual binded values and you receive the mismatch error.

Consider converting array to a tuple of tuples (or lists of lists/tuple of lists/list of tuples) and then pass that object into executemany:

query = "INTO dbo.SCORE_TABLE VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"    
sql_data = tuple(map(tuple, df.values))

cur.executemany(query, sql_data)
cur.commit()