I am trying to populate a MS SQL 2005 database using Python on Windows. I am inserting millions of rows, and by 7 million rows I am using almost a gigabyte of memory. The test below eats up 4 MB of RAM for each 100k rows inserted:
import pyodbc
connection=pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
cursor=connection.cursor()
connection.autocommit=True
while 1:
    cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", 1, 2, 3, 4, 5, 6)
connection.close()
Hack solution: I ended up spawning a new process using the multiprocessing module, so the memory is returned to the OS when the process exits. Still confused about why inserting rows in this way consumes so much memory. Any ideas?
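A minimal sketch of that hack, using the same table and connection string as the test above; the BATCH size and the batched() helper are my own additions. The point is that whatever pyodbc accumulates dies with the worker process:

```python
# Sketch: run each batch of inserts in a short-lived worker process so any
# memory pyodbc accumulates is returned to the OS when the process exits.
from multiprocessing import Process

BATCH = 100_000  # rows per worker process (tuning assumption)

def batched(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def insert_batch(rows):
    import pyodbc  # imported in the child so each worker owns its connection
    conn = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
    conn.autocommit = True
    cursor = conn.cursor()
    for row in rows:
        cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", *row)
    conn.close()

# Usage sketch (needs a reachable SQL Server, so shown as comments):
# all_rows = [(1, 2, 3, 4, 5, 6)] * 1_000_000
# for chunk in batched(all_rows, BATCH):
#     p = Process(target=insert_batch, args=(chunk,))
#     p.start()
#     p.join()  # leaked memory is reclaimed when the worker exits
```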
I had the same issue, and it looks like a pyodbc issue with parameterized inserts: http://code.google.com/p/pyodbc/issues/detail?id=145
As a temporary workaround, switching to a static INSERT with the VALUES clause populated inline (no parameters) eliminates the leak, at least until I can try a build from the current source.
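For reference, a sketch of that workaround: building the VALUES clause as a literal string so no parameters are involved. The helper names are mine, and note that this reintroduces the quoting burden (and injection risk) that parameterized queries normally remove, so it only suits trusted data:

```python
def literal(v):
    """Tiny SQL literal formatter for numbers and strings (assumption:
    trusted data only; single quotes are doubled, nothing else is escaped)."""
    if isinstance(v, (int, float)):
        return str(v)
    return "'" + str(v).replace("'", "''") + "'"

def static_insert(table, columns, row):
    """Build a parameter-free INSERT statement for one row."""
    cols = ", ".join(columns)
    vals = ", ".join(literal(v) for v in row)
    return f"insert into {table} ({cols}) VALUES ({vals})"

# Usage sketch:
# cursor.execute(static_insert("x", ["a", "b", "c", "d", "e", "f"],
#                              (1, 2, 3, 4, 5, 6)))
```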
Try creating a separate cursor for each insert. Reuse the cursor variable each time through the loop to implicitly dereference the previous cursor. Call connection.commit() after each insert.
You may only need something as simple as a time.sleep(0) at the bottom of each loop to allow the garbage collector to run.
You could also try forcing a garbage collection every once in a while with gc.collect() after importing the gc module.

Another option might be to use cursor.executemany() and see if that clears up the problem. The nasty thing about executemany(), though, is that it takes a sequence rather than an iterator (so you can't pass it a generator). I'd try the garbage collector first.

EDIT: I just tested the code you posted, and I am not seeing the same issue. Are you using an old version of pyodbc?
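Since executemany() wants a sequence, one way to feed it from a generator is to materialize fixed-size batches; the chunks() helper below is my own sketch:

```python
from itertools import islice

def chunks(iterable, size):
    """Turn any iterator into a stream of lists of at most `size` rows,
    since executemany() takes a sequence rather than an iterator."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Usage against pyodbc (sketch; row_generator is a placeholder):
# for batch in chunks(row_generator, 10_000):
#     cursor.executemany(
#         "insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", batch)
```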
Maybe close & re-open the connection every million rows or so?
Sure, it doesn't solve anything, but if you only have to do this once you could get on with life!
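A sketch of that approach, written against a generic DB-API connection factory so the reconnect logic is visible; insert_with_reconnect and the every parameter are my naming, and with pyodbc the factory would be lambda: pyodbc.connect(...):

```python
def insert_with_reconnect(connect, sql, rows, every=1_000_000):
    """Close and re-open the connection every `every` rows so that any
    per-connection memory is released. `connect` is a zero-argument
    factory returning a DB-API connection."""
    conn = connect()
    cursor = conn.cursor()
    for i, row in enumerate(rows, 1):
        cursor.execute(sql, row)
        if i % every == 0:
            conn.commit()
            conn.close()       # hand any accumulated memory back
            conn = connect()   # carry on with a fresh connection
            cursor = conn.cursor()
    conn.commit()
    conn.close()
```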
I faced the same problem too.
I had to read more than 50 XML files, each about 300 MB, and load them into SQL Server 2005.
I tried the following:
Using the same cursor by dereferencing.
Closing/opening the connection.
Setting the connection to None.
Finally, I ended up bootstrapping each XML file load using the Process module.
Now I have replaced that approach with IronPython - System.Data.SqlClient.
This gives better performance and also a better interface.