psycopg2: insert multiple rows with one query

2018-12-31 16:25发布

I need to insert multiple rows with one query (number of rows is not constant), so I need to execute query like this one:

INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);

The only way I know is

args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)

but I want some simpler way.

13条回答
弹指情弦暗扣
2楼-- · 2018-12-31 17:20

New execute_values method in Psycopg 2.7:

data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values (
    cursor, insert_query, data, template=None, page_size=100
)

The pythonic way of doing it in Psycopg 2.6:

data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)

Explanation: If the data to be inserted is given as a list of tuples like in

data = [(1,'x'), (2,'y')]

then it is already in the exact required format as

  1. the values syntax of the insert clause expects a list of records as in

    insert into t (a, b) values (1, 'x'),(2, 'y')

  2. Psycopg adapts a Python tuple to a Postgresql record.

The only necessary work is to provide a records list template to be filled by psycopg

# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))

and place it in the insert query

insert_query = 'insert into t (a, b) values {}'.format(records_list_template)

Printing the insert_query outputs

insert into t (a, b) values %s,%s

Now to the usual Psycopg arguments substitution

cursor.execute(insert_query, data)

Or just testing what will be sent to the server

print (cursor.mogrify(insert_query, data).decode('utf8'))

Output:

insert into t (a, b) values (1, 'x'),(2, 'y')
查看更多
泪湿衣
3楼-- · 2018-12-31 17:21

A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):

A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:

namedict = ({"first_name":"Joshua", "last_name":"Drake"},
            {"first_name":"Steven", "last_name":"Foo"},
            {"first_name":"David", "last_name":"Bar"})

You could easily insert all three rows within the dictionary by using:

cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)

It doesn't save much code, but it definitively looks better.

查看更多
听够珍惜
4楼-- · 2018-12-31 17:22

cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be

args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}\t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))

For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands+ of rows. It will also be more memory efficient than building a giant query string. An iterator would only ever hold one input record in memory at a time, where at some point you'll run out of memory in your Python process or in Postgres by building the query string.

查看更多
爱死公子算了
5楼-- · 2018-12-31 17:25

I've been using ant32's answer above for several years. However I've found that is thorws an error in python 3 because mogrify returns a byte string.

Converting explicitly to bytse strings is a simple solution for making code python 3 compatible.

args_str = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup) 
cur.execute(b"INSERT INTO table VALUES " + args_str)
查看更多
浮光初槿花落
6楼-- · 2018-12-31 17:26

If you're using SQLAlchemy, you don't need to mess with hand-crafting the string because SQLAlchemy supports generating a multi-row VALUES clause for a single INSERT statement:

rows = []
for i, name in enumerate(rawdata):
    row = {
        'id': i,
        'name': name,
        'valid': True,
    }
    rows.append(row)
if len(rows) > 0:  # INSERT fails if no rows
    insert_query = SQLAlchemyModelName.__table__.insert().values(rows)
    session.execute(insert_query)
查看更多
步步皆殇っ
7楼-- · 2018-12-31 17:26

If you want to insert multiple rows within one insert statemens (assuming you are not using ORM) the easiest way so far for me would be to use list of dictionaries. Here is an example:

 t = [{'id':1, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 6},
      {'id':2, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 7},
      {'id':3, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 8}]

conn.execute("insert into campaign_dates
             (id, start_date, end_date, campaignid) 
              values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);",
             t)

As you can see only one query will be executed:

INFO sqlalchemy.engine.base.Engine insert into campaign_dates (id, start_date, end_date, campaignid) values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);
INFO sqlalchemy.engine.base.Engine [{'campaignid': 6, 'id': 1, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 7, 'id': 2, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 8, 'id': 3, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}]
INFO sqlalchemy.engine.base.Engine COMMIT
查看更多
登录 后发表回答