psycopg2: insert multiple rows with one query

Posted 2018-12-31 16:25

I need to insert multiple rows with one query (number of rows is not constant), so I need to execute query like this one:

INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);

The only way I know is

args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)

but I want some simpler way.

13 Answers

深知你不懂我心
Answer #2 · 2018-12-31 17:06

Update with psycopg2 2.7:

The classic executemany() is about 60 times slower than @ant32's implementation (called "folded"), as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com

This implementation was added to psycopg2 in version 2.7 and is called execute_values():

from psycopg2.extras import execute_values
execute_values(cur,
    "INSERT INTO test (id, v1, v2) VALUES %s",
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)])
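Conceptually, execute_values() folds the argument list into a single multi-row VALUES clause before executing it. A rough pure-Python illustration of that folding, assuming integer-only rows and ignoring the real quoting/escaping that psycopg2 does via mogrify():

```python
# Simplified sketch of the "folded" VALUES construction performed by
# execute_values(); real psycopg2 safely quotes each value.
def fold_values(rows):
    # Each row becomes "(a,b,c)"; all rows join into one VALUES list.
    return ",".join("(" + ",".join(str(v) for v in row) + ")" for row in rows)

sql = "INSERT INTO test (id, v1, v2) VALUES " + fold_values([(1, 2, 3), (4, 5, 6)])
# sql == "INSERT INTO test (id, v1, v2) VALUES (1,2,3),(4,5,6)"
```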

Previous Answer:

To insert multiple rows, using the multirow VALUES syntax with execute() is about 10x faster than using psycopg2 executemany(). Indeed, executemany() just runs many individual INSERT statements.

@ant32's code works perfectly in Python 2. But in Python 3, cursor.mogrify() returns bytes, cursor.execute() takes either bytes or strings, and ','.join() expects str instances.

So in Python 3 you may need to modify @ant32's code by adding .decode('utf-8'):

args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)

Or by using bytes (with b'' or b"") only:

args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes) 
零度萤火
Answer #3 · 2018-12-31 17:08

I built a program that inserts multiple rows into a server located in another city.

I found that this method was about 10 times faster than executemany(). In my case tup is a tuple containing about 2000 rows. It took about 10 seconds with this method:

args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str) 

and 2 minutes when using this method:

cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)
琉璃瓶的回忆
Answer #4 · 2018-12-31 17:09

Finally, in SQLAlchemy 1.2 a new implementation was added that uses psycopg2.extras.execute_batch() instead of executemany() when you initialize your engine with use_batch_mode=True, like:

engine = create_engine(
    "postgresql+psycopg2://scott:tiger@host/dbname",
    use_batch_mode=True)

http://docs.sqlalchemy.org/en/latest/changelog/migration_12.html#change-4109

Then anyone using SQLAlchemy won't need to bother trying different combinations of SQLAlchemy, psycopg2 and direct SQL together.

人间绝色
Answer #5 · 2018-12-31 17:10

All of these techniques are called "extended inserts" in Postgres terminology, and as of the 24th of November 2016, it's still a ton faster than psycopg2's executemany() and all the other methods listed in this thread (which I tried before coming to this answer).

Here's some code which doesn't use cur.mogrify and is nice and simple to get your head around:

valueSQL = ['%s', '%s', '%s']  # as many placeholders as you have columns
sqlrows = []
rowsPerInsert = 3  # more means faster, but with diminishing returns
for row in getSomeData:
    # row == [1, 'a', 'yolo', ...]
    sqlrows += row
    if (len(sqlrows) // len(valueSQL)) % rowsPerInsert == 0:
        # sqlrows == [1, 'a', 'yolo', 2, 'b', 'swag', 3, 'c', 'selfie']
        insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')'] * rowsPerInsert)
        cur.execute(insertSQL, sqlrows)
        con.commit()
        sqlrows = []
if sqlrows:  # flush the final partial batch, if any
    insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')'] * (len(sqlrows) // len(valueSQL)))
    cur.execute(insertSQL, sqlrows)
    con.commit()

But it should be noted that if you can use copy_from(), you should use copy_from() ;)
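A minimal copy_from() sketch, staging the rows in an in-memory tab-separated buffer as copy_from() expects (the table and column names here are illustrative, not from the thread):

```python
import io

rows = [(1, 'a', 'yolo'), (2, 'b', 'swag')]

# Stage rows as tab-separated text, one line per row.
buf = io.StringIO()
for row in rows:
    buf.write('\t'.join(str(v) for v in row) + '\n')
buf.seek(0)

# With a live connection, this streams the buffer via COPY:
# cur.copy_from(buf, 'twitter', columns=('id', 'name', 'tag'))
# con.commit()
```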

妖精总统
Answer #6 · 2018-12-31 17:17

Here's a way to execute inserts in batches, using a record template with psycopg2:

def get_batch(iterable, size=1):
    for i in range(0, len(iterable), size):
        yield iterable[i: i + size]


def insert_rows_batch(table, rows, batch_size=500, target_fields=None):
    """
    A utility method to insert batch of tuples(rows) into a table
    NOTE: Handle data type for fields in rows yourself as per your table 
    columns' type.

    :param table: Name of the target table
    :type table: str

    :param rows: The rows to insert into the table
    :type rows: iterable of tuples

    :param batch_size: The size of batch of rows to insert at a time
    :type batch_size: int

    :param target_fields: The names of the columns to fill in the table
    :type target_fields: iterable of strings
    """
    conn = cur = None
    if target_fields:
        target_fields = ", ".join(target_fields)
        target_fields = "({})".format(target_fields)
    else:
        target_fields = ''

    conn = get_conn() # get connection using psycopg2
    if conn:
        cur = conn.cursor()
    count = 0

    for mini_batch in get_batch(rows, batch_size):
        mini_batch_size = len(mini_batch)
        count += mini_batch_size
        record_template = ','.join(["%s"] * mini_batch_size)
        sql = "INSERT INTO {0} {1} VALUES {2};".format(
            table,
            target_fields,
            record_template)
        cur.execute(sql, mini_batch)
        conn.commit()
        print("Loaded {} rows into {} so far".format(count, table))
    print("Done loading. Loaded a total of {} rows".format(count))
    if cur:
        cur.close()
    if conn:
        conn.close()
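The batching helper can be sanity-checked on a plain list without any database (get_batch is redefined here so the snippet is self-contained):

```python
def get_batch(iterable, size=1):
    # Yield successive slices of `size` items (same helper as above).
    for i in range(0, len(iterable), size):
        yield iterable[i: i + size]

batches = list(get_batch([(1,), (2,), (3,), (4,), (5,)], size=2))
# Three batches: two of size 2 and a final one of size 1.
```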

If you also want UPSERT (insert + update) with batches in Postgres: postgres_utilities
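A hedged sketch of how such an upsert could look with execute_values() and Postgres's INSERT ... ON CONFLICT; the table and column names are illustrative, not from the linked utilities:

```python
# Build an INSERT ... ON CONFLICT statement suitable for execute_values().
def upsert_sql(table, cols, conflict_col):
    # Every non-key column is overwritten with the incoming (EXCLUDED) value.
    updates = ", ".join("{0} = EXCLUDED.{0}".format(c) for c in cols if c != conflict_col)
    return "INSERT INTO {} ({}) VALUES %s ON CONFLICT ({}) DO UPDATE SET {}".format(
        table, ", ".join(cols), conflict_col, updates)

sql = upsert_sql("test", ["id", "v1", "v2"], "id")

# With a live cursor:
# from psycopg2.extras import execute_values
# execute_values(cur, sql, [(1, 2, 3), (4, 5, 6)])
```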

冷夜・残月
Answer #7 · 2018-12-31 17:17

Using aiopg, the snippet below works perfectly fine:

    # items = [10, 11, 12, 13]
    # gid = 1
    tup = [(gid, pid) for pid in items]
    args_str = ",".join([str(s) for s in tup])
    # insert into "group" values (1, 10), (1, 11), (1, 12), (1, 13)
    yield from cur.execute('INSERT INTO "group" VALUES ' + args_str)
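Note that str(s) on a tuple only produces valid SQL for trusted numeric values (untrusted input should go through parameters or mogrify instead). The string construction itself can be checked in isolation, with gid and items as in the snippet:

```python
gid = 1
items = [10, 11, 12, 13]

tup = [(gid, pid) for pid in items]
# str((1, 10)) == "(1, 10)", which is valid VALUES syntax for plain integers.
args_str = ",".join(str(s) for s in tup)
```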