I have a large table with about 10 million rows in a PostgreSQL 9.4 database. It looks somewhat like this:
     gid | number1 | random | result | ...
    -----+---------+--------+--------+-----
       1 |       2 |   NULL |   NULL | ...
       2 |      15 |   NULL |   NULL | ...
     ... |     ... |    ... |    ... | ...
Now I would like to update the columns `random` and `result` as a function of `number1`. That means that at least `random` needs to be produced in a script outside of the database. Since I have limited RAM, I wonder how I could do that efficiently using psycopg2. I believe I face two problems: how to fetch the data without using too much RAM, and how to get it back into the table. The simpleton approach would look like this:
    curs.execute("""SELECT gid, number1 FROM my_table;""")
    data = curs.fetchall()
    result = []
    for i in data:
        result.append((create_random(i[1]), i[0]))
    curs.executemany("""UPDATE my_table
                        SET random = %s
                        WHERE gid = %s;""",
                     result)
    curs.execute("""UPDATE my_table
                    SET result = number1 * random;""")
However, this will certainly deplete all my memory quickly, and the row-by-row UPDATE of `my_table` will take forever.

What would be a smarter strategy? The database is accessed exclusively and can be locked. The PostgreSQL random() function is unfortunately not suitable for my case.
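For reference, here is a sketch of the kind of thing I mean by "smarter": stream the rows through a server-side (named) cursor so only one batch sits in RAM at a time, push each batch of computed values back with `execute_values`, and compute `result` in a single set-based statement. The DSN, batch size, and the `create_random` stand-in below are placeholders, not my real code:

```python
import random

def create_random(number1):
    # Hypothetical stand-in for the external script-side function.
    return random.random() * number1

def batched(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def update_in_batches(dsn, batch_size=10000):
    import psycopg2
    from psycopg2.extras import execute_values
    conn = psycopg2.connect(dsn)
    # A *named* cursor streams rows from the server instead of
    # fetching all 10 million into client memory at once.
    read = conn.cursor(name="stream_gids")
    read.itersize = batch_size
    read.execute("SELECT gid, number1 FROM my_table")
    write = conn.cursor()
    for batch in batched(read, batch_size):
        rows = [(gid, create_random(n)) for gid, n in batch]
        # One set-based statement per batch instead of one UPDATE per row.
        execute_values(write, """
            UPDATE my_table AS t
            SET    random = v.random
            FROM  (VALUES %s) AS v (gid, random)
            WHERE  t.gid = v.gid
        """, rows)
    read.close()
    # `result` depends only on columns already in the table,
    # so one server-side statement covers all rows.
    write.execute("UPDATE my_table SET result = number1 * random")
    conn.commit()
    conn.close()
```

Would something along these lines be the right direction, or is there a better pattern?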
One suggested approach is to `unnest` the data and update it all at once. Table t:
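A minimal psycopg2 sketch of that `unnest` idea might look like the following. The table/column names, the `int`/`float8` casts, and the `columns` helper are assumptions for illustration; psycopg2 adapts Python lists to PostgreSQL arrays, so a single statement can update every row in the list:

```python
def columns(rows):
    """Split (gid, random) pairs into the two parallel lists unnest expects."""
    return [g for g, _ in rows], [r for _, r in rows]

def update_with_unnest(conn, rows):
    # `rows` holds (gid, random) pairs already computed in Python.
    gids, randoms = columns(rows)
    with conn.cursor() as cur:
        # unnest() turns each array into a column, producing a virtual
        # table that drives one set-based UPDATE instead of many.
        cur.execute("""
            UPDATE my_table AS t
            SET    random = v.random
            FROM  (SELECT unnest(%s::int[])    AS gid,
                          unnest(%s::float8[]) AS random) AS v
            WHERE  t.gid = v.gid
        """, (gids, randoms))
```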