I have a large table with ca. 10 million rows in a PostgreSQL 9.4 database. It looks somewhat like this:
gid | number1 | random | result | ...
  1 |       2 |   NULL |   NULL | ...
  2 |      15 |   NULL |   NULL | ...
... |     ... |    ... |    ... | ...
Now I would like to update the columns random and result as a function of number1. That means that at least random needs to be produced in a script outside of the database. Since I have limited RAM, I wonder how I could do that efficiently using psycopg2. I believe I face two problems: how to fetch the data without using too much RAM, and how to get it back into the database. The simpleton approach would look like this:
curs.execute("""SELECT gid, number1 FROM my_table;""")
data = curs.fetchall()

result = []
for gid, number1 in data:
    result.append((create_random(number1), gid))

curs.executemany("""UPDATE my_table
                    SET random = %s
                    WHERE gid = %s;""",
                 result)

curs.execute("""UPDATE my_table
                SET result = number1 * random;""")
However, this would quickly deplete all my memory, and the row-by-row UPDATE of my_table would take forever.
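One direction I have been looking at is a named (server-side) cursor, so that rows are streamed in batches instead of being materialised all at once by fetchall(). A minimal sketch of what I mean (the batch size, cursor name and create_random are placeholders of mine; conn is an open psycopg2 connection):

```python
import itertools

def batched(rows, size):
    """Yield lists of at most `size` items from an iterable,
    so only one batch is ever held in memory at a time."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

def update_random(conn, create_random, batch_size=10000):
    """Stream (gid, number1) pairs through a named (server-side)
    cursor and write the computed `random` values back in batches.
    `create_random` is the external function from the question."""
    read_curs = conn.cursor(name="stream_my_table")  # named => server-side
    read_curs.itersize = batch_size                  # rows per round trip
    read_curs.execute("SELECT gid, number1 FROM my_table;")

    write_curs = conn.cursor()
    for chunk in batched(read_curs, batch_size):
        params = [(create_random(n), gid) for gid, n in chunk]
        write_curs.executemany(
            "UPDATE my_table SET random = %s WHERE gid = %s;", params)
    read_curs.close()

    # `result` never needs to leave the database: derive it server-side.
    write_curs.execute("UPDATE my_table SET result = number1 * random;")
    conn.commit()

# Usage (requires a live database; DSN and function are placeholders):
#   conn = psycopg2.connect("dbname=mydb")
#   update_random(conn, create_random=my_external_function)
```

This keeps memory bounded, but I suspect the millions of per-row UPDATE statements are still the slow part.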
What would be a smarter strategy? The database is accessed exclusively and can be locked while this runs. Unfortunately, PostgreSQL's built-in random() function is not suitable for my case.
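For the write-back step, one alternative I have seen suggested is to COPY the computed values into a temporary table and apply them with a single join UPDATE instead of millions of per-row statements. A sketch of that idea (table and column names match my schema; the helper names, the double precision type and the DSN are assumptions of mine; conn is an open psycopg2 connection):

```python
import io

def copy_buffer(pairs):
    """Render (gid, random) pairs as tab-separated text for COPY."""
    buf = io.StringIO()
    for gid, rnd in pairs:
        buf.write("%d\t%s\n" % (gid, rnd))
    buf.seek(0)
    return buf

def writeback_via_copy(conn, pairs):
    """Bulk-load computed (gid, random) pairs into a temp table with
    COPY, then apply them to my_table with one UPDATE ... FROM join."""
    curs = conn.cursor()
    curs.execute(
        "CREATE TEMP TABLE tmp_random "
        "(gid integer, random double precision) ON COMMIT DROP;")
    curs.copy_from(copy_buffer(pairs), "tmp_random",
                   columns=("gid", "random"))
    curs.execute(
        "UPDATE my_table m SET random = t.random "
        "FROM tmp_random t WHERE m.gid = t.gid;")
    conn.commit()

# Usage (requires a live database; DSN is a placeholder):
#   conn = psycopg2.connect("dbname=mydb")
#   writeback_via_copy(conn, [(1, 0.37), (2, 0.81)])
```

Would this be the right direction, or is there something better still?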