I use PostgreSQL 9.4 for a model database. My table looks somewhat like this:
CREATE TABLE table1 (
    sid INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('table1_sid_seq'::regclass),
    col1 INT,
    col2 INT,
    col3 JSONB
);
My Python 2.7 workflow often looks like this:
curs.execute("SELECT sid, col1, col2 FROM table1")
data = curs.fetchall()
putback = []
for i in data:
    result = do_something(i[1], i[2])
    putback.append((i[0], result))
del data
curs.execute("UPDATE table1
SET col3 = p.result
FROM unnest(%s) p(sid INT, result JSONB)
WHERE sid = p.sid", (putback,))
This typically works quite well and efficiently. However, for large queries PostgreSQL memory use will sometimes go through the roof (>50GB) during the UPDATE command, and I believe the backend process is being killed by OS X, because I get WARNING: terminating connection because of crash of another server process. My MacBook Pro has 16GB of RAM, and the query in question has 11M rows, each with about 100 characters of data to write back.
My postgresql.conf:
default_statistics_target = 50
maintenance_work_mem = 512MB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 4GB
work_mem = 256MB
wal_buffers = 16MB
checkpoint_segments = 128
shared_buffers = 1024MB
max_connections = 80
So I wonder:
- Why does my query sometimes consume excessive amounts of RAM?
- How can I control memory use and still guarantee good performance?
- Is there a good guideline or tool for tuning PostgreSQL?
Update:
I am pretty sure that @wildplasser pinpointed my problem. In the comments he suggests dumping the data into the database first and unpacking it from there. Unfortunately I could not figure out how to implement his proposal. If anyone has an idea how to do that, their answer will be gladly accepted.
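For reference, here is the kind of staging-table workflow I have been trying to piece together from his comments. This is an untested sketch, and several pieces of it are my own guesses rather than anything he spelled out: the temporary table name staging, the 10000-row batch size, the conn connection object for the final commit, and the assumption that do_something() returns a plain dict that json.dumps() can serialize for the JSONB column. I am not sure this is what he meant, or whether it actually keeps the server-side memory in check:

import json

# Temporary staging table; lives only for this session.
curs.execute("""CREATE TEMPORARY TABLE staging (
                    sid INT,
                    result JSONB
                )""")

# Insert the computed results in modest batches instead of building
# one huge parameter array on the client.
curs.execute("SELECT sid, col1, col2 FROM table1")
batch = []
for sid, col1, col2 in curs.fetchall():
    batch.append((sid, json.dumps(do_something(col1, col2))))
    if len(batch) >= 10000:
        curs.executemany("INSERT INTO staging (sid, result) VALUES (%s, %s)", batch)
        batch = []
if batch:
    curs.executemany("INSERT INTO staging (sid, result) VALUES (%s, %s)", batch)

# Let the server join the two tables, so no giant unnest() parameter is needed.
curs.execute("""UPDATE table1
                SET col3 = staging.result
                FROM staging
                WHERE table1.sid = staging.sid""")
conn.commit()

My hope was that, with the results already sitting in the staging table, the final UPDATE would no longer need the huge unnest() array that seems to blow up the memory, but I may well be missing the point of the suggestion.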