I have a node.js program calling a Postgres (Amazon RDS micro instance) function, get_jobs, within a transaction, 18 times a second, using the node-postgres package by brianc.
The node code is just an enhanced version of brianc's basic client pooling example, roughly like...
var pg = require('pg');
var conString = "postgres://username:password@server/database";

function getJobs(cb) {
  pg.connect(conString, function(err, client, done) {
    if (err) return console.error('error fetching client from pool', err);
    client.query("BEGIN;");
    client.query('select * from get_jobs()', [], function(err, result) {
      client.query("COMMIT;");
      done(); // call `done()` to release the client back to the pool
      if (err) console.error('error running query', err);
      cb(err, result);
    });
  });
}

function poll() {
  getJobs(function(err, jobs) {
    // process the jobs
  });
  setTimeout(poll, 55);
}

poll(); // start polling
So Postgres is getting:
2016-04-20 12:04:33 UTC:172.31.9.180(38446):XXX@XXX:[5778]:LOG: statement: BEGIN;
2016-04-20 12:04:33 UTC:172.31.9.180(38446):XXX@XXX:[5778]:LOG: execute <unnamed>: select * from get_jobs();
2016-04-20 12:04:33 UTC:172.31.9.180(38446):XXX@XXX:[5778]:LOG: statement: COMMIT;
... repeated every 55ms.
get_jobs is written with temp tables, something like this:
CREATE OR REPLACE FUNCTION get_jobs (
) RETURNS TABLE (
  ...
) AS
$BODY$
DECLARE
  _nowstamp bigint;
BEGIN

  -- take the current unix server time in ms
  _nowstamp := (select extract(epoch from now()) * 1000)::bigint;

  -- 1. get the jobs that are due
  CREATE TEMP TABLE jobs ON COMMIT DROP AS
  select ...
  from really_big_table_1
  where job_time < _nowstamp;

  -- 2. get other stuff attached to those jobs
  CREATE TEMP TABLE jobs_extra ON COMMIT DROP AS
  select ...
  from really_big_table_2 r
  inner join jobs j on r.id = j.some_id;

  ALTER TABLE jobs_extra ADD PRIMARY KEY (id);

  -- 3. return the final result with a join to a third big table
  RETURN query (
    select je.id, ...
    from jobs_extra je
    left join really_big_table_3 r on je.id = r.id
    group by je.id
  );

END
$BODY$ LANGUAGE plpgsql VOLATILE;
I've used the temp table pattern because I know that jobs will always be a small extract of rows from really_big_table_1, in hopes that this will scale better than a single query with multiple joins and multiple where conditions. (I used this to great effect with SQL Server and I don't trust any query optimiser now, but please tell me if this is the wrong approach for Postgres!)
The query runs in 8ms on small tables (as measured from node), ample time to complete one job "poll" before the next one starts.
Problem: After about 3 hours of polling at this rate, the Postgres server runs out of memory and crashes.
What I tried already...

- If I re-write the function without temp tables, Postgres doesn't run out of memory, but I use the temp table pattern a lot, so this isn't a solution.

- If I stop the node program (which kills the 10 connections it uses to run the queries), the memory frees up. Merely making node wait a minute between polling sessions doesn't have the same effect, so there are obviously resources that the Postgres backend associated with the pooled connection is keeping.

- If I run a VACUUM while polling is going on, it has no effect on memory consumption and the server continues on its way to death.

- Reducing the polling frequency only changes the amount of time before the server dies.

- Adding DISCARD ALL; after each COMMIT; has no effect.

- Explicitly calling DROP TABLE jobs; DROP TABLE jobs_extra; after the RETURN query (), instead of ON COMMIT DROP on the CREATE TABLEs. Server still crashes.

- Per CFrei's suggestion, I added pg.defaults.poolSize = 0 to the node code in an attempt to disable pooling. The server still crashed, but took much longer, and swap went much higher (second spike) than in all the previous tests, which looked like the first spike below. I found out later that pg.defaults.poolSize = 0 may not disable pooling as expected.

- On the basis of this: "Temporary tables cannot be accessed by autovacuum. Therefore, appropriate vacuum and analyze operations should be performed via session SQL commands.", I tried to run a VACUUM from the node server (as some attempt to make VACUUM an "in session" command). I couldn't actually get this test working: I have many objects in my database, and VACUUM, operating on all objects, was taking too long to execute on each job iteration. Restricting VACUUM to just the temp tables was impossible: (a) you can't run VACUUM in a transaction, and (b) outside the transaction the temp tables don't exist. :P EDIT: Later, on the Postgres IRC forum, a helpful chap explained that VACUUM isn't relevant for the temp tables themselves, but can be useful to clean up the rows that creating and dropping temp tables adds to and deletes from pg_attribute. In any case, VACUUMing "in session" wasn't the answer.

- DROP TABLE ... IF EXISTS before the CREATE TABLE, instead of ON COMMIT DROP. Server still dies.

- CREATE TEMP TABLE (...) and insert into ... (select...) instead of CREATE TEMP TABLE ... AS, instead of ON COMMIT DROP. Server dies.
So is ON COMMIT DROP not releasing all the associated resources? What else could be holding memory? How do I release it?
Then don't use them. You can still execute queries directly, as shown below.
It is not a completely wrong approach, just a very awkward one: you are hand-rolling something that has already been implemented by others in a much easier-to-use form. As a result, you are making many mistakes that can lead to many problems, including memory leaks.
Compare to the simplicity of the exact same example that uses pg-promise:
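Something along these lines (a sketch assuming pg-promise's db.tx and t.func methods; the connection string is the one from the question):

var pgp = require('pg-promise')();
var conString = "postgres://username:password@server/database";
var db = pgp(conString); // the library manages the connection pool internally

function getJobs() {
  // run the function call inside a transaction; the client is acquired
  // and released automatically
  return db.tx(function (t) {
    return t.func('get_jobs');
  });
}

function poll() {
  getJobs()
    .then(function (jobs) {
      // process the jobs
    })
    .catch(function (error) {
      console.error('error running query', error);
    });
  setTimeout(poll, 55);
}

poll(); // start polling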
Gets even simpler when using ES6 syntax:
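For instance, a sketch of getJobs with a generator as the transaction callback (assuming db.tx accepts ES6 generators, which pg-promise documents for this use):

function getJobs() {
  return db.tx(function* (t) {
    // each yield resolves a query inside the same transaction
    return yield t.func('get_jobs');
  });
}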
The only thing that I didn't quite understand in your example is the use of a transaction to execute a single SELECT. This is not what transactions are generally for, as you are not changing any data. I assume you were trying to shrink a real piece of code you had that also changes some data. In case you don't need a transaction, your code can be further reduced to:
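Roughly (db.func here is assumed to call the database function outside any explicit transaction):

function getJobs() {
  // a plain function call; the library still acquires and releases
  // the connection for you
  return db.func('get_jobs');
}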
UPDATE: It would be a dangerous approach, however, not to control the end of the previous request, which may also create memory/connection issues.
A safe approach should be:
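A sketch of that, where the next poll is scheduled only once the previous request has settled:

function poll() {
  getJobs()
    .then(function (jobs) {
      // process the jobs
      setTimeout(poll, 55); // schedule the next poll only after this one succeeded
    })
    .catch(function (error) {
      console.error('error running query', error);
      setTimeout(poll, 55); // keep polling after a failure as well
    });
}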
Use CTEs to create partial result sets instead of temp tables.
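A sketch of the rewrite, keeping the column lists and table names elided in the question above as placeholders:

CREATE OR REPLACE FUNCTION get_jobs (
) RETURNS TABLE (
  ...
) AS
$BODY$
DECLARE
  _nowstamp bigint;
BEGIN

  -- take the current unix server time in ms
  _nowstamp := (select extract(epoch from now()) * 1000)::bigint;

  RETURN query (
    -- 1. get the jobs that are due
    WITH jobs AS (
      select ...
      from really_big_table_1
      where job_time < _nowstamp
    ),
    -- 2. get other stuff attached to those jobs
    jobs_extra AS (
      select ...
      from really_big_table_2 r
      inner join jobs j on r.id = j.some_id
    )
    -- 3. return the final result with a join to a third big table
    select je.id, ...
    from jobs_extra je
    left join really_big_table_3 r on je.id = r.id
    group by je.id
  );

END
$BODY$ LANGUAGE plpgsql VOLATILE;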
The planner evaluates each CTE block in sequence, which is what I wanted to achieve with the temp tables.
I know this doesn't directly solve the memory leak issue (I'm pretty sure there's something wrong with Postgres' implementation of temp tables, at least in the way they manifest on the RDS configuration).
However, the query works, it is planned the way I intended, and after 3 days of running the job the memory usage is now stable and my server doesn't crash.
I didn't change the node code at all.