I am using Python with psycopg2 and I'm trying to run a full VACUUM
after a daily operation which inserts several thousand rows. The problem is that when I try to run the VACUUM
command within my code I get the following error:
psycopg2.InternalError: VACUUM cannot run inside a transaction block
How do I run this from the code outside a transaction block?
If it makes a difference, I have a simple DB abstraction class, a subset of which is shown below for context (it is not runnable as-is; exception handling and docstrings have been omitted and line-spanning adjustments made):
import psycopg2

class db(object):
    def __init__(self, dbname, host, port, user, password):
        self.conn = psycopg2.connect(
            "dbname=%s host=%s port=%s user=%s password=%s"
            % (dbname, host, port, user, password))
        self.cursor = self.conn.cursor()

    def _doQuery(self, query):
        self.cursor.execute(query)
        self.conn.commit()

    def vacuum(self):
        query = "VACUUM FULL"
        self._doQuery(query)
Don't do it - you don't need VACUUM FULL. In fact, if you run a reasonably recent version of Postgres (say, anything newer than 8.1), you don't even need to run plain VACUUM manually; autovacuum takes care of it.
I don't know psycopg2 and PostgreSQL, only apsw and SQLite, so I can't offer psycopg2-specific help.
But it seems to me that PostgreSQL might behave much like SQLite, which has two modes of operation: an autocommit mode, where each statement takes effect immediately, and a transactional mode, where statements are implicitly wrapped in a transaction until an explicit commit.
If that is the case, the problem could lie in the access layer, psycopg2. If it normally operates so that transactions are implicitly started until a commit is made, there may be no "standard way" to run a vacuum.
Of course, it is possible that psycopg2 has a special vacuum method, or a special operation mode in which no implicit transactions are started.
If no such possibility exists, one option remains (without changing the access layer ;-) ):
Most databases ship with a shell program for accessing the database. Your program could launch this shell program and pipe the vacuum command into it, thus using the shell program to perform the vacuum (see the sketch below). Since vacuum is a slow operation in itself, the cost of starting an external program is negligible. Of course, your program should commit all uncommitted work first, or you risk a deadlock situation: the vacuum must wait until your last transaction ends.
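As a rough sketch of that approach (not from the original answer), using Python's subprocess module to drive the psql client; the connection parameters are placeholders, psql is assumed to be on the PATH, and authentication is assumed to be handled via ~/.pgpass or the PGPASSWORD environment variable:

import subprocess

def vacuum_via_psql(dbname, host, port, user):
    # Run VACUUM through the psql shell client, outside any transaction
    # held by the application's own database connection.
    subprocess.check_call(
        ["psql", "-h", host, "-p", str(port), "-U", user,
         "-d", dbname, "-c", "VACUUM FULL"])

Here the command is passed with -c rather than piped into psql's standard input, which amounts to the same thing.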
While VACUUM FULL is questionable in current versions of PostgreSQL, forcing a VACUUM ANALYZE or REINDEX after certain massive operations can improve performance or clean up disk usage. This is PostgreSQL-specific and would need adjusting to do the right thing for other databases.
Unfortunately, the connection proxy provided by Django doesn't expose set_isolation_level.
After more searching I discovered the isolation_level property of the psycopg2 connection object. It turns out that changing this to 0 moves you out of a transaction block. Changing the vacuum method of the above class along these lines solves it. Note that I also set the isolation level back to what it previously was afterwards, just in case (it seems to be 1 by default). This article (near the end of that page) provides a brief explanation of isolation levels in this context.
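The code itself isn't shown above, so here is a sketch of the reworked vacuum method following that description:

def vacuum(self):
    # VACUUM cannot run inside a transaction block, so temporarily
    # switch the connection to autocommit (isolation level 0).
    old_isolation_level = self.conn.isolation_level
    self.conn.set_isolation_level(0)
    query = "VACUUM FULL"
    self._doQuery(query)
    # Restore the previous isolation level just in case.
    self.conn.set_isolation_level(old_isolation_level)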
Additionally, you can retrieve the messages emitted by VACUUM or ANALYZE; the snippet below prints the list of log messages produced by queries like VACUUM and ANALYZE:
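One way to do this with psycopg2 (a sketch, not the original snippet) is through the connection's notices attribute, where psycopg2 collects the server's messages; VACUUM VERBOSE is used so the server actually produces output, and the connection string is a placeholder:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder parameters
conn.set_isolation_level(0)        # autocommit, so VACUUM can run
cur = conn.cursor()
cur.execute("VACUUM VERBOSE ANALYZE")
for line in conn.notices:          # server INFO/NOTICE messages are collected here
    print(line)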
This can be useful for DBAs ^^
Note that if you're using Django with South to perform a migration, you can use code along the lines shown below to execute a VACUUM ANALYZE.
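The original snippet isn't reproduced here; the sketch below shows the general idea, assuming the default Django database connection is PostgreSQL via psycopg2 (the helper name is made up for illustration):

from django.db import connection

def vacuum_analyze():
    # Open a cursor first so the underlying psycopg2 connection exists,
    # then reach past Django's wrapper to change its isolation level.
    connection.cursor()
    raw_conn = connection.connection       # the raw psycopg2 connection
    old_level = raw_conn.isolation_level
    raw_conn.set_isolation_level(0)        # autocommit, so VACUUM can run
    raw_conn.cursor().execute("VACUUM ANALYZE")
    raw_conn.set_isolation_level(old_level)

With South you would typically call something like this from the migration's forwards() method, after committing the migration's open transaction (e.g. with db.commit_transaction()), since South wraps migrations in a transaction.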