ZODB: Where does transaction.savepoint write data?

Posted 2019-08-10 17:37

Question:

According to ZODB documentation:

"A savepoint allows a data manager to save work to its storage without committing the full transaction." "Savepoints are also useful to free memory that would otherwise be used to keep the whole state of the transaction."

According to the very instructive article When to commit data in ZODB (Martijn Pieters):

... a point during the whole transaction where you can ask for data to be temporarily stored on disk. [...]
One thing a savepoint does, is call for a garbage collection of the ZODB caches, which means that any data not currently in use is removed from memory.

The thing is, I need to store a lot of items in one transaction, something like this:

import transaction

for i, item in enumerate(aLotOfItems):
    database[i] = item
    if i % 10000 == 0:
        transaction.savepoint(True)
transaction.commit()

I kind of expected transaction.savepoint to work the same way as bsddb3.db.Db.sync. When Db.sync() is called, the database is flushed and you can observe it. But when a savepoint is set, apparently neither the database nor the tmp file grows or changes in size until transaction.commit().

I am really confused about:

  • What is actually happening when a savepoint is set?

  • How is it different from committing/flushing a database?

  • If data is "temporarily stored on disk", where does the savepoint write it?

  • Can I count on savepoints to literally "free memory"?

Answer 1:

The original, primary use for savepoints is to be able to roll back parts of a transaction.

Say you wanted to accept a large number of log entries, but need to process these in batches into the database:

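import transaction

# per_batch, process_batch and BatchFailedException stand in for your own code.
for batch in per_batch(log_entries):
    sp = transaction.savepoint()  # mark the state before this batch
    try:
        process_batch(batch)
    except BatchFailedException:
        sp.rollback()         # undo only the failed batch
        transaction.commit()  # keep the batches that succeeded
        raise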

If a batch fails, the transaction is still committed, but without the failed batch, which has been rolled back.

That was the original reason to use savepoints. Setting a savepoint also has the side effect of triggering a ZODB cache garbage collection run.

The ZODB holds a cache of recently accessed objects. This includes objects that don't actually change during the current transaction; you just retrieved them from the database, used their data, and then stopped directly referencing them. The ZODB stores an object graph; one object references other objects, which in turn reference other objects. Each of those objects, if it inherits from the Persistent base class, is a separate ZODB record. When you traverse the graph, these objects are all loaded into memory.

The GC run clears them from memory again, provided they haven't changed. Traversing the object graph again would load them into memory again, but clearing them during a savepoint saves memory.

Savepoint data itself is stored on disk in a TmpStore file, in your TEMP directory. This uses a tempfile.TemporaryFile() object, which for security reasons is created in an unlinked state; the file exists, but the directory entry is cleared immediately on creation. You therefore cannot see this file from outside the ZODB process.
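The unlinked-file behaviour can be observed with the standard library alone. The sketch below is not ZODB code; it only illustrates why a tempfile.TemporaryFile() can hold data without showing up in a directory listing. The helper name write_unlinked_tempfile is made up, and the result assumes a POSIX system (on Windows the file may still have a visible directory entry while it is open).

```python
import os
import tempfile


def write_unlinked_tempfile(payload, directory):
    """Write payload to a TemporaryFile in `directory`.

    Returns the file size as seen through the open handle, and the
    directory listing taken while the file is still open.
    """
    with tempfile.TemporaryFile(dir=directory) as fh:
        fh.write(payload)
        fh.flush()  # push Python's buffer to the OS so fstat sees the bytes
        size_via_handle = os.fstat(fh.fileno()).st_size
        visible_entries = os.listdir(directory)
    return size_via_handle, visible_entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        size, entries = write_unlinked_tempfile(b"x" * 4096, scratch)
        print("bytes held by the open file:", size)
        print("names visible in the directory:", entries)
```

The data is held by the file (the size is non-zero through the open handle), yet the directory shows nothing, which matches the observation in the question that no temporary file appears to grow.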

A full commit moves the data into the actual ZODB database and finalises the transaction.



Answer 2:

The primary use of savepoints is to free memory and to move transaction-related data from memory to disk, especially with large transactions and lots of modified data.