I have data that is best represented by a tree. Serializing the structure makes the most sense, because I don't want to sort it every time, and it would allow me to make persistent modifications to the data.
On the other hand, this tree is going to be accessed from different processes on different machines, so I'm worried about the details of reading and writing. Basic searches didn't yield very much on the topic.
- If two users simultaneously attempt to revive the tree and read from it, can they both be served at once, or does one arbitrarily happen first?
- If two users have the tree open (assuming they can) and one makes an edit, does the other see the change implemented? (I assume they don't because they each received what amounts to a copy of the original data.)
- If two users alter the object and close it at the same time, again, does one come first, or is an attempt made to make both changes simultaneously?
I was thinking of making a queue of changes to be applied to the tree, and then having the tree execute them in the order of submission. I thought I would ask what my problems are before trying to solve any of them.
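A minimal sketch of the change-queue idea, assuming a single writer owns the tree and is the only thing that touches the file. The ("set"/"delete", path, value) change format and the `apply_change`/`writer` names are hypothetical, and a thread with `queue.Queue` stands in for separate processes or machines (a real multi-machine setup would need some network queue in front of the writer). It does nothing to protect readers from seeing a half-written file.

```python
import pickle
import queue
import threading

def apply_change(tree, change):
    """Apply one hypothetical (op, path, value) change to a nested-dict tree."""
    op, path, value = change
    node = tree
    for key in path[:-1]:          # walk (and create) intermediate nodes
        node = node.setdefault(key, {})
    if op == "set":
        node[path[-1]] = value
    elif op == "delete":
        node.pop(path[-1], None)
    else:
        raise ValueError(f"unknown operation: {op}")

def writer(tree, changes, path):
    """Single consumer: apply changes in submission order, persisting after each."""
    while True:
        change = changes.get()
        if change is None:         # sentinel value: stop the writer
            break
        apply_change(tree, change)
        with open(path, "wb") as f:
            pickle.dump(tree, f)
```

Because only one consumer ever applies changes, edits are serialized in the order they were put on the queue, which is the ordering the question asks for.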
Without trying it out, I'm fairly sure the answers are:
- They can both be served at once. However, if one user is reading while the other is writing, the reading user may get strange results.
- Probably not. Once the tree has been read from the file into memory, the other user will not see the first user's edits. If the tree hasn't been read from the file yet, the change will be picked up when it is read.
- Both writes will happen at the same time and can interleave, so the file will likely be corrupted.
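To illustrate the second point: unpickling builds an independent copy in memory, so an edit saved by one user does not reach a copy another user has already loaded. This sketch simulates two users inside one process with plain `pickle`; `save_tree`/`load_tree` are hypothetical helper names. It does not try to reproduce the corruption from the third point, since that depends on timing.

```python
import os
import pickle
import tempfile

def save_tree(tree, path):
    with open(path, "wb") as f:
        pickle.dump(tree, f)

def load_tree(path):
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "tree.pkl")
save_tree({"root": {"left": 1}}, path)

user_a = load_tree(path)  # each load is a separate in-memory copy
user_b = load_tree(path)

user_a["root"]["right"] = 2
save_tree(user_a, path)   # user A persists the edit

print("right" in user_b["root"])           # False: B's copy is stale
print("right" in load_tree(path)["root"])  # True: re-reading sees the edit
```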
Also, you mentioned shelve. From the shelve documentation:
> The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing. Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used.
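One way to apply the "Unix file locking" suggestion is a minimal sketch like the following, assuming a Unix system (`fcntl` is not available on Windows). `locked_shelf` is a hypothetical helper: it takes an advisory `flock` on a separate `.lock` file, so it only protects processes that all go through the same helper, and `flock` may not behave as expected on network filesystems such as NFS, which matters for the multi-machine case in the question.

```python
import contextlib
import fcntl
import shelve

@contextlib.contextmanager
def locked_shelf(path, exclusive=True):
    """Open a shelf while holding an advisory lock on a separate lock file.

    exclusive=True takes a write lock (one holder at a time); exclusive=False
    takes a shared lock, so several readers may hold it at once.
    """
    # A separate lock file avoids guessing which files the dbm backend creates.
    with open(path + ".lock", "a") as lock_file:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            flag = "c" if exclusive else "r"  # readers open the shelf read-only
            with shelve.open(path, flag=flag) as shelf:
                yield shelf
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
```

With the default `writeback=False`, changing a nested part of the tree means reassigning the top-level key (for example `shelf["tree"] = tree`) inside the exclusive block.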
Personally, at this point I would look into a simple key-value store like Redis with some form of optimistic locking.
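Optimistic locking means: read a value together with its version, modify a private copy, and write back only if the version is unchanged, retrying otherwise. Redis supports this pattern through its WATCH/MULTI/EXEC transactions; to keep the sketch self-contained and runnable without a server, a hypothetical in-process `VersionedStore` stands in for Redis here. The class, `ConflictError`, and `optimistic_update` are illustrative names, not a Redis API.

```python
import copy
import threading

class ConflictError(Exception):
    """Raised when a write was based on a version that is no longer current."""

class VersionedStore:
    """Hypothetical stand-in for a key-value server: each key carries a version."""

    def __init__(self):
        self._data = {}                # key -> (version, value)
        self._lock = threading.Lock()  # guards only the short check-and-set step

    def get(self, key):
        with self._lock:
            version, value = self._data.get(key, (0, None))
            return version, copy.deepcopy(value)  # hand out a private copy

    def compare_and_set(self, key, expected_version, value):
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise ConflictError(key)  # someone else wrote first
            self._data[key] = (current_version + 1, value)
            return current_version + 1

def optimistic_update(store, key, modify, retries=10):
    """Read, modify a copy, and write back only if nobody else wrote meanwhile."""
    for _ in range(retries):
        version, value = store.get(key)
        new_value = modify(value)
        try:
            return store.compare_and_set(key, version, new_value)
        except ConflictError:
            continue  # re-read the latest value and try again
    raise ConflictError(key)
```

No lock is held while `modify` runs, so readers are never blocked; the cost is that a writer who loses the race has to redo its change.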
You might try klepto, which provides a dictionary interface to a SQL database (using sqlalchemy under the covers). If you choose to persist your data to mysql, postgresql, or another available database (aside from sqlite), then two or more people can access the data simultaneously, or two threads/processes can access the database tables, and the database manages the concurrent reads and writes. Using klepto with a database backend will perform under concurrent access as well as if you were accessing the database directly. If you don't want to use a database backend, klepto can write to disk as well. There is some potential for conflict when writing to disk, even though klepto uses a "copy-on-write, then replace" strategy that minimizes concurrency conflicts when working with files on disk. When working with a file (or directory) backend, your issues 1-2-3 are still handled due to the strategy klepto employs for saving writes to disk. Additionally, klepto can use an in-memory caching layer for fast access, where loads and dumps from the on-disk (or database) backend happen either on demand or when the in-memory cache reaches a user-determined size.
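The general "copy-on-write, then replace" idea can be sketched with the standard library alone; this is not klepto's code, and `atomic_dump` is a hypothetical name. The new contents go to a temporary file in the same directory, which is then renamed over the original with `os.replace`, an atomic operation on a POSIX filesystem when both paths are on the same filesystem. A reader therefore sees either the old file or the new one, never a half-written one; two concurrent writers still race, and the last replace wins.

```python
import os
import pickle
import tempfile

def atomic_dump(obj, path):
    """Write obj to a temporary file next to path, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())     # make sure the bytes reach the disk first
        os.replace(tmp_path, path)   # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)          # leave no half-written temp file behind
        raise
```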
To be specific: (1) both users are served at the same time; (2) if one user makes an edit, the other user sees the change, although that change may be 'delayed' if the second user is using an in-memory caching layer; (3) multiple simultaneous writes are not a problem, because klepto lets NFS or the SQL database handle the "copy-on-write, then replace" changes.
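Why a caching layer can delay visibility (point 2) is easiest to see with a toy write-back cache. `CachedArchive` is hypothetical and is not klepto's class: writes stay in a local dict until `dump()` is called or the cache holds `max_items` entries, and a plain dict plays the role of the shared file or database backend.

```python
class CachedArchive:
    """Hypothetical write-back cache in front of a shared backend mapping."""

    def __init__(self, backend, max_items=100):
        self.backend = backend   # shared store (a file or database in practice)
        self.cache = {}
        self.max_items = max_items

    def __setitem__(self, key, value):
        self.cache[key] = value
        if len(self.cache) >= self.max_items:
            self.dump()          # flush once the cache reaches the size limit

    def __getitem__(self, key):
        if key not in self.cache:             # load on demand
            self.cache[key] = self.backend[key]
        return self.cache[key]

    def dump(self):
        """Push every cached entry to the backend and empty the cache."""
        self.backend.update(self.cache)
        self.cache.clear()

shared = {}
alice = CachedArchive(shared, max_items=2)
bob = CachedArchive(shared)
alice["a"] = 1
print("a" in shared)  # False: the edit is still only in Alice's cache
alice["b"] = 2        # cache reaches max_items, so it is flushed
print(bob["a"])       # 1
```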
The dictionary interface for klepto.archives is also available in a decorator form that provides LRU caching (and LFU and others), so if you have a function that is generating/accessing the data, hooking up the archive is really easy: you get memoization with an on-disk or database backend.
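For comparison, the idea of memoization with an on-disk backend can be sketched with only the standard library. `shelve_memoize` is a hypothetical decorator, not klepto's API: it has no LRU/LFU eviction, assumes the arguments have a stable, unique `repr()`, and inherits the shelve concurrency caveats quoted in the earlier answer.

```python
import functools
import shelve

def shelve_memoize(path):
    """Hypothetical decorator: cache results in a shelve file keyed by arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            key = repr(args)  # assumes a stable, unique repr() for the arguments
            with shelve.open(path) as cache:
                if key not in cache:
                    cache[key] = func(*args)  # compute once, store on disk
                return cache[key]
        return wrapper
    return decorator
```

Because results live in a file, they survive process restarts, which is the main difference from `functools.lru_cache`.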
With klepto, you can pick from several different serialization methods to encode your data. You can have klepto cast data to a string, use a hashing algorithm (like md5), or use a serializer (like json, pickle, or dill).
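The kinds of encoding listed above behave quite differently, which this stdlib-only sketch shows on a small example tree (it calls the standard modules directly, not klepto). None of them is encryption: an md5 digest is a one-way hash that cannot be turned back into the data, while json and pickle round-trip. dill is a third-party package that mirrors pickle's `dumps`/`loads` interface, so it is left out to keep the block dependency-free.

```python
import hashlib
import json
import pickle

tree = {"root": {"left": 1, "right": [2, 3]}}

as_string = str(tree)                                  # cast to a string
as_md5 = hashlib.md5(as_string.encode()).hexdigest()   # one-way hash, not reversible
as_json = json.dumps(tree)                             # text, readable from other languages
as_pickle = pickle.dumps(tree)                         # bytes, Python-specific

assert json.loads(as_json) == tree     # json round-trips this tree
assert pickle.loads(as_pickle) == tree  # pickle round-trips it too
```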
You can get klepto here: https://github.com/uqfoundation/klepto