Python Disk-Based Dictionary

2019-01-16 12:45发布

I was running some dynamic programming code (trying to brute-force disprove the Collatz conjecture =P) and I was using a dict to store the lengths of the chains I had already computed. Obviously, it ran out of memory at some point. Is there any easy way to use some variant of a dict which will page parts of itself out to disk when it runs out of room? Obviously it will be slower than an in-memory dict, and it will probably end up eating my hard drive space, but this could apply to other problems that are not so futile.

I realized that a disk-based dictionary is pretty much a database, so I manually implemented one using sqlite3, but I didn't do it in any smart way and had it look up every element in the DB one at a time... it was about 300x slower.

Is the smartest way to just create my own set of dicts, keeping only one in memory at a time, and paging them out in some efficient manner?

9条回答
对你真心纯属浪费
2楼-- · 2019-01-16 12:53

I've read you think shelve is too slow and you tried to hack your own dict using sqlite.

Another did this too :

http://sebsauvage.net/python/snyppets/index.html#dbdict

It seems pretty efficient (and sebsauvage is a pretty good coder). Maybe you could give it a try ?

查看更多
走好不送
3楼-- · 2019-01-16 13:03

You should bring more than one item at a time if there's some heuristic to know which are the most likely items to be retrieved next, and don't forget the indexes like Charles mentions.

查看更多
The star\"
4楼-- · 2019-01-16 13:04

I did not try it yet but Hamster DB is promising and has a Python interface.

查看更多
我欲成王,谁敢阻挡
5楼-- · 2019-01-16 13:06

Hash-on-disk is generally addressed with Berkeley DB or something similar - several options are listed in the Python Data Persistence documentation. You can front it with an in-memory cache, but I'd test against native performance first; with operating system caching in place it might come out about the same.

查看更多
放我归山
6楼-- · 2019-01-16 13:06

Last time I was facing a problem like this, I rewrote to use SQLite rather than a dict, and had a massive performance increase. That performance increase was at least partially on account of the database's indexing capabilities; depending on your algorithms, YMMV.

A thin wrapper that does SQLite queries in __getitem__ and __setitem__ isn't much code to write.

查看更多
ら.Afraid
7楼-- · 2019-01-16 13:09

The 3rd party shove module is also worth taking a look at. It's very similar to shelve in that it is a simple dict-like object, however it can store to various backends (such as file, SVN, and S3), provides optional compression, and is even threadsafe. It's a very handy module

from shove import Shove

mem_store = Shove()
file_store = Shove('file://mystore')

file_store['key'] = value
查看更多
登录 后发表回答