Python Disk-Based Dictionary

I was running some dynamic programming code (trying to brute-force disprove the Collatz conjecture =P) and I was using a dict to store the lengths of the chains I had already computed. Obviously, it ran out of memory at some point. Is there any easy way to use some variant of a dict which will page parts of itself out to disk when it runs out of room? Obviously it will be slower than an in-memory dict, and it will probably end up eating my hard drive space, but this could apply to other problems that are not so futile.

I realized that a disk-based dictionary is pretty much a database, so I manually implemented one using sqlite3, but I didn't do it in any smart way and had it look up every element in the DB one at a time... it was about 300x slower.

Is the smartest way to just create my own set of dicts, keeping only one in memory at a time, and paging them out in some efficient manner?

标签： python database dictionary disk-based

9条回答

对你真心纯属浪费

2楼-- · 2019-01-16 12:53

I've read you think shelve is too slow and you tried to hack your own dict using sqlite.

Another did this too :

http://sebsauvage.net/python/snyppets/index.html#dbdict

It seems pretty efficient (and sebsauvage is a pretty good coder). Maybe you could give it a try ?

0人赞添加讨论(0) 举报

走好不送

3楼-- · 2019-01-16 13:03

You should bring more than one item at a time if there's some heuristic to know which are the most likely items to be retrieved next, and don't forget the indexes like Charles mentions.

0人赞添加讨论(0) 举报

The star\"

4楼-- · 2019-01-16 13:04

I did not try it yet but Hamster DB is promising and has a Python interface.

0人赞添加讨论(0) 举报

我欲成王，谁敢阻挡

5楼-- · 2019-01-16 13:06

Hash-on-disk is generally addressed with Berkeley DB or something similar - several options are listed in the Python Data Persistence documentation. You can front it with an in-memory cache, but I'd test against native performance first; with operating system caching in place it might come out about the same.

0人赞添加讨论(0) 举报

放我归山

6楼-- · 2019-01-16 13:06

Last time I was facing a problem like this, I rewrote to use SQLite rather than a dict, and had a massive performance increase. That performance increase was at least partially on account of the database's indexing capabilities; depending on your algorithms, YMMV.

A thin wrapper that does SQLite queries in __getitem__ and __setitem__ isn't much code to write.

0人赞添加讨论(0) 举报

ら.Afraid

7楼-- · 2019-01-16 13:09

The 3rd party shove module is also worth taking a look at. It's very similar to shelve in that it is a simple dict-like object, however it can store to various backends (such as file, SVN, and S3), provides optional compression, and is even threadsafe. It's a very handy module

from shove import Shove

mem_store = Shove()
file_store = Shove('file://mystore')

file_store['key'] = value

0人赞添加讨论(0) 举报

1 2 下一页

Python Disk-Based Dictionary

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间