Python Disk-Based Dictionary

Posted 2019-01-16 12:45

I was running some dynamic programming code (trying to brute-force disprove the Collatz conjecture =P) and using a dict to store the lengths of the chains I had already computed. Unsurprisingly, it ran out of memory at some point. Is there an easy way to use some variant of a dict that pages parts of itself out to disk when it runs out of room? It will be slower than an in-memory dict, and it will probably end up eating my hard drive space, but this could apply to other problems that are not so futile.

I realized that a disk-based dictionary is pretty much a database, so I manually implemented one using sqlite3, but I didn't do it in any smart way and had it look up every element in the DB one at a time... it was about 300x slower.

Is the smartest way to just create my own set of dicts, keeping only one in memory at a time, and paging them out in some efficient manner?
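For illustration, here is a minimal sketch of the "keep part in memory, page the rest out" idea on top of sqlite3. The class name BufferedSqliteDict, the table name lengths, and the max_buffer threshold are all hypothetical choices, not anything from the original code. It only batches writes into one transaction; lookups that miss the buffer still query the database one key at a time.

```python
import sqlite3


class BufferedSqliteDict:
    """Integer-to-integer store: recent writes in a dict, the rest in SQLite.

    Hypothetical helper for illustration only.
    """

    def __init__(self, path=":memory:", max_buffer=100_000):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS lengths "
            "(n INTEGER PRIMARY KEY, length INTEGER)"
        )
        self._buffer = {}
        self._max_buffer = max_buffer

    def __setitem__(self, key, value):
        self._buffer[key] = value
        # Page the buffer out to disk once it reaches the size limit.
        if len(self._buffer) >= self._max_buffer:
            self.flush()

    def __getitem__(self, key):
        if key in self._buffer:
            return self._buffer[key]
        row = self._conn.execute(
            "SELECT length FROM lengths WHERE n = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True

    def flush(self):
        """Write all buffered entries in a single transaction, then drop them."""
        with self._conn:
            self._conn.executemany(
                "INSERT OR REPLACE INTO lengths (n, length) VALUES (?, ?)",
                self._buffer.items(),
            )
        self._buffer.clear()

    def close(self):
        self.flush()
        self._conn.close()
```

Committing many rows per transaction is usually where most of the time is recovered compared with committing each insert separately. Note that SQLite integers are 64-bit signed, so this sketch assumes the chain values stay below that limit.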

9 answers
beautiful°
#2 · 2019-01-16 13:17

With a little bit of thought it seems like you could get the shelve module to do what you want.

乱世女痞
#3 · 2019-01-16 13:18

The shelve module may do it; at any rate, it should be simple to test. Instead of:

self.lengths = {}

do:

import shelve
self.lengths = shelve.open('lengths.shelf')

The only catch is that keys to shelves must be strings, so you'll have to replace

self.lengths[indx]

with

self.lengths[str(indx)]

(I'm assuming your keys are just integers, as per your comment to Charles Duffy's post)

There's no built-in caching in memory, but your operating system may do that for you anyway.

[Actually, that's not quite true: you can pass the argument 'writeback=True' to shelve.open. The intent of this is to make sure storing lists and other mutable things in the shelf works correctly. But a side effect is that every entry you access is kept in an in-memory cache until you call sync() or close(), so over a long run memory use grows just as it does with a plain dict. Since that is what caused problems for you, it's probably not a good idea :-) ]
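To avoid sprinkling str(...) through the code, the conversion can be hidden in a small wrapper. The sketch below is only an illustration: IntKeyShelf and collatz_length are hypothetical names, and collatz_length is a guess at what the chain-length code looks like, not the asker's actual implementation.

```python
import shelve


class IntKeyShelf:
    """Dict-like wrapper that stores integer keys in a shelf as strings.

    Hypothetical helper for illustration only.
    """

    def __init__(self, path):
        self._db = shelve.open(path)

    def __getitem__(self, key):
        return self._db[str(key)]

    def __setitem__(self, key, value):
        self._db[str(key)] = value

    def __contains__(self, key):
        return str(key) in self._db

    def close(self):
        self._db.close()


def collatz_length(n, lengths):
    """Return the number of terms in the Collatz chain starting at n.

    `lengths` is any mapping from int to int that supports `in`,
    so a plain dict and an IntKeyShelf both work.
    """
    path = []
    # Walk forward until we reach 1 or a value whose length is already stored.
    while n != 1 and n not in lengths:
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    length = 1 if n == 1 else lengths[n]
    # Walk back, recording the length for every value visited on the way.
    for m in reversed(path):
        length += 1
        lengths[m] = length
    return length
```

Because collatz_length only uses item access and `in`, the same function can be checked against an ordinary dict first and then switched to the shelf by changing one line. Remember to call close() when finished so the shelf is flushed to disk.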

Luminary・发光体
#4 · 2019-01-16 13:18