How do I store a Python object in memory for use b

Posted 2020-08-11 11:15

Here's the situation: I have a massive object that needs to be loaded into memory. It is so big that loading it twice would exceed the available memory on my machine (and no, I can't upgrade the memory). I also can't divide it into smaller pieces. For simplicity's sake, let's say the object is 600 MB and I only have 1 GB of RAM. I need to use this object from a web app that runs in multiple processes, and I don't control how they're spawned (a third-party load balancer does that), so I can't rely on creating the object in some master thread/process and then spawning off children. This also rules out something like POSH, because that relies on its own custom fork call. I also can't use something like a SQLite in-memory database, mmap, or the posix_ipc, sysv_ipc, and shm modules, because those act as a file in memory, and this data has to be an object for me to use it. With any of those I would have to read it as a file and then turn it into an object in each individual process and BAM, segmentation fault from going over the machine's memory limit, because I just tried to load a second copy.

There must be some way to store a Python object in memory (and not as a file/string/serialized/pickled) and have it be accessible from any process. I just don't know what it is. I've looked all over Stack Overflow and Google and can't find the answer, so I'm hoping somebody can help me out.

3 answers
We Are One
#2 · 2020-08-11 11:36

http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes

Look at the sections on shared memory and on server processes. After re-reading your post, a server process sounds closer to what you want.

http://en.wikipedia.org/wiki/Shared_memory
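
To make the server-process idea concrete, here is a minimal sketch, assuming the object can live in one dedicated process that you start yourself, with the web workers reaching it over a local socket. `BigObject`, `lookup`, `make_server`, `connect`, the address, and the auth key are all hypothetical names chosen for illustration; only `multiprocessing.managers.BaseManager` comes from the standard library.

```python
from multiprocessing.managers import BaseManager

# Hypothetical settings; pick your own address and secret.
ADDRESS = ("127.0.0.1", 50000)
AUTHKEY = b"change-me"


class BigObject:
    """Stand-in for the large object; only the server process holds it."""

    def __init__(self):
        self._data = {"answer": 42}

    def lookup(self, key):
        return self._data.get(key)


class ServerManager(BaseManager):
    """Manager class used by the one process that owns the object."""


class ClientManager(BaseManager):
    """Manager class used by every process that only needs a proxy."""


def make_server(address=ADDRESS):
    """Build the single instance and return a server that exposes it."""
    big = BigObject()
    # Every client request for "get_big" is answered with the same instance.
    ServerManager.register("get_big", callable=lambda: big)
    manager = ServerManager(address=address, authkey=AUTHKEY)
    return manager.get_server()


def connect(address=ADDRESS):
    """Return a proxy to the object held by the server process."""
    ClientManager.register("get_big")
    manager = ClientManager(address=address, authkey=AUTHKEY)
    manager.connect()
    return manager.get_big()
```

The owning process would run `make_server().serve_forever()`, and each web worker would call `connect()` once and then use `proxy.lookup(key)`. Every proxy call is pickled and sent over the socket, so only arguments and return values are copied; the large object itself exists in one process only. Whether the per-call overhead is acceptable depends on how the object is used.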

姐就是有狂的资本
#3 · 2020-08-11 11:48

"There must be some way to store a Python object in memory (and not as a file/string/serialized/pickled) and have it be accessible from any process."

That isn't the way it works. Python's reference counts and an object's internal pointers are only meaningful inside a single process; they do not make sense across multiple processes.

If the data doesn't have to be an actual Python object, you can try working on the raw data stored in mmap() or in a database or somesuch.
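
As a rough illustration of working on raw data in `mmap()`, here is a minimal sketch, assuming the data can be laid out as fixed-size binary records. The record layout (`int32` id plus `float64` value), `write_records`, and `RecordView` are hypothetical; `mmap` and `struct` are from the standard library.

```python
import mmap
import struct

# Hypothetical layout: little-endian int32 id followed by a float64 value.
RECORD = struct.Struct("<id")


def write_records(path, rows):
    """Write (id, value) pairs to a flat binary file, one record each."""
    with open(path, "wb") as f:
        for key, value in rows:
            f.write(RECORD.pack(key, value))


class RecordView:
    """Read-only, index-based access to records in a memory-mapped file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; pages are loaded lazily on access.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // RECORD.size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Decode one record straight from the mapping; nothing else is read.
        return RECORD.unpack_from(self._map, index * RECORD.size)

    def close(self):
        self._map.close()
        self._file.close()
```

Each process opens its own `RecordView` on the same file. Because the mapping is read-only and file-backed, the operating system can share the underlying pages between processes, and each lookup builds only a small tuple instead of a second copy of the whole data set.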

爱情/是我丢掉的垃圾
#4 · 2020-08-11 11:49

I would implement this as a C module that gets imported into each Python script. Then the interface to this large object would be implemented in C, or some combination of C and Python.
