More specific dupe of 875228—Simple data storing in Python.
I have a rather large dict (6 GB) and I need to do some processing on it. I'm trying out several document clustering methods, so I need to have the whole thing in memory at once. I have other functions to run on this data, but the contents will not change.
Currently, every time I think of a new function I have to write it and then re-generate the dict. I'm looking for a way to write this dict to a file, so that I can load it into memory instead of recalculating all its values.
To oversimplify, it looks something like: {((('word','list'),(1,2),(1,3)),(...)):0.0, ....}
I feel that Python must have a better way than my looping through a string looking for : and ( and trying to parse it into a dictionary.
I'd use shelve, json, yaml, or whatever, as suggested by other answers. shelve is especially cool because you can have the dict on disk and still use it; values will be loaded on demand.

But if you really want to parse the text of the dict, and it contains only strings, ints and tuples like you've shown, you can use ast.literal_eval to parse it. It is a lot safer, since you can't eval full expressions with it; it only works with strings, numbers, tuples, lists, dicts, booleans, and None.
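A minimal sketch of that round trip (the file name clusters.txt is just an illustration):

    import ast

    data = {(('word', 'list'), (1, 2), (1, 3)): 0.0}

    # write the dict's literal representation out once...
    with open('clusters.txt', 'w') as f:
        f.write(repr(data))

    # ...then parse it back safely, without eval
    with open('clusters.txt') as f:
        data = ast.literal_eval(f.read())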
Write it out in a serialized format, such as pickle (a Python standard-library module for serialization), or perhaps JSON (a representation that can be parsed to produce the in-memory representation again).
Why not use Python's pickle? Python has a great serializing module called pickle, and it is very easy to use:
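For example (the file name is just an illustration):

    import pickle

    data = {(('word', 'list'), (1, 2), (1, 3)): 0.0}

    # dump the dict to disk once...
    with open('clusters.pkl', 'wb') as f:
        pickle.dump(data, f)

    # ...and load it back instead of recalculating
    with open('clusters.pkl', 'rb') as f:
        data = pickle.load(f)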
There are two disadvantages with pickle: the files it produces are not human-readable, and unpickling data from an untrusted source is a security risk.
If you are using Python 2.6 or later, there is a built-in module called json. It is as easy to use as pickle:
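A sketch (note that JSON keys must be strings, so the question's tuple keys would need converting first):

    import json

    data = {'word list': 0.0}   # tuple keys are not valid JSON

    with open('clusters.json', 'w') as f:
        json.dump(data, f)

    with open('clusters.json') as f:
        data = json.load(f)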
The JSON format is human-readable and very similar to the dictionary string representation in Python, and it doesn't have pickle's security issues. It might be slower than cPickle, though.
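If speed matters, the usual Python 2 idiom is to try the faster C implementation first (Python 3 does this automatically):

    try:
        import cPickle as pickle   # C implementation, Python 2 only
    except ImportError:
        import pickle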
I would suggest that you use YAML for your file format so you can tinker with it on disk. To get it in Python, just easy_install pyyaml (see http://pyyaml.org/). It comes with easy save/load functions for files, which I can't remember right this minute.
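They are presumably yaml.safe_dump and yaml.safe_load; a minimal sketch (file name hypothetical):

    import yaml   # from the pyyaml package

    data = {'word list': 0.0}   # the question's tuple keys may not round-trip
                                # through the safe loader

    with open('clusters.yaml', 'w') as f:
        yaml.safe_dump(data, f)

    with open('clusters.yaml') as f:
        data = yaml.safe_load(f)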
This solution at SourceForge uses only standard Python modules:
y_serial.py module :: warehouse Python objects with SQLite
"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."
http://yserial.sourceforge.net
The compression bonus will probably reduce your 6 GB dictionary to about 1 GB. If you do not want to store a series of dictionaries, the module also contains a file.gz solution which might be more suitable given your dictionary's size.
Here are a few alternatives depending on your requirements:

- numpy stores your plain data in a compact form and performs group/mass operations well
- shelve is like a large dict backed by a file (see the sketch after this list)
- some 3rd-party storage module, e.g. stash, stores arbitrary plain data
- a proper database, e.g. mongodb for hairy data, or mysql or sqlite for plain data and faster retrieval
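A minimal sketch of the shelve option (file name hypothetical; shelve keys must be strings, so the question's tuple keys would need converting, e.g. via repr()):

    import shelve

    key = repr((('word', 'list'), (1, 2), (1, 3)))

    # build the shelf once...
    db = shelve.open('clusters.shelf')
    db[key] = 0.0
    db.close()

    # ...later, reopen it; values are read from disk on demand
    db = shelve.open('clusters.shelf')
    print(db[key])
    db.close()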