All I want to do is serialize and unserialize tuples of strings or ints.
I looked at pickle.dumps(), but the byte overhead is significant: it seems to take up about 4x as much space as it needs to. Besides, all I need is basic types; I have no need to serialize objects.
marshal is a little better in terms of space, but the result is full of nasty \x00 bytes. Ideally I would like the result to be human-readable.
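To check the overhead and the \x00 bytes for a specific tuple, a quick sketch can print each encoding of the same value next to its length. The sample tuple is an arbitrary assumption, and exact sizes vary with the Python version and pickle protocol, so no numbers are claimed here.

```python
import marshal
import pickle

# An arbitrary sample tuple; swap in your own data to measure it.
sample = ("alice", 42, "bob", 7)

encodings = {
    "pickle (default protocol)": pickle.dumps(sample),
    "marshal": marshal.dumps(sample),
    "repr": repr(sample).encode("utf-8"),
}

for name, data in encodings.items():
    # Print the size and the raw bytes so embedded b"\x00" values are visible.
    print(f"{name:28} {len(data):3} bytes  {data!r}")

# Both binary formats round-trip back to the original tuple.
assert pickle.loads(encodings["pickle (default protocol)"]) == sample
assert marshal.loads(encodings["marshal"]) == sample
```

marshal stores ints as fixed-width little-endian values, which is where most of the nul bytes come from.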
I thought of just using repr() and eval(), but is there a simple way I could accomplish this without using eval()?
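One standard-library way to keep the repr() format without eval() is ast.literal_eval, which only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None) and raises an error for anything else. This is a swapped-in suggestion rather than something from the thread, and the helper names dump_tuple and load_tuple are hypothetical; a minimal sketch:

```python
import ast


def dump_tuple(values: tuple) -> str:
    # Hypothetical helper: repr() of a tuple of str/int is a readable Python literal.
    return repr(values)


def load_tuple(text: str) -> tuple:
    # Hypothetical helper: literal_eval parses literals only;
    # function calls, names, attribute access, etc. raise ValueError.
    result = ast.literal_eval(text)
    if not isinstance(result, tuple):
        raise ValueError(f"expected a tuple literal, got {type(result).__name__}")
    return result


encoded = dump_tuple(("spam", 3, "eggs"))
print(encoded)
print(load_tuple(encoded))

try:
    load_tuple("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```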
This is getting stored in a db, not a file. Byte overhead matters because it could make the difference between requiring a TEXT column versus a varchar, and generally data compactness affects all areas of db performance.
Personally, I would use YAML. It's on par with JSON for encoding size, but it can represent some more complex things (e.g. classes, recursive structures) when necessary.
If you need a space-efficient solution, you can use Google Protocol Buffers.
Protocol buffers - Encoding
Protocol buffers - Python Tutorial
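Part of what makes Protocol Buffers compact is its base-128 varint encoding for integers, described on the Encoding page: seven bits of payload per byte, with the high bit set on every byte except the last. The pure-Python sketch below only illustrates that idea for non-negative integers; it is not the protobuf library's API, and real messages also carry field tags and length prefixes.

```python
from __future__ import annotations


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (low 7-bit group first)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F  # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)  # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes) -> tuple[int, int]:
    """Decode one varint from the start of data; return (value, bytes_consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


print(encode_varint(1).hex())    # 01
print(encode_varint(300).hex())  # ac02
print(decode_varint(bytes.fromhex("ac02")))  # (300, 2)
```

Small ints therefore cost a single byte, instead of the fixed four bytes marshal uses.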
There are some persistence modules in the standard library mentioned in the Python documentation, but I don't think any of them produces remarkably smaller output.
You could always use configparser, but then you only get strings, ints, floats and bools.
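With configparser every value is stored as text and converted back with getint(), getfloat() or getboolean(). The section and key names in this sketch ("record", "name", "count") are hypothetical; one key per tuple field is just one possible layout.

```python
import configparser
import io

# Hypothetical layout: one section per record, one key per tuple field.
config = configparser.ConfigParser()
config["record"] = {"name": "alice", "count": "3"}

buf = io.StringIO()
config.write(buf)
text = buf.getvalue()
print(text)

# Values come back as strings unless a typed getter is used.
parsed = configparser.ConfigParser()
parsed.read_string(text)
restored = (parsed.get("record", "name"), parsed.getint("record", "count"))
print(restored)  # ('alice', 3)
```

Which field has which type lives in your code, not in the stored text.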
Take a look at json; at least the generated dumps are readable by many other languages.
Maybe you're not using the right protocol:
See the documentation for pickle data formats.
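The json and pickle-protocol suggestions can be compared side by side. This sketch (the sample tuple is an arbitrary assumption) prints the size of compact JSON and of every pickle protocol available in the running interpreter; protocol 0 is the ASCII-based one. JSON has no tuple type, so tuples come back as lists and need converting.

```python
import json
import pickle

sample = ("alice", 42, "bob", 7)

# Compact JSON: drop the spaces json.dumps inserts after "," and ":" by default.
as_json = json.dumps(sample, separators=(",", ":"))
print(f"json       {len(as_json):3} bytes  {as_json}")

# JSON arrays decode to lists, so convert back to a tuple explicitly.
restored = tuple(json.loads(as_json))
assert restored == sample

for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(sample, protocol=protocol)
    assert pickle.loads(data) == sample
    print(f"pickle p{protocol}  {len(data):3} bytes  {data!r}")
```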
Luckily, there is a solution which uses COMPRESSION and solves the general problem for any arbitrary Python object, including new classes. Rather than micro-managing mere tuples, sometimes it's better to use a DRY tool.
Your code will be more crisp and readily refactored in similar future situations.
y_serial.py module :: warehouse Python objects with SQLite
"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."
http://yserial.sourceforge.net
[If you are still concerned, why not stick those tuples in a dictionary and then apply y_serial to the dictionary? Any overhead will probably vanish due to the transparent zlib compression in the background.]
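y_serial's own API isn't shown here, so the following is only a plain standard-library sketch of the same idea (pickle, then zlib-compress, then store the blob in SQLite), not y_serial itself. The table and column names are hypothetical. zlib adds a few bytes of header and checksum, so compression tends to pay off only on larger or repetitive payloads, not on a single short tuple.

```python
import pickle
import sqlite3
import zlib

# Hypothetical data: many tuples collected into one dictionary.
records = {i: ("alice", i, "bob", i * 2) for i in range(1000)}

raw = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)
packed = zlib.compress(raw, 9)
print(f"pickled: {len(raw)} bytes, compressed: {len(packed)} bytes")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (key TEXT PRIMARY KEY, data BLOB)")
conn.execute("INSERT INTO blobs (key, data) VALUES (?, ?)", ("records", packed))

# Read back: decompress, then unpickle.
(stored,) = conn.execute("SELECT data FROM blobs WHERE key = ?", ("records",)).fetchone()
loaded = pickle.loads(zlib.decompress(stored))
assert loaded == records
conn.close()
```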
As to readability, the documentation also gives details on why cPickle was selected over json.