I basically do sequences of dump and load, but at some point I want to delete one of the loaded entries. How can I do that? Is there a way to remove or edit entries saved with Python pickle/cPickle?
Edit: The data is saved with pickle in a binary file.
To delete a pickled object from a binary file you must rewrite the whole file.
The pickle module doesn't support modifications at arbitrary positions in the stream, so there is no built-in way of doing what you want.
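As a rough sketch of what "rewrite the whole file" means in practice, assuming the file was produced by repeated pickle.dump calls on the same file object: read every object back, drop the ones you don't want, and write the rest to a new file that replaces the old one. The helper names load_all and delete_entries are made up for this example; they are not part of the pickle module.

```python
import os
import pickle
import tempfile


def load_all(path):
    """Return every object stored by successive pickle.dump calls in path."""
    objects = []
    with open(path, "rb") as f:
        while True:
            try:
                objects.append(pickle.load(f))
            except EOFError:  # raised once the stream is exhausted
                break
    return objects


def delete_entries(path, should_delete):
    """Rewrite path, dropping every object for which should_delete(obj) is true."""
    kept = [obj for obj in load_all(path) if not should_delete(obj)]
    # Write to a temporary file in the same directory and then swap it in,
    # so a failure mid-write does not destroy the original file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for obj in kept:
                pickle.dump(obj, tmp)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Something like delete_entries("data.pkl", lambda obj: obj == "stale") would then remove the matching entries. Note that this loads everything into memory and costs time proportional to the size of the file on every deletion, which is why a keyed store is usually more convenient.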
Probably the simplest alternative to binary files is to use the shelve module. This module provides a dict-like interface to a database containing the pickled data, as you can see from the example in the documentation:
import shelve
d = shelve.open(filename) # open -- file may get suffix added by low-level
# library
d[key] = data # store data at key (overwrites old data if
# using an existing key)
data = d[key] # retrieve a COPY of data at key (raise KeyError if no
# such key)
del d[key] # delete data stored at key (raises KeyError
# if no such key)
flag = key in d # true if the key exists
klist = list(d.keys()) # a list of all existing keys (slow!)
# as d was opened WITHOUT writeback=True, beware:
d['xx'] = [0, 1, 2] # this works as expected, but...
d['xx'].append(3) # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]!
# having opened d without writeback=True, you need to code carefully:
temp = d['xx'] # extracts the copy
temp.append(5) # mutates the copy
d['xx'] = temp # stores the copy right back, to persist it
# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.
d.close() # close it
The database used is ndbm or gdbm, depending on the platform and the libraries available.
Note: this works well as long as the data is not moved to another platform. If you want to be able to copy the database to another computer, then shelve won't work well, since it does not provide guarantees regarding which library will be used. In this case, using an explicit SQL database is probably the best option.
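For the SQL route, here is a minimal sketch using the standard-library sqlite3 module, with each pickled object stored as a BLOB under a text key. The PickleStore class, the entries table, and its column names are all invented for illustration; this is one possible layout, not a standard API.

```python
import pickle
import sqlite3


class PickleStore:
    """Hypothetical minimal key/value store: pickled values in one SQLite table."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS entries (key TEXT PRIMARY KEY, value BLOB)"
        )

    def __setitem__(self, key, obj):
        blob = pickle.dumps(obj)
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO entries (key, value) VALUES (?, ?)",
                (key, blob),
            )

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM entries WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __delitem__(self, key):
        with self.conn:
            cur = self.conn.execute("DELETE FROM entries WHERE key = ?", (key,))
        if cur.rowcount == 0:  # nothing matched, mirror dict behaviour
            raise KeyError(key)

    def close(self):
        self.conn.close()
```

Usage mirrors the shelve example: store = PickleStore("data.sqlite"), then store["xx"] = [0, 1, 2], del store["xx"], and store.close(). The SQLite file format is the same across platforms, so the file can be copied to another machine; the pickled values themselves still need a Python version that understands the pickle protocol they were written with.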