I need a faster way to store and access around 3GB of k:v
pairs. Where k
is a string
or an integer
and v
is an np.array()
that can be of different shapes.
Is there any object, that is faster than the standard python dict in storing and accessing such a table? For example, a pandas.DataFrame
?
As far I have understood python dict is a quite fast implementation of a hashtable, is there anything better than that for my specific case?
You can think of storing them in Data structure like Trie given your key is string. Even to store and retrieve from Trie you need O(N) where N is maximum length of key. Same happen to hash calculation which computes hash for key. Hash is used to find and store in Hash Table. We often don't consider the hashing time or computation.
You may give a shot to Trie, Which should be almost equal performance, may be little bit faster( if hash value is computed differently for say
or something similar due to collision we need to use 256^i.
You can try to store them in Trie and see how it performs.
No there is nothing faster than a dictionary for this task and that’s because the complexity of its indexing and even membership checking is approximately O(1).
Once you saved your items in a dictionary, you can access them in a constant time. That said, the problem is not the indexing process. But you might be able to make the process slightly faster by doing some changes in your objects and their types. This might cause some optimizations in under the hood's operations. For example, if your strings (keys) are not very large you can intern them, in order to be cashed in memory rather than being created as an object. If the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. That makes the access to object very faster. Python has provided an
intern()
function withinsys
module that you can use it for this aim.Here is an example:
No, I don't think there is anything faster than
dict
. The time complexity of its index checking isO(1)
.PS https://wiki.python.org/moin/TimeComplexity