As we know, Some of Python's data structures use hash tables for storing items like set
or dictionary
. So there is no order in these objects. But it seems that, for some sequences of numbers that's not true.
For example consider the following examples :
>>> set([7,2,5,3,6])
set([2, 3, 5, 6, 7])
>>> set([4,5,3,0,1,2])
set([0, 1, 2, 3, 4, 5])
But it isn't sorted if we make a small change :
>>> set([8,2,5,3,6])
set([8, 2, 3, 5, 6])
So the question is: How does Python's hash function work on integer sequences?
Although there are a lot of questions in SO about
hash
and its order,but no one of them explains the algorithm of hash function.So all you need here is know that how python calculate the indices in hash table.
If you go through the
hashtable.c
file in CPython source code you'll see the following lines in_Py_hashtable_set
function which shows the way python calculate the index of hash table keys :So as the hash value of integers is the integer itself * (except for -1) the index is based on the number and the length of your data structure (
ht->num_buckets - 1
) and it calculated with Bitwise-and between(ht->num_buckets - 1)
and the number.Now consider the following example with
set
that use hash-table :For number
33
we have :That actually it's :
Note in this case
(ht->num_buckets - 1)
is8-1=7
or0b111
.And for
1919
:And for
333
:And as well as for the preceding examples in question :
* The hash function for class
int
: