I was doing some performance evaluation using timeit and discovered a performance degradation between Python 2.7.10 and Python 3.4.3. I narrowed it down to the hash() function:
python 2.7.10:
>>> import timeit
>>> timeit.timeit('for x in xrange(100): hash(x)', number=100000)
0.4529099464416504
>>> timeit.timeit('hash(1000)')
0.044638872146606445
python 3.4.3:
>>> import timeit
>>> timeit.timeit('for x in range(100): hash(x)', number=100000)
0.6459149940637872
>>> timeit.timeit('hash(1000)')
0.07708719989750534
That's an approx. 40% degradation! It doesn't seem to matter whether integers, floats, strings (unicodes or bytearrays), etc., are being hashed; the degradation is about the same. In both cases the hash is returning a 64-bit integer. The above was run on my Mac; I got a smaller degradation (about 20%) on an Ubuntu box.
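For anyone wanting to repeat the per-type comparison, here is a minimal sketch that should run unchanged on both interpreters. The sample values and the helper names `time_hash` and `time_all` are hypothetical, chosen only for illustration. Note that CPython caches the hash of a str or bytes object after the first call, so repeated hashing of the same object mostly measures call overhead rather than the hash algorithm itself.

```python
import timeit

# Hypothetical sample values, one per type mentioned in the question.
SAMPLES = {
    "int": 1000,
    "float": 1000.5,
    "str": "some text",
    "bytes": b"some bytes",
}


def time_hash(value, number=100000):
    """Time `number` calls of hash(value); the lambda adds a small constant overhead."""
    return timeit.timeit(lambda: hash(value), number=number)


def time_all(number=100000):
    """Return a dict mapping each type name to its total time in seconds."""
    return {name: time_hash(value, number) for name, value in SAMPLES.items()}


for name, seconds in sorted(time_all().items()):
    print("%-6s %.4f s" % (name, seconds))
```

Run the same file under each interpreter and compare the printed timings type by type.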
I've also used PYTHONHASHSEED=random for the Python 2.7 tests. In some cases, restarting Python for each "case", I saw the hash() performance get a bit worse, but never as slow as Python 3.4.
Does anyone know what's going on here? Was a more secure, but slower, hash function chosen for Python 3?
There are two changes in the hash() function between Python 2.7 and Python 3.4.
References: object.__hash__ (last line of this section). Setting PYTHONHASHSEED to the value 0 will disable hash randomization.
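To see what a given Python 3 interpreter is actually doing, something like the sketch below may help. It relies on `sys.hash_info` and `sys.flags.hash_randomization`, which exist in Python 3 but not in Python 2.7; the `algorithm` field was added in Python 3.4, so the code falls back to "unknown" on older versions. The helper name `describe_hash_config` is hypothetical.

```python
import sys


def describe_hash_config():
    """Return a dict describing this interpreter's hash configuration (Python 3 only)."""
    info = sys.hash_info
    return {
        # Width in bits of hash values (64 on typical 64-bit builds).
        "width_bits": info.width,
        # Name of the str/bytes hash algorithm; field added in Python 3.4.
        "algorithm": getattr(info, "algorithm", "unknown"),
        # True unless randomization was disabled, e.g. with PYTHONHASHSEED=0.
        "randomized": bool(sys.flags.hash_randomization),
    }


for key, value in sorted(describe_hash_config().items()):
    print(key, "=", value)
```

Running it once normally and once with PYTHONHASHSEED=0 in the environment should show the `randomized` value change, which makes it easy to time hash() with and without randomization.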