I've implemented a Bloom filter in Python 3.3 and got different results every session. Drilling down into this weird behavior led me to the built-in hash() function: it returns different hash values for the same string every session.
Example:
>>> hash("235")
-310569535015251310
----- opening a new python console -----
>>> hash("235")
-1900164331622581997
Why is this happening? Why is this useful?
Python uses a random hash seed to prevent attackers from tar-pitting your application by sending you keys designed to collide. See the original vulnerability disclosure. By offsetting the hash with a random seed (set once at startup) attackers can no longer predict what keys will collide.
You can set a fixed seed or disable the feature by setting the PYTHONHASHSEED environment variable; the default is random, but you can set it to a fixed positive integer value, with 0 disabling the feature altogether.
Python versions 2.7 and 3.2 have the feature disabled by default (use the -R switch or set PYTHONHASHSEED=random to enable it); it is enabled by default in Python 3.3 and up.
If you were relying on the order of keys in a Python dictionary or set, then don't. Python uses a hash table to implement these types and their order depends on the insertion and deletion history as well as the random hash seed.
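To see the effect of the seed without opening consoles by hand, here is a minimal sketch that launches fresh interpreters with a chosen PYTHONHASHSEED. The helper name hash_in_new_process is made up for this illustration; it only relies on the standard subprocess module and on the current interpreter being reachable through sys.executable.

```python
import os
import subprocess
import sys


def hash_in_new_process(text, seed):
    """Return hash(text) as computed by a fresh interpreter.

    `seed` is passed through PYTHONHASHSEED: an integer, or the string "random".
    (Hypothetical helper, written only for this demonstration.)
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash({!r}))".format(text)],
        env=env,
    )
    return int(out.decode().strip())


# With a fixed seed, two separate interpreters agree on the hash.
print(hash_in_new_process("235", 0) == hash_in_new_process("235", 0))
print(hash_in_new_process("235", 42) == hash_in_new_process("235", 42))

# With random seeding, the two values will almost certainly differ.
print(hash_in_new_process("235", "random"), hash_in_new_process("235", "random"))
```

Fixing the seed this way is handy for reproducing a bug, but it gives up the collision protection described above, so it is probably not what you want for anything exposed to untrusted input.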
Also see the object.__hash__() special method documentation.
If you need a stable hash implementation, you probably want to look at the hashlib module; this implements cryptographic hash functions. The pybloom project uses this approach.
Since the offset consists of a prefix and a suffix (start value and final XORed value, respectively) you cannot just store the offset, unfortunately. On the plus side, this does mean that attackers cannot easily determine the offset with timing attacks either.
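As a rough illustration of the hashlib route for a Bloom filter, the sketch below derives the bit positions from SHA-256 digests, so they are the same in every session. This is not pybloom's actual code: the names bloom_indexes and BloomFilter, the salting scheme, and the default sizes are all assumptions made for the example.

```python
import hashlib


def bloom_indexes(item, num_bits, num_hashes):
    """Yield num_hashes bit positions for a string, stable across sessions."""
    data = item.encode("utf-8")
    for i in range(num_hashes):
        # Salt each round with its index so the rounds produce different digests.
        digest = hashlib.sha256(str(i).encode("ascii") + b":" + data).digest()
        # Use the first 8 bytes of the digest as an integer, reduced to the table size.
        yield int.from_bytes(digest[:8], "big") % num_bits


class BloomFilter:
    """Toy Bloom filter for strings (illustrative only, sizes are arbitrary)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def add(self, item):
        for idx in bloom_indexes(item, self.num_bits, self.num_hashes):
            self.bits[idx] = 1

    def __contains__(self, item):
        return all(
            self.bits[idx]
            for idx in bloom_indexes(item, self.num_bits, self.num_hashes)
        )


bf = BloomFilter()
bf.add("235")
print("235" in bf)  # True: every position set by add() is found again
```

Because the positions no longer depend on hash(), a filter built in one session can be saved and queried in another without changing PYTHONHASHSEED.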
Hash randomisation is turned on by default in Python 3.3 and later. This is a security feature.
In earlier versions, from 2.6.8 onwards, you could switch it on at the command line with -R, or with the PYTHONHASHSEED environment variable.
You can switch it off by setting PYTHONHASHSEED to zero.
hash() is a Python built-in function used to calculate a hash value for an object in general, not specifically for a string or a number.
You can see the details on this page: https://docs.python.org/3.3/library/functions.html#hash.
The value returned by hash() comes from the object's __hash__ method, which the documentation describes.
That's why you get a different hash value for the same string in a different console.
Relying on hash() for this is not a good approach.
When you want to calculate a hash value for a string, just use hashlib.
hash() is meant to produce an object hash value, not a stable string hash.