Why is dictionary ordering non-deterministic?

2019-01-01 02:15发布

问题:

I recently switched from Python 2.7 to Python 3.3, and it seems that while in Python 2 the ordering of dictionary keys was arbitrary but consistent, in Python 3 the ordering of the keys of a dictionary obtained with e.g. vars() appears non-deterministic.

If I run:

class Test(object): pass
parameters = vars(Test)
print(list(parameters.keys()))

in both Python 2.7 and Python 3.3, then:

  • Python 2.7 consistently gives me

    [\'__dict__\', \'__module__\', \'__weakref__\', \'__doc__\']
    
  • With Python 3.3, I can get any random order – for example:

    [\'__weakref__\', \'__module__\', \'__qualname__\', \'__doc__\', \'__dict__\']
    [\'__doc__\', \'__dict__\', \'__qualname__\', \'__module__\', \'__weakref__\']
    [\'__dict__\', \'__module__\', \'__qualname__\', \'__weakref__\', \'__doc__\']
    [\'__weakref__\', \'__doc__\', \'__qualname__\', \'__dict__\', \'__module__\']
    

Where does this non-determinism come from? And why is something like

list({str(i): i for i in range(10)}.keys())

… consistent between runs, always giving

[\'3\', \'2\', \'1\', \'0\', \'7\', \'6\', \'5\', \'4\', \'9\', \'8\']

… ?

回答1:


Update: In Python 3.6, dict has a new implementation which preserves insertion order. From Python 3.7, this order-preserving behaviour is guaranteed:

the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.


This is the result of a security fix from 2012, which was enabled by default in Python 3.3 (scroll down to \"Security improvements\").

From the announcement:

Hash randomization causes the iteration order of dicts and sets to be unpredictable and differ across Python runs. Python has never guaranteed iteration order of keys in a dict or set, and applications are advised to never rely on it. Historically, dict iteration order has not changed very often across releases and has always remained consistent between successive executions of Python. Thus, some existing applications may be relying on dict or set ordering. Because of this and the fact that many Python applications which don\'t accept untrusted input are not vulnerable to this attack, in all stable Python releases mentioned here, HASH RANDOMIZATION IS DISABLED BY DEFAULT.

As noted above, the last, capitalized bit is no longer true in Python 3.3.

See also: object.__hash__() documentation (\"Note\" sidebar).

If absolutely necessary, you can disable hash randomization in versions of Python affected by this behaviour by setting the PYTHONHASHSEED environment variable to 0.


Your counterexample:

list({str(i): i for i in range(10)}.keys())

… does not in fact always give the same result in Python 3.3, although the number of different orderings is limited due to the way hash collisions are handled:

$ for x in {0..999}
> do
>   python3.3 -c \"print(list({str(i): i for i in range(10)}.keys()))\"
> done | sort | uniq -c
     61 [\'0\', \'1\', \'2\', \'3\', \'4\', \'5\', \'6\', \'7\', \'8\', \'9\']
     73 [\'1\', \'0\', \'3\', \'2\', \'5\', \'4\', \'7\', \'6\', \'9\', \'8\']
     62 [\'2\', \'3\', \'0\', \'1\', \'6\', \'7\', \'4\', \'5\', \'8\', \'9\']
     59 [\'3\', \'2\', \'1\', \'0\', \'7\', \'6\', \'5\', \'4\', \'9\', \'8\']
     58 [\'4\', \'5\', \'6\', \'7\', \'0\', \'1\', \'2\', \'3\', \'8\', \'9\']
     55 [\'5\', \'4\', \'7\', \'6\', \'1\', \'0\', \'3\', \'2\', \'9\', \'8\']
     62 [\'6\', \'7\', \'4\', \'5\', \'2\', \'3\', \'0\', \'1\', \'8\', \'9\']
     63 [\'7\', \'6\', \'5\', \'4\', \'3\', \'2\', \'1\', \'0\', \'9\', \'8\']
     60 [\'8\', \'9\', \'0\', \'1\', \'2\', \'3\', \'4\', \'5\', \'6\', \'7\']
     66 [\'8\', \'9\', \'2\', \'3\', \'0\', \'1\', \'6\', \'7\', \'4\', \'5\']
     65 [\'8\', \'9\', \'4\', \'5\', \'6\', \'7\', \'0\', \'1\', \'2\', \'3\']
     53 [\'8\', \'9\', \'6\', \'7\', \'4\', \'5\', \'2\', \'3\', \'0\', \'1\']
     62 [\'9\', \'8\', \'1\', \'0\', \'3\', \'2\', \'5\', \'4\', \'7\', \'6\']
     52 [\'9\', \'8\', \'3\', \'2\', \'1\', \'0\', \'7\', \'6\', \'5\', \'4\']
     73 [\'9\', \'8\', \'5\', \'4\', \'7\', \'6\', \'1\', \'0\', \'3\', \'2\']
     76 [\'9\', \'8\', \'7\', \'6\', \'5\', \'4\', \'3\', \'2\', \'1\', \'0\']

As noted at the beginning of this answer, that\'s no longer the case in Python 3.6:

$ for x in {0..999}
> do
>   python3.6 -c \"print(list({str(i): i for i in range(10)}.keys()))\"
> done | sort | uniq -c
   1000 [\'0\', \'1\', \'2\', \'3\', \'4\', \'5\', \'6\', \'7\', \'8\', \'9\']