-->

Python 3.41 A set [duplicate]

2019-08-27 21:40发布

问题:

This question already has an answer here:

  • Why is the order in dictionaries and sets arbitrary? 6 answers

I have two questions about sets.

1. So as I read sets are unordered, but when I started experimenting with them I found out that actually there is some kind of ordering thing.

As you can see, there is nothing special in this set:

>>> v_set ={88,11,1,33,21,3,7,55,37,8}
>>> v_set
{33, 1, 3, 37, 7, 8, 11, 21, 55, 88}

But this one is different:

>>> g_set={7,5,11,1,4,13,55,12,2,3,6,20,9,10}
>>> g_set
{1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 20, 55}

I guess, it's because this time I wrote down more closer numbers, and it started to make sense to set those numbers ascending sequence...?

2. And the second question is about pop(). I read that there is no way to control which value gets removed with pop() method, it is completely arbitrary. Bet when I use pop() method it always (I never saw differently) takes the first item from the left side in sets.

As you can see:

>>> v_set
{33, 1, 3, 37, 7, 8, 11, 21, 55, 88}
>>> v_set.pop()
33
>>> v_set.pop()
1
>>> v_set.pop()
3
>>> v_set.pop()
37
>>> v_set.pop()
7
>>> v_set.pop()
8
>>> v_set.pop()
11
>>> v_set.pop()
21
>>> v_set.pop()
55

So is it really completely arbitrary?

回答1:

Note that the order of the elements depends (also) on the order of the insertions. You can easily see this when there are collisions:

In [4]: class Bad:
   ...:     def __init__(self, val, hash_val):
   ...:         self.val = val
   ...:         self.hash_val = hash_val
   ...:     def __str__(self):
   ...:         return 'Bad({0.val}, {0.hash_val})'.format(self)
   ...:     __repr__ = __str__
   ...:     def __eq__(self, other):
   ...:         return self.val == other.val
   ...:     def __hash__(self):
   ...:         return self.hash_val

In [5]: b1 = Bad(1, 1)
   ...: b2 = Bad(2, 1)
   ...: b3 = Bad(3, 2)

In [6]: {b1, b2, b3}
Out[6]: {Bad(2, 1), Bad(3, 2), Bad(1, 1)}

In [7]: {b2, b1, b3}
Out[7]: {Bad(1, 1), Bad(3, 2), Bad(2, 1)}

As you can see in Out[6] the first element is Bad(2, 1) and the last is Bad(1, 1) while in Out[7] the first is Bad(1, 1) and the last is Bad(2, 1).

If there were no collisions:

In [8]: b1 = Bad(1, 1)
   ...: b2 = Bad(2, 2)
   ...: b3 = Bad(3, 3)

In [9]: {b1, b2, b3}
Out[9]: {Bad(1, 1), Bad(2, 2), Bad(3, 3)}

In [10]: {b2, b1, b3}
Out[10]: {Bad(1, 1), Bad(2, 2), Bad(3, 3)}

note how the order didn't change. (Well, the hash is used modulus some n so there can be collisions even if the hashes are different, depending on the size of the underlying table).

In other words the values aren't enough to determine the order of the elements of a set, even if you know how they are implemented. You must also know the order of the insertions.

In general sets do have a well defined order inside a single interpreter run (due to randominzation in python3.3+), however which order is used is depends on the insertions performed (both the value and the order in which they are done), and is arbitrary, i.e. in python3.5 they can change the order without notice, so you cannot rely on it.

They could truly randomize the outputs but this would simply add overhead for no benefit.



回答2:

Yes, the ordering is arbitrary, by definition. Even if items where stored in sorted order, it would still be arbitrary. "Arbitrary" means that Python doesn't promise to order data in any particular way. Because memory is linear it has to use some ordering, but you should never rely on that ordering because it may be changed without notice. (In fact, in the latest versions of Python, the order of items in a set is partially randomized.)

Your second example shows that the order of printing is the same as the order of popping. This makes sense: repr walks the items in the order they are stored in memory, and pop apparently returns the first item according to that same order. Again, you cannot rely on this: it's an implementation detail and if the Python developers figure out a faster way to do pop, they're free to break any code that relies on set ordering.

If you want to know how this works, read up on hash tables.



回答3:

It is not completely arbitrary. But it does not matter.

We call the set unordered because you, as a user or client of that code, must not depend on a specific order. However, based on details of the implementation of the set, it is likely that there is some order.

The same with regards to pop(). It is very likely that the specific implementation of python you use has logic that will lead to clearly deterministic results. However, your code might be used with a python interpreter that uses a different implementation. A random element is the only guarantee you get from the implementation.

To summarize, the documentation gives you a set of guarantees that any compliant python implementation will follow. Additional effects that you observe are implementation details and might change at any time.



标签: python set pop