Why is creating a set from a concatenated list fas

While trying to answer What is the preferred way to compose a set from multiple lists in Python, I did some performance analysis and came up with a somewhat surprising conclusion.

Using

python -m timeit -s '
import itertools
import random
n=1000000
random.seed(0)
A = [random.randrange(1<<30) for _ in xrange(n)]
B = [random.randrange(1<<30) for _ in xrange(n)]
C = [random.randrange(1<<30) for _ in xrange(n)]'

for setup, I timed the following snippets:

> $TIMEIT 'set(A+B+C)'
10 loops, best of 3: 872 msec per loop

> $TIMEIT 's = set(A); s.update(B); s.update(C)'
10 loops, best of 3: 930 msec per loop

> $TIMEIT 's = set(itertools.chain(A,B,C))'
10 loops, best of 3: 941 msec per loop

To my surprise, set(A+B+C) is the fastest despite the fact that it creates an intermediate list containing 3000000 elements. .update and itertools.chain are both slower, even though neither of them copy any lists.

What's going on here?

EDIT: On a second machine (OS X 10.10.5, Python 2.7.10, 2.5GHz Core i7), I ran the following script (which runs the tests forwards and backwards to avoid ordering effects):

SETUP='import itertools
import random
n=1000000
random.seed(0)
A = [random.randrange(1<<30) for _ in xrange(n)]
B = [random.randrange(1<<30) for _ in xrange(n)]
C = [random.randrange(1<<30) for _ in xrange(n)]'

python -m timeit -s "$SETUP" 'set(A+B+C)'
python -m timeit -s "$SETUP" 's = set(A); s.update(B); s.update(C)'
python -m timeit -s "$SETUP" 's = set(itertools.chain(A,B,C))'

python -m timeit -s "$SETUP" 's = set(itertools.chain(A,B,C))'
python -m timeit -s "$SETUP" 's = set(A); s.update(B); s.update(C)'
python -m timeit -s "$SETUP" 'set(A+B+C)'

and obtained the following results:

10 loops, best of 3: 579 msec per loop
10 loops, best of 3: 726 msec per loop
10 loops, best of 3: 775 msec per loop
10 loops, best of 3: 761 msec per loop
10 loops, best of 3: 737 msec per loop
10 loops, best of 3: 555 msec per loop

Now set(A+B+C) is clearly faster, and the results are quite stable - it is hard to chalk this up to mere measurement error. Running this script repeatedly produces similar results.

I get different, non-surprising, results than you on my Win 7 SP1 box with a similar processor with Python 2.7.10, where set(A+B+C) appears to be the slowest way to do it as one might expect. Similar results were obtained with garbage collection re-enabled and with Python 3.4.3.

I used my own performance assessing testbed based on timeit and got the following results:

fastest to slowest execution speeds (Python 2.7.10)
   (10 executions, best of 3 repetitions)

set(A); s.update(B); s.update(C) :  4.787919 secs, rel speed 1.00x,  0.00% slower
              set(A).update(B,C) :  6.463666 secs, rel speed 1.35x, 35.00% slower
     set(itertools.chain(A,B,C)) :  6.743028 secs, rel speed 1.41x, 40.83% slower
                      set(A+B+C) :  8.030483 secs, rel speed 1.68x, 67.72% slower

Benchmarking code:

from __future__ import print_function
import sys
from textwrap import dedent
import timeit

N = 10  # Number of executions of each "algorithm"
R = 3  # number of Repeations of executions

# common setup for all algorithms (not timed)
setup = dedent("""
    import itertools
    import gc
    import random

    try:
        xrange
    except NameError:
        xrange = range

    random.seed(0)
    n = 1000000  # number of elements in each list
    A = [random.randrange(1<<30) for _ in xrange(n)]
    B = [random.randrange(1<<30) for _ in xrange(n)]
    C = [random.randrange(1<<30) for _ in xrange(n)]

    # gc.enable()  # to (re)enable garbage collection if desired
""")

algorithms = {
    "set(A+B+C)": dedent("""
        s = set(A+B+C)
    """),

    "set(A); s.update(B); s.update(C)": dedent("""
        s = set(A); s.update(B); s.update(C)
    """),

    "set(itertools.chain(A,B,C))": dedent("""
        s = set(itertools.chain(A,B,C))
        """),

    "set(A).update(B,C)": dedent("""
        s = set(A).update(B,C)
        """),
}

# execute and time algorithms, collecting results
timings = [
    (label,
     min(timeit.repeat(algorithms[label], setup=setup, repeat=R, number=N)),
    ) for label in algorithms
]

print('fastest to slowest execution speeds (Python {}.{}.{})\n'.format(
        *sys.version_info[:3]),
        '  ({:,d} executions, best of {:d} repetitions)\n'.format(N, R))

longest = max(len(timing[0]) for timing in timings)  # length of longest label
ranked = sorted(timings, key=lambda t: t[1])  # ascending sort by execution time
fastest = ranked[0][1]
for timing in ranked:
    print("{:>{width}} : {:9.6f} secs, rel speed {:4.2f}x, {:6.2f}% slower".
            format(timing[0], timing[1], round(timing[1]/fastest, 2),
                   round((timing[1]/fastest - 1) * 100, 2), width=longest))