I am trying to benchmark a few methods of itertools against generators and list comprehensions. The idea is that I want to build an iterator by filtering some entries from a base list.
Here is the code I came up with (edited after the accepted answer):
from itertools import ifilter
import collections
import random
import os
from timeit import Timer
os.system('cls')
# define large arrays
listArrays = [xrange(100), xrange(1000), xrange(10000), xrange(100000)]
# Number of elements to be filtered out
nb_elem = 100
# Number of times we run the test
nb_rep = 1000
def discard(it):
    collections.deque(it, maxlen=0)

def testGenerator(arr, sample):
    discard(x for x in sample if x in arr)

def testIterator(arr, sample):
    discard(ifilter(sample.__contains__, arr))

def testList(arr, sample):
    discard([x for x in sample if x in arr])
if __name__ == '__main__':
    for arr in listArrays:
        print 'Size of array: %s ' % len(arr)
        print 'number of iterations %s' % nb_rep
        sample = random.sample(arr, nb_elem)

        t1 = Timer('testIterator(arr, sample)', 'from __main__ import testIterator, arr, sample')
        tt1 = t1.timeit(number=nb_rep)
        t2 = Timer('testList(arr, sample)', 'from __main__ import testList, arr, sample')
        tt2 = t2.timeit(number=nb_rep)
        t3 = Timer('testGenerator(arr, sample)', 'from __main__ import testGenerator, arr, sample')
        tt3 = t3.timeit(number=nb_rep)

        norm = min(tt1, tt2, tt3)
        print 'maximum runtime %.6f' % max(tt1, tt2, tt3)
        print 'normalized times:\n iterator: %.6f \n list: %.6f \n generator: %.6f' % \
            (tt1/norm, tt2/norm, tt3/norm)
        print '======================================================'
And here are the results that I get. Please note that the edited version was not run on the same machine as the original (hence the usefulness of the normalized results) and was run with a 32-bit interpreter on Python 2.7.3:
Size of array: 100
number of iterations 1000
maximum runtime 0.125595
normalized times:
iterator: 1.000000
list: 1.260302
generator: 1.276030
======================================================
Size of array: 1000
number of iterations 1000
maximum runtime 1.740341
normalized times:
iterator: 1.466031
list: 1.010701
generator: 1.000000
======================================================
Size of array: 10000
number of iterations 1000
maximum runtime 17.033630
normalized times:
iterator: 1.441600
list: 1.000000
generator: 1.010979
======================================================
Size of array: 100000
number of iterations 1000
maximum runtime 169.677963
normalized times:
iterator: 1.455594
list: 1.000000
generator: 1.008846
======================================================
Could you give some suggestions for improvement and comment on whether or not this benchmark can give accurate results?
I know that the condition in my decorator might bias the results. I am hoping for some suggestions regarding that.
Thanks.
First, instead of trying to duplicate everything timeit does, just use it. The time function may not have enough accuracy to be useful, and writing dozens of lines of scaffolding code (especially if it has to do hacky things like switching on func.__name__) that you don't need is just inviting bugs for no reason.
Assuming there are no bugs, it probably won't affect the results significantly. You're doing a tiny bit of extra work and charging it to testIterator, but that's only once per outer loop. But still, there's no benefit to doing it, so let's not.
def testGenerator(arr, sample):
    for i in (x for x in sample if x in arr):
        k = random.random()

def testIterator(arr, sample):
    for i in ifilter(lambda x: x in sample, arr):
        k = random.random()

def testList(arr, sample):
    for i in [x for x in sample if x in arr]:
        k = random.random()
from functools import partial  # partial freezes the test arguments for timeit
import timeit

tests = testIterator, testGenerator, testList

for arr in listArrays:
    print 'Size of array: %s ' % len(arr)
    print 'number of iterations %s' % nb_rep
    sample = random.sample(arr, nb_elem)
    funcs = [partial(test, arr, sample) for test in tests]
    times = [timeit.timeit(func, number=nb_rep) for func in funcs]
    norm = min(*times)
    print 'maximum runtime %.6f' % max(*times)
    print 'normalized times:\n iterator: %.6f \n list: %.6f \n generator: %.6f' % \
        (times[0]/norm, times[1]/norm, times[2]/norm)
    print '======================================================'
Next, why are you doing that k = random.random() in there? From a quick test, just executing that line N times without the complex loop is 0.19x as long as the whole thing. So, you're adding 20% to each of the numbers, which dilutes the difference between them for no reason.
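For instance, a quick way to sanity-check that on your own machine (a rough sketch, assuming roughly nb_elem = 100 matches per call as in the question, not the exact test behind the 0.19x figure):

import random
import timeit

def only_random(n=100):
    # Execute only the k = random.random() line, about as many times as it
    # would run inside one call of the test functions (~nb_elem matches).
    for i in xrange(n):
        k = random.random()

# Time it the same number of times the benchmark calls each test function.
print 'random.random() line alone: %.6f' % timeit.timeit(only_random, number=1000)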
Once you get rid of that, the for loop is serving no purpose except to consume the iterator, and that's adding extra overhead as well. As of 2.7.3 and 3.3.0, the fastest way to consume an iterator without custom C code is deque(it, maxlen=0), so, let's try this:
def discard(it):
    collections.deque(it, maxlen=0)

def testGenerator(arr, sample):
    discard(x for x in sample if x in arr)

def testIterator(arr, sample):
    discard(ifilter(sample.__contains__, arr))

def testList(arr, sample):
    discard([x for x in sample if x in arr])
Or, alternatively, just have the functions return a generator/ifilter/list and then make the scaffolding call discard on the result (it shouldn't matter either way).
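That alternative could look something like this (just a rough sketch; the make* names are made up for illustration and aren't from the question):

from itertools import ifilter
import collections
import random
import timeit

def discard(it):
    collections.deque(it, maxlen=0)

# The test functions only build and return the iterable...
def makeGenerator(arr, sample):
    return (x for x in sample if x in arr)

def makeIterator(arr, sample):
    return ifilter(sample.__contains__, arr)

def makeList(arr, sample):
    return [x for x in sample if x in arr]

# ...and the scaffolding drains each result before timing the next candidate.
arr = xrange(10000)
sample = random.sample(arr, 100)
for make in (makeIterator, makeGenerator, makeList):
    t = timeit.timeit(lambda: discard(make(arr, sample)), number=1000)
    print '%s: %.6f' % (make.__name__, t)

Either way the deque drain ends up inside the timed call, which is why it shouldn't change the relative numbers.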
Meanwhile, for the testIterator case, are you trying to test the cost of the lambda vs. an inline expression, or the cost of ifilter vs. a generator? If you want to test the former, this is correct; if the latter, you probably want to optimize that. For example, passing sample.__contains__ instead of lambda x: x in sample seems to be 20% faster in 64-bit Python 3.3.0 and 30% faster in 32-bit 2.7.2 (although for some reason not faster at all in 64-bit 2.7.2).
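If you want to check that difference on your own setup, a minimal comparison might look like this (a sketch only; the 10000-element array and 100-element sample are assumptions for illustration, not the configuration behind the percentages above):

from itertools import ifilter
import collections
import random
import timeit

arr = xrange(10000)
sample = random.sample(arr, 100)

def drain_with_lambda():
    # Predicate written as an inline lambda.
    collections.deque(ifilter(lambda x: x in sample, arr), maxlen=0)

def drain_with_contains():
    # Same filter, but passing the bound __contains__ method directly.
    collections.deque(ifilter(sample.__contains__, arr), maxlen=0)

print 'lambda:       %.6f' % timeit.timeit(drain_with_lambda, number=100)
print '__contains__: %.6f' % timeit.timeit(drain_with_contains, number=100)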
Finally, unless you're just testing for exactly one implementation/platform/version, make sure to run it on as many as you can. For example, with 64-bit CPython 2.7.2, list and generator are always neck-and-neck while iterator gradually climbs from 1.0x to 1.4x as the lists grow, but in PyPy 1.9.0, iterator is always fastest, with generator and list starting off 2.1x and 1.9x slower but closing to 1.2x as the lists grow.
So, if you decided against iterator because "it's slow", you might be trading a large slowdown on PyPy for a much smaller speedup on CPython.
Of course that might be acceptable, e.g., because even the slowest PyPy run is blazingly fast, or because none of your users use PyPy, or whatever. But it's definitely part of the answer to "is this benchmark relevant?"