I need a fast way to count the number of differences between two large lists. Each list item is either 1 or 0 (a single integer), and each list will always contain 307200 items.
This is a sample of my current code:
list1 = <list1>  # should be a list of integers containing 1's or 0's
list2 = <list2>  # same rule as above, in a slightly different order
diffCount = 0
for index, item in enumerate(list1):
    if item != list2[index]:
        diffCount += 1
percent = float(diffCount) / float(307200)
The above works, but it is far too slow for my purposes. Is there a quicker way to get the number of differences between the lists, or the percentage of items that differ?
I have looked at a few similar threads on this site, but they all seem to work slightly differently from what I want, and the set() examples don't work for my purposes. :P
I would also try the following stdlib-only method:
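A minimal sketch of such a helper, assuming op is the standard operator module (the name list_difference_count matches the usage below):

```python
import operator as op

def list_difference_count(l1, l2):
    # each mismatched pair contributes True (== 1) to the sum
    return sum(map(op.ne, l1, l2))
```

(On Python 2 you would use itertools.imap to avoid building an intermediate list; on Python 3, map is already lazy.)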
This works correctly for len(list1) == len(list2). If you are sure that the list items are always integers, you can substitute op.xor for op.ne, which could improve performance. The percentage of difference is float(list_difference_count(l1, l2)) / len(l1).
I don't actually know if this is faster, but you might experiment with some of the "functional" methods Python offers. It's usually better for loops to be run by internal, hand-coded subroutines.
Something like this:
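A sketch of one such functional approach, using operator.xor since the items are 0/1 integers (the sample lists here are stand-ins for the real 307200-item data):

```python
import operator

list1 = [1, 0, 1, 1, 0, 0]  # stand-in for the real 307200-item list
list2 = [1, 1, 1, 0, 0, 1]

# xor of two 0/1 ints is 1 exactly where they differ,
# so the mismatch count is just the sum of the xors
diff_count = sum(map(operator.xor, list1, list2))
percent = float(diff_count) / len(list1)
```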
You can get at least another 10X speedup if you use NumPy arrays instead of lists.
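A sketch of the NumPy variant, assuming NumPy is available and the lists hold only 0/1 integers (sample data stands in for the real lists):

```python
import numpy as np

list1 = [1, 0, 1, 1, 0]  # stand-in for the real 307200-item list
list2 = [1, 1, 1, 0, 0]

a = np.asarray(list1, dtype=np.uint8)
b = np.asarray(list2, dtype=np.uint8)

# element-wise xor marks differing positions; the loop runs in C, not Python
diff_count = int(np.count_nonzero(a ^ b))
percent = diff_count / a.size
```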
If possible, use Paul's/JayP's answer of using NumPy (with xor); if you can only use Python's stdlib, itertools' izip in a list comprehension seems the fastest:
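A sketch of that comprehension (itertools.izip is Python 2 only; on Python 3 the built-in zip is already lazy, so the fallback below keeps the snippet runnable on both):

```python
try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3: built-in zip is already lazy

list1 = [1, 0, 1, 1, 0]  # stand-in for the real 307200-item list
list2 = [1, 1, 1, 0, 0]

# True/False compare-results sum as 1/0
diff_count = sum([a != b for a, b in izip(list1, list2)])
```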
I got this (on Python 2.7.1, Snow Leopard):
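A sketch of a timeit harness for reproducing such measurements, assuming random 0/1 lists of the stated length (absolute numbers will vary by machine and Python version):

```python
import timeit

setup = """
import random
random.seed(0)
l1 = [random.randint(0, 1) for _ in range(307200)]
l2 = [random.randint(0, 1) for _ in range(307200)]
"""

stmt = "sum([a != b for a, b in zip(l1, l2)])"

# average of 10 runs, in seconds per run
elapsed = timeit.timeit(stmt, setup=setup, number=10) / 10
print("%.4f seconds per run" % elapsed)
```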
Is 7 hundredths of a second too slow for your application?