Get difference between two lists

2018-12-31 05:03发布

I have two lists in Python, like these:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

I need to create a third list with items from the first list which aren't present in the second one. From the example I have to get:

temp3 = ['Three', 'Four']

Are there any fast ways without cycles and checking?

25条回答
牵手、夕阳
2楼-- · 2018-12-31 05:16

I am little too late in the game for this but you can do a comparison of performance of some of the above mentioned code with this, two of the fastest contenders are,

list(set(x).symmetric_difference(set(y)))
list(set(x) ^ set(y))

I apologize for the elementary level of coding.

import time
import random
from itertools import filterfalse

# 1 - performance (time taken)
# 2 - correctness (answer - 1,4,5,6)
# set performance
performance = 1
numberoftests = 7

def answer(x,y,z):
    if z == 0:
        start = time.clock()
        lists = (str(list(set(x)-set(y))+list(set(y)-set(y))))
        times = ("1 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 1:
        start = time.clock()
        lists = (str(list(set(x).symmetric_difference(set(y)))))
        times = ("2 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 2:
        start = time.clock()
        lists = (str(list(set(x) ^ set(y))))
        times = ("3 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 3:
        start = time.clock()
        lists = (filterfalse(set(y).__contains__, x))
        times = ("4 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 4:
        start = time.clock()
        lists = (tuple(set(x) - set(y)))
        times = ("5 = " + str(time.clock() - start))
        return (lists,times)

    elif z == 5:
        start = time.clock()
        lists = ([tt for tt in x if tt not in y])
        times = ("6 = " + str(time.clock() - start))
        return (lists,times)

    else:    
        start = time.clock()
        Xarray = [iDa for iDa in x if iDa not in y]
        Yarray = [iDb for iDb in y if iDb not in x]
        lists = (str(Xarray + Yarray))
        times = ("7 = " + str(time.clock() - start))
        return (lists,times)

n = numberoftests

if performance == 2:
    a = [1,2,3,4,5]
    b = [3,2,6]
    for c in range(0,n):
        d = answer(a,b,c)
        print(d[0])

elif performance == 1:
    for tests in range(0,10):
        print("Test Number" + str(tests + 1))
        a = random.sample(range(1, 900000), 9999)
        b = random.sample(range(1, 900000), 9999)
        for c in range(0,n):
            #if c not in (1,4,5,6):
            d = answer(a,b,c)
            print(d[1])
查看更多
冷夜・残月
3楼-- · 2018-12-31 05:19

Can be done using python XOR operator.

  • This will remove the duplicates in each list
  • This will show difference of temp1 from temp2 and temp2 from temp1.

set(temp1) ^ set(temp2)
查看更多
深知你不懂我心
4楼-- · 2018-12-31 05:20

most simple way,

use set().difference(set())

list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))

answer is set([1])

can print as a list,

print list(set(list_a).difference(set(list_b)))
查看更多
只若初见
5楼-- · 2018-12-31 05:22

Here's a Counter answer for the simplest case.

This is shorter than the one above that does two-way diffs because it only does exactly what the question asks: generate a list of what's in the first list but not the second.

from collections import Counter

lst1 = ['One', 'Two', 'Three', 'Four']
lst2 = ['One', 'Two']

c1 = Counter(lst1)
c2 = Counter(lst2)
diff = list((c1 - c2).elements())

Alternatively, depending on your readability preferences, it makes for a decent one-liner:

diff = list((Counter(lst1) - Counter(lst2)).elements())

Output:

['Three', 'Four']

Note that you can remove the list(...) call if you are just iterating over it.

Because this solution uses counters, it handles quantities properly vs the many set-based answers. For example on this input:

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

The output is:

['Two', 'Two', 'Three', 'Three', 'Four']
查看更多
不再属于我。
6楼-- · 2018-12-31 05:25

The existing solutions all offer either one or the other of:

  • Faster than O(n*m) performance.
  • Preserve order of input list.

But so far no solution has both. If you want both, try this:

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]

Performance test

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print timeit.timeit('list(set(temp1) - set(temp2))', init, number = 100000)
print timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number = 100000)
print timeit.timeit('[item for item in temp1 if item not in temp2]', init, number = 100000)

Results:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

The method I presented as well as preserving order is also (slightly) faster than the set subtraction because it doesn't require construction of an unnecessary set. The performance difference would be more noticable if the first list is considerably longer than the second and if hashing is expensive. Here's a second test demonstrating this:

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

Results:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer
查看更多
还给你的自由
7楼-- · 2018-12-31 05:25

Try this:

temp3 = set(temp1) - set(temp2)
查看更多
登录 后发表回答