How to compare a list of lists/sets in python?

Posted 2020-01-29 04:21

What is the easiest way to compare the two lists/sets and output the differences? Are there any built-in functions that will help me compare nested lists/sets?

Inputs:

First_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222],  
              ['Test3.doc', '3c3c3c', 3333]
             ]  
Secnd_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222], 
              ['Test3.doc', '8p8p8p', 9999], 
              ['Test4.doc', '4d4d4d', 4444]]  

Expected Output:

Differences = [['Test3.doc', '3c3c3c', 3333],
               ['Test3.doc', '8p8p8p', 9999], 
               ['Test4.doc', '4d4d4d', 4444]]

7 answers
Ridiculous、
#2 · 2020-01-29 04:51

Not sure if there is a nice function for this, but the "manual" way to do it isn't difficult:

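differences = []
# Items that appear only in First_list
for item in First_list:
    if item not in Secnd_list:
        differences.append(item)
# Items that appear only in Secnd_list
for item in Secnd_list:
    if item not in First_list:
        differences.append(item)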
别忘想泡老子
#3 · 2020-01-29 04:51
>>> First_list = [['Test.doc', '1a1a1a', '1111'], ['Test2.doc', '2b2b2b', '2222'], ['Test3.doc', '3c3c3c', '3333']] 
>>> Secnd_list = [['Test.doc', '1a1a1a', '1111'], ['Test2.doc', '2b2b2b', '2222'], ['Test3.doc', '3c3c3c', '3333'], ['Test4.doc', '4d4d4d', '4444']] 


>>> z = [tuple(y) for y in First_list]
>>> z
[('Test.doc', '1a1a1a', '1111'), ('Test2.doc', '2b2b2b', '2222'), ('Test3.doc', '3c3c3c', '3333')]
>>> x = [tuple(y) for y in Secnd_list]
>>> x
[('Test.doc', '1a1a1a', '1111'), ('Test2.doc', '2b2b2b', '2222'), ('Test3.doc', '3c3c3c', '3333'), ('Test4.doc', '4d4d4d', '4444')]


>>> set(x) - set(z)
{('Test4.doc', '4d4d4d', '4444')}
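
Note that set(x) - set(z) is a one-sided difference: it only reports rows that are in the second list but not in the first. A minimal sketch of getting both sides, using the data from the question (the names only_in_first and only_in_second are made up for illustration):

```python
First_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '3c3c3c', 3333]]
Secnd_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '8p8p8p', 9999],
              ['Test4.doc', '4d4d4d', 4444]]

# Lists are not hashable, so convert each row to a tuple first
first_set = {tuple(row) for row in First_list}
secnd_set = {tuple(row) for row in Secnd_list}

# One-sided differences (illustrative names, not from the answer above)
only_in_first = first_set - secnd_set
only_in_second = secnd_set - first_set

print(only_in_first)   # rows missing from Secnd_list
print(only_in_second)  # rows missing from First_list
```

Taking the union of the two one-sided results gives the full list of differences asked for in the question.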
别忘想泡老子
#4 · 2020-01-29 05:00

I guess you'll have to convert your lists to sets of tuples (lists are not hashable, so they cannot be set members):

>>> a = {('a', 'b'), ('c', 'd'), ('e', 'f')}
>>> b = {('a', 'b'), ('h', 'g')}
>>> a.symmetric_difference(b)
{('e', 'f'), ('h', 'g'), ('c', 'd')}
叛逆
#5 · 2020-01-29 05:05

So you want the difference between two lists of items.

first_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222], 
              ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222], 
              ['Test3.doc', '8p8p8p', 9999], 
              ['Test4.doc', '4d4d4d', 4444]]

First I'd turn each list of lists into a list of tuples. Tuples are hashable (lists are not), so a list of tuples can be converted into a set of tuples:

first_tuple_list = [tuple(lst) for lst in first_list]
secnd_tuple_list = [tuple(lst) for lst in secnd_list]

Then you can make sets:

first_set = set(first_tuple_list)
secnd_set = set(secnd_tuple_list)

EDIT (suggested by sdolan): You could have done the last two steps for each list in a one-liner:

first_set = set(map(tuple, first_list))
secnd_set = set(map(tuple, secnd_list))

Note: map is a built-in function that applies the function given as its first argument (in this case tuple) to each item of its second argument (in our case a list of lists).
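
As a small illustration of the note above, map(tuple, ...) produces the same tuples as the list comprehension used earlier. In Python 3, map returns a lazy iterator, so it is wrapped in list() here just to show the result (the names mapped and comprehension are only for this sketch):

```python
first_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222]]

# map applies tuple() to every inner list; list() materializes the iterator
mapped = list(map(tuple, first_list))
comprehension = [tuple(lst) for lst in first_list]

print(mapped == comprehension)  # True
print(mapped)  # [('Test.doc', '1a1a1a', 1111), ('Test2.doc', '2b2b2b', 2222)]
```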

Then find the symmetric difference between the sets:

>>> first_set.symmetric_difference(secnd_set)
{('Test3.doc', '3c3c3c', 3333),
 ('Test3.doc', '8p8p8p', 9999),
 ('Test4.doc', '4d4d4d', 4444)}

Note that first_set ^ secnd_set is equivalent to first_set.symmetric_difference(secnd_set).
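
A short sketch of that equivalence on the question's data. Because sets are unordered, the result is sorted before being converted back to a list of lists; the sorting step is an addition here (not part of the approach above) so that the output is reproducible, and it assumes the rows can be compared with each other.

```python
first_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '8p8p8p', 9999],
              ['Test4.doc', '4d4d4d', 4444]]

first_set = set(map(tuple, first_list))
secnd_set = set(map(tuple, secnd_list))

# The ^ operator and the method return the same set
assert first_set ^ secnd_set == first_set.symmetric_difference(secnd_set)

# Sets are unordered, so sort before turning the tuples back into lists
differences = [list(t) for t in sorted(first_set ^ secnd_set)]
print(differences)
# [['Test3.doc', '3c3c3c', 3333], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]
```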

Also, if you don't want to use sets (e.g., when using Python 2.2), it's quite straightforward to do with list comprehensions:

>>> [x for x in first_list if x not in secnd_list] + [x for x in secnd_list if x not in first_list]
[['Test3.doc', '3c3c3c', 3333],
 ['Test3.doc', '8p8p8p', 9999],
 ['Test4.doc', '4d4d4d', 4444]]

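or with the filter function and lambda functions (you have to test both directions and combine the results). In Python 3, filter returns an iterator, so each call is wrapped in list() before concatenating:

>>> list(filter(lambda x: x not in secnd_list, first_list)) + list(filter(lambda x: x not in first_list, secnd_list))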
[['Test3.doc', '3c3c3c', 3333],
 ['Test3.doc', '8p8p8p', 9999],
 ['Test4.doc', '4d4d4d', 4444]]
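
The list-based versions keep the original order of the rows, but every "x not in secnd_list" test scans a whole list, so the cost grows with len(first_list) * len(secnd_list). A possible hybrid, assuming every row contains only hashable values, keeps the order while using sets for the membership tests (the helper name ordered_difference is hypothetical):

```python
def ordered_difference(first, second):
    """Rows found in only one of the two lists, in their original order."""
    # Build hashable lookups once so each membership test is a set lookup
    first_seen = {tuple(row) for row in first}
    second_seen = {tuple(row) for row in second}
    only_first = [row for row in first if tuple(row) not in second_seen]
    only_second = [row for row in second if tuple(row) not in first_seen]
    return only_first + only_second


first_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '8p8p8p', 9999],
              ['Test4.doc', '4d4d4d', 4444]]

print(ordered_difference(first_list, secnd_list))
# [['Test3.doc', '3c3c3c', 3333], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]
```

Unlike the pure set approach, this returns the original inner lists and keeps duplicate rows if the inputs contain any.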
等我变得足够好
#6 · 2020-01-29 05:07

Using set comprehensions, you can make it a one-liner. To get a set of tuples:

Differences = {tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list}

Or, to get a list of tuples:

Differences = list({tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list})

Or, to get a list of lists (if you really want):

Differences = [list(j) for j in {tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list}]

PS: I read here: https://stackoverflow.com/a/10973817/4900095 that using the map() function is not considered the Pythonic way to do this.

戒情不戒烟
#7 · 2020-01-29 05:13

Old question but here's a solution I use for returning unique elements not found in both lists.

I use this for comparing the values returned from a database and the values generated by a directory crawler package. I didn't like the other solutions I found because many of them could not dynamically handle both flat lists and nested lists.

def differentiate(x, y):
    """
    Retrieve a list of unique elements that do not exist in both x and y.
    Capable of parsing one-dimensional (flat) and two-dimensional (lists of lists) lists.

    :param x: list #1
    :param y: list #2
    :return: list of unique values
    """
    # If either list is empty, every value in the other list is unique
    # (this also covers the case where both lists are empty)
    if len(x) == 0:
        return y
    if len(y) == 0:
        return x

    # Get the element type to convert back to before returning
    input_type = type(x[0])

    if input_type in (list, tuple):
        # 2D dataset (list of lists): convert the rows to hashable tuples
        first_set = set(map(tuple, x))
        secnd_set = set(map(tuple, y))
    else:
        # 1D dataset (list of items): unique values only.
        # Checking the type avoids tuple('abc') silently splitting strings.
        first_set = set(x)
        secnd_set = set(y)

    # Symmetric difference: values found in exactly one of the two lists,
    # converted back to the original element type
    return [input_type(i) for i in first_set ^ secnd_set]