How to remove every occurrence of sub-list from li

2019-02-16 03:55发布

I have two lists:

big_list = [2, 1, 2, 3, 1, 2, 4]
sub_list = [1, 2]

I want to remove all sub_list occurrences in big_list.

result should be [2, 3, 4]

For strings you could use this:

'2123124'.replace('12', '')

But AFAIK this does not work for lists.

This is not a duplicate of Removing a sublist from a list since I want to remove all sub-lists from the big-list. In the other question the result should be [5,6,7,1,2,3,4].

Update: For simplicity I took integers in this example. But list items could be arbitrary objects.

Update2:

if big_list = [1, 2, 1, 2, 1] and sub_list = [1, 2, 1], I want the result to be [2, 1] (like '12121'.replace('121', ''))

Update3:

I don't like copy+pasting source code from StackOverflow into my code. That's why I created second question at software-recommendations: https://softwarerecs.stackexchange.com/questions/51273/library-to-remove-every-occurrence-of-sub-list-from-list-python

Update4: if you know a library to make this one method call, please write it as answer, since this is my preferred solution.

The test should pass this test:

def test_remove_sub_list(self):
    self.assertEqual([1, 2, 3], remove_sub_list([1, 2, 3], []))
    self.assertEqual([1, 2, 3], remove_sub_list([1, 2, 3], [4]))
    self.assertEqual([1, 3], remove_sub_list([1, 2, 3], [2]))
    self.assertEqual([1, 2], remove_sub_list([1, 1, 2, 2], [1, 2]))
    self.assertEquals([2, 1], remove_sub_list([1, 2, 1, 2, 1], [1, 2, 1]))
    self.assertEqual([], remove_sub_list([1, 2, 1, 2, 1, 2], [1, 2]))

14条回答
时光不老,我们不散
2楼-- · 2019-02-16 04:17

Just for fun, here is the closest approximation to a one-liner:

from functools import reduce

big_list = [2, 1, 2, 3, 1, 2, 4]
sub_list = [1, 2]
result = reduce(lambda r, x: r[:1]+([1]+r[2:-r[1]],[min(len(r[0]),r[1]+1)]+r[2:])[r[-r[1]:]!=r[0]]+[x], big_list+[0], [sub_list, 1])[2:-1]

Don't trust that it works? Check it on IDEone!

Of course it's far from efficient and is disgustfully cryptic, however it should help to convince the OP to accept @Mad Physicist's answer.

查看更多
对你真心纯属浪费
3楼-- · 2019-02-16 04:18

As compact as I can:

a=[]
for i in range(0,len(big_list)):
    if big_list[i]==1 and big_list[i+1]==2:
        a.extend((i,i+1))
[big_list[x] for x in [x for x in range(len(big_list)) if x not in a]]

Out[101]: [2, 3, 4]

Same approach in R:

big_list <- c(2, 1, 2, 3, 1, 2, 4)
sub_list <- c(1,2)

a<-c()
for(i in 1:(length(big_list)-1)){if (all(big_list[c(i,i+1)]==sub_list)) a<-c(a,c(i,i+1)) }
big_list[-a]
[1] 2 3 4

I think R is easier.

[big_list[x] for x in [x for x in range(len(big_list)) if x not in a]] is equivalent to big_list[-a]

查看更多
时光不老,我们不散
4楼-- · 2019-02-16 04:21

(For final approach, see last code snippet)

I'd have thought a simple string conversion would be sufficient:

big_list = [2, 1, 2, 3, 1, 2, 4]
sub_list = [1, 2]

new_list = list(map(int, list((''.join(map(str, big_list))).replace((''.join(map(str, sub_list))), ''))))

I'm essentially doing a find/replace with the string equivalents of the lists. I'm mapping them to integers afterwards so that the original types of the variables are retained. This will work for any size of the big and sub lists.

However, it's likely that this won't work if you're calling it on arbitrary objects if they don't have a textual representation. Moreover, this method results in only the textual version of the objects being retained; this is a problem if the original data types need to be maintained.

For this, I've composed a solution with a different approach:

new_list = []
i = 0
while new_list != big_list:
    if big_list[i:i+len(sub_list)] == sub_list:
        del big_list[i:i+len(sub_list)]
    else:
        new_list.append(big_list[i])
        i += 1

Essentially, I'm removing every duplicate of the sub_list when I find them and am appending to the new_list when I find an element which isn't part of a duplicate. When the new_list and big_list are equal, all of the duplicates have been found, which is when I stop. I haven't used a try-except as I don't think there should be any indexing errors.

This is similar to @MadPhysicist's answer and is of roughly the same efficiency, but mine consumes less memory.

This second approach will work for any type of object with any size of lists and therefore is much more flexible than the first approach. However, the first approach is quicker if your lists are just integers.

However, I'm not done yet! I've concocted a one-liner list comprehension which has the same functionality as the second approach!

import itertools
new_list = [big_list[j] for j in range(len(big_list)) if j not in list(itertools.chain.from_iterable([ list(range(i, i+len(sub_list))) for i in [i for i, x in enumerate(big_list) if x == sub_list[0]] if big_list[i:i+len(sub_list)] == sub_list ]))]

Initially, this seems daunting, but I assure you it's quite simple! First, i create a list of the indices where the first element of the sublist has occured. Next, for each of these indices, I check if the following elements form the sublist. If they do, the range of indices which form the duplicate of the sublist is added to another list. Afterwards, I use a function from itertools to flatten the resulting list of lists. Every element in this flattened list is an index which is in a duplicate of the sublist. Finally, I create a new_list which consists of every element of the big_list which has an index not found in the flattened list.

I don't think that this method is in any of the other answers. I like it the most as it's quite neat once you realise how it works and is very efficient (due to the nature of list comprehensions).

查看更多
地球回转人心会变
5楼-- · 2019-02-16 04:22

Use itertools.zip_longest to create n element tuples (where n is length of sub_list) and then filter the current element and next n-1 elements when one of the element matched the sub_list

>>> from itertools import zip_longest, islice
>>> itr = zip_longest(*(big_list[i:] for i in range(len(sub_list))))
>>> [sl[0] for sl in itr if not (sl == tuple(sub_list) and next(islice(itr, len(sub_list)-2, len(sub_list)-1)))]
[2, 3, 4]

To improve the efficiency, you can calculate tuple(sub_list) and len(sub_list) before hand you start filtering

>>> l = len(sub_list)-1
>>> tup = tuple(sub_list)
>>> [sl[0] for sl in itr if not (sl == tup and next(islice(itr, l-1, l)))]
[2, 3, 4]
查看更多
干净又极端
6楼-- · 2019-02-16 04:24

More readable than any above and with no additional memory footprint:

def remove_sublist(sublist, mainlist):

    cursor = 0

    for b in mainlist:
        if cursor == len(sublist):
            cursor = 0
        if b == sublist[cursor]:
            cursor += 1
        else:
            cursor = 0
            yield b

    for i in range(0, cursor):
        yield sublist[i]

This is for onliner if you wanted a function from library, let it be this

[x for x in remove_sublist([1, 2], [2, 1, 2, 3, 1, 2, 4])]
查看更多
Deceive 欺骗
7楼-- · 2019-02-16 04:27

You'd have to implement it yourself. Here is the basic idea:

def remove_sublist(lst, sub):
    i = 0
    out = []
    while i < len(lst):
        if lst[i:i+len(sub)] == sub:
            i += len(sub)
        else:
            out.append(lst[i])
            i += 1
    return out

This steps along every element of the original list and adds it to an output list if it isn't a member of the subset. This version is not very efficient, but it works like the string example you provided, in the sense that it creates a new list not containing your subset. It also works for arbitrary element types as long as they support ==. Removing [1,1,1] from [1,1,1,1] will correctly result in [1], as for a string.

Here is an IDEOne link showing off the result of

>>> remove_sublist([1, 'a', int, 3, float, 'a', int, 5], ['a', int])
[1, 3, <class 'float'>, 5]
查看更多
登录 后发表回答