Delete the first element in each subview of a matrix

Posted 2020-07-11 05:32

I have a dataset like this:

[[0,1],
 [0,2],
 [0,3],
 [0,4],
 [1,5],
 [1,6],
 [1,7],
 [2,8],
 [2,9]]

I need to delete the first element of each subview of the data, where the subviews are defined by the first column. So first I take all rows that have 0 in the first column and delete the first one, [0,1]. Then I take the rows with 1 in the first column and delete the first one, [1,5]; next I delete [2,8], and so on. In the end, I would like to have a dataset like this:

[[0,2],
 [0,3],
 [0,4],
 [1,6],
 [1,7],
 [2,9]]

EDIT: Can this be done in numpy? My dataset is very large, so for loops over all the elements take at least 4 minutes to complete.

Tags: python numpy
5 answers
啃猪蹄的小仙女
Reply #2 · 2020-07-11 06:15
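import itertools

a = [[0,1],
 [0,2],
 [0,3],
 [0,4],
 [1,5],
 [1,6],
 [1,7],
 [2,8],
 [2,9]]

# groupby() yields (key, group) pairs for runs of rows sharing the same first element,
# so the data must already be sorted by that column; list(x[1])[1:] drops each group's first row
a = [y for x in itertools.groupby(a, lambda x: x[0]) for y in list(x[1])[1:]]

print(a)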
一夜七次
Reply #3 · 2020-07-11 06:17

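My answer is:

from operator import itemgetter

l = [[0, 1], [0, 2], [0, 3], [0, 4], [1, 5], [1, 6], [1, 7], [2, 8], [2, 9]]
l = sorted(l, key=itemgetter(0))  # first sort by the first element of each inner list
nl = []
j = 0  # key of the next group whose first row should be skipped
# Note: this assumes the keys in column 0 are consecutive integers starting at 0
for i in range(len(l)):
    if j == l[i][0]:
        j = j + 1  # first row of group j: skip it
    else:
        nl.append(l[i])  # otherwise append it to the new list

The output is: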

>>> nl
[[0, 2], [0, 3], [0, 4], [1, 6], [1, 7], [2, 9]]
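The counter `j` above only lines up with the groups when the keys are 0, 1, 2, ... with no gaps. Below is a minimal sketch of the same loop that remembers the previous key instead of counting, so any sortable keys work. The function name `skip_first_of_each_group` is hypothetical.

```python
from operator import itemgetter

def skip_first_of_each_group(rows):
    """Return the rows without the first row of each group (grouped by column 0)."""
    result = []
    prev = object()  # sentinel that never equals a real key
    # sorted() is stable, so rows within a group keep their original order
    for row in sorted(rows, key=itemgetter(0)):
        if row[0] != prev:
            prev = row[0]  # first row of a new group: skip it
        else:
            result.append(row)  # later rows of the same group are kept
    return result

l = [[0, 1], [0, 2], [0, 3], [0, 4], [1, 5], [1, 6], [1, 7], [2, 8], [2, 9]]
print(skip_first_of_each_group(l))
# [[0, 2], [0, 3], [0, 4], [1, 6], [1, 7], [2, 9]]
```

With keys such as `[[0, 1], [5, 2], [5, 3]]` this returns `[[5, 3]]`, whereas a counter that expects key 1 after key 0 would never match the 5s.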
劫难
Reply #4 · 2020-07-11 06:24

As requested, a numpy solution:

import numpy as np
a = np.array([[0,1], [0,2], [0,3], [0,4], [1,5], [1,6], [1,7], [2,8], [2,9]])
_,i = np.unique(a[:,0], return_index=True)

b = np.delete(a, i, axis=0)

(The above was edited to incorporate @Jaime's solution; here is my original masking solution, for posterity's sake:)

m = np.ones(len(a), dtype=bool)
m[i] = False
b = a[m]
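To make the masking step concrete, here is the same computation on the sample array with the intermediate values printed. `np.unique(..., return_index=True)` returns the index of the first occurrence of each key in the original array, and exactly those positions are set to `False` in the mask. Because it uses first occurrences, this also works when rows with the same key are not contiguous.

```python
import numpy as np

a = np.array([[0, 1], [0, 2], [0, 3], [0, 4], [1, 5], [1, 6], [1, 7], [2, 8], [2, 9]])

# Unique keys in column 0 and the index of each key's first occurrence
keys, i = np.unique(a[:, 0], return_index=True)
print(keys.tolist(), i.tolist())  # [0, 1, 2] [0, 4, 7]

# Start with "keep every row", then drop the first row of each group
m = np.ones(len(a), dtype=bool)
m[i] = False
print(m.tolist())
# [False, True, True, True, False, True, True, False, True]

b = a[m]
print(b.tolist())
# [[0, 2], [0, 3], [0, 4], [1, 6], [1, 7], [2, 9]]
```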

Interestingly, the mask seems to be faster:

In [225]: def rem_del(a):
   .....:     _,i = np.unique(a[:,0], return_index=True)
   .....:     return np.delete(a, i, axis = 0)
   .....: 

In [226]: def rem_mask(a):
   .....:     _,i = np.unique(a[:,0], return_index=True)
   .....:     m = np.ones(len(a), dtype=bool)
   .....:     m[i] = False
   .....:     return a[m]
   .....: 

In [227]: timeit rem_del(a)
10000 loops, best of 3: 181 us per loop

In [228]: timeit rem_mask(a)
10000 loops, best of 3: 59 us per loop
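If the rows of each group are already contiguous (as in the question's data), the first row of a group is simply a row whose key differs from the key in the previous row. The sketch below builds the mask from one vectorized comparison of adjacent keys, which avoids the sort that `np.unique` performs. It assumes contiguous groups, and `drop_first_per_group` is a hypothetical name; whether it is faster on your data is worth checking with `timeit`.

```python
import numpy as np

def drop_first_per_group(a):
    """Drop the first row of each run of equal values in column 0.

    Assumes rows with the same key are contiguous (e.g. sorted by column 0);
    a key that reappears later is treated as a new group.
    """
    keys = a[:, 0]
    keep = np.empty(len(a), dtype=bool)
    keep[:1] = False  # row 0 always starts a group
    # keep[i] is True when row i has the same key as row i-1,
    # i.e. it is not the first row of its group
    keep[1:] = keys[1:] == keys[:-1]
    return a[keep]

a = np.array([[0, 1], [0, 2], [0, 3], [0, 4], [1, 5], [1, 6], [1, 7], [2, 8], [2, 9]])
print(drop_first_per_group(a).tolist())
# [[0, 2], [0, 3], [0, 4], [1, 6], [1, 7], [2, 9]]
```

If the groups are not contiguous, the `np.unique` version above is the safer choice, since it removes the first occurrence of each key wherever it appears.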
男人必须洒脱
Reply #5 · 2020-07-11 06:27

You want to use itertools.groupby() with a dash of itertools.islice() and itertools.chain:

from itertools import islice, chain, groupby
from operator import itemgetter

inputlist = [[0, 1], [0, 2], [0, 3], [0, 4], [1, 5], [1, 6], [1, 7], [2, 8], [2, 9]]

list(chain.from_iterable(islice(group, 1, None)
                         for key, group in groupby(inputlist, key=itemgetter(0))))
  • The groupby() call groups the input list into chunks where the first item is the same (itemgetter(0) is the grouping key).
  • The islice(group, 1, None) call turns the groups into iterables where the first element will be skipped.
  • The chain.from_iterable() call takes each islice() result and chains them together into a new iterable, which list() turns back into a list.

Demo:

>>> list(chain.from_iterable(islice(group, 1, None) for key, group in groupby(inputlist, key=itemgetter(0))))
[[0, 2], [0, 3], [0, 4], [1, 6], [1, 7], [2, 9]]
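`groupby()` only groups consecutive items with equal keys, so the recipe relies on the input already being ordered by the first column. For input that is not ordered, a minimal sketch is to sort by the same key first. The sample `inputlist` here is made up to illustrate the unsorted case.

```python
from itertools import chain, groupby, islice
from operator import itemgetter

# Rows with key 0 (and key 1) are split into separate runs, so groupby()
# alone would treat each run as its own group
inputlist = [[0, 1], [1, 5], [0, 2], [1, 6], [2, 8], [2, 9]]

# sorted() is stable, so each group keeps its original row order
rows = sorted(inputlist, key=itemgetter(0))
result = list(chain.from_iterable(islice(group, 1, None)
                                  for _, group in groupby(rows, key=itemgetter(0))))
print(result)
# [[0, 2], [1, 6], [2, 9]]
```

Without the `sorted()` call, every single-row run would be dropped and only `[[2, 9]]` would remain.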
太酷不给撩
Reply #6 · 2020-07-11 06:33

Pass in your list of rows and the index of the column whose values define the groups.

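def getsubset(rows, index):
    # Build a new list instead of removing items while iterating over `rows`
    seen = set()  # keys whose first row has already been skipped
    result = []
    for row in rows:
        if row[index] not in seen:
            seen.add(row[index])  # first row with this key: skip it
        else:
            result.append(row)
    return result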