I have a dataset like this:
[[0,1],
[0,2],
[0,3],
[0,4],
[1,5],
[1,6],
[1,7],
[2,8],
[2,9]]
I need to delete the first row of each group of rows, where a group is defined by the value in the first column. So first I take all rows that have 0 in the first column and delete the first of them: [0,1]. Then I take the rows with 1 in the first column and delete the first of them, [1,5]; next I delete [2,8], and so on. In the end, I would like to have a dataset like this:
[[0,2],
[0,3],
[0,4],
[1,6],
[1,7],
[2,9]]
EDIT: Can this be done in numpy? My dataset is very large so for loops on all elements take at least 4 minutes to complete.
My answer is:
output is:
As requested, a numpy solution:
(Above is edited to incorporate @Jaime's solution; here is my original masking solution for posterity's sake.)
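A minimal sketch of a masking approach in numpy, which is not necessarily the code this answer originally showed. It assumes the rows are already sorted by the first column, so each group is contiguous; the names data, is_first and result are illustrative only.

```python
import numpy as np

# Data from the question; assumed to be sorted by the first column.
data = np.array([[0, 1], [0, 2], [0, 3], [0, 4],
                 [1, 5], [1, 6], [1, 7],
                 [2, 8], [2, 9]])

# A row starts a new group when its first column differs from the
# previous row's first column. The very first row always starts a group.
is_first = np.ones(len(data), dtype=bool)
is_first[1:] = data[1:, 0] != data[:-1, 0]

# Keep every row that is not the first row of its group.
result = data[~is_first]
print(result)
```

The comparison and the boolean indexing are both vectorized, so no Python-level loop over the rows is needed.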
Interestingly, the mask seems to be faster:
You want to use itertools.groupby() with a dash of itertools.islice() and itertools.chain:
The groupby() call groups the input list into chunks where the first item is the same (itemgetter(0) is the grouping key).
The islice(group, 1, None) call turns the groups into iterables where the first element will be skipped.
The chain.from_iterable() call takes each islice() result and chains them together into a new iterable, which list() turns back into a list.
Demo:
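A minimal sketch of that groupby() / islice() / chain.from_iterable() combination, run on the data from the question. The variable name inputlist is an assumption, and because groupby() only groups adjacent items, the input is assumed to be sorted (or at least contiguous) by the first column.

```python
from itertools import chain, groupby, islice
from operator import itemgetter

# Data from the question; rows with the same first item are adjacent.
inputlist = [[0, 1], [0, 2], [0, 3], [0, 4],
             [1, 5], [1, 6], [1, 7],
             [2, 8], [2, 9]]

result = list(chain.from_iterable(
    # Skip the first row of each group and keep the rest.
    islice(group, 1, None)
    # Group consecutive rows by their first item.
    for key, group in groupby(inputlist, key=itemgetter(0))
))
print(result)
```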
Pass in your lists and the key that you want to check values on.
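One way such a function could look, as a sketch only: the name drop_first_per_key and the parameter key_index are hypothetical, not taken from the original answer. It remembers which key values it has already seen and drops the first row for each one, so it does not require the rows to be sorted.

```python
def drop_first_per_key(rows, key_index=0):
    """Return rows with the first occurrence of each key value removed.

    rows: a list of lists; key_index: the position of the value to group on.
    Both names are hypothetical, chosen for illustration.
    """
    seen = set()
    kept = []
    for row in rows:
        key = row[key_index]
        if key in seen:
            # Not the first row for this key, so keep it.
            kept.append(row)
        else:
            # First row for this key: record the key and drop the row.
            seen.add(key)
    return kept


rows = [[0, 1], [0, 2], [0, 3], [0, 4],
        [1, 5], [1, 6], [1, 7],
        [2, 8], [2, 9]]
filtered = drop_first_per_key(rows, key_index=0)
print(filtered)
```

This is still a Python-level loop over every row, so for a very large dataset a vectorized numpy approach is likely to be faster.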