Python equivalent of R's "split" function

Posted 2019-05-14 17:43

In R, you could split a vector according to the factors of another vector:

> a <- 1:10
  [1]  1  2  3  4  5  6  7  8  9 10
> b <- rep(1:2,5)
  [1] 1 2 1 2 1 2 1 2 1 2

> split(a,b)

   $`1`
   [1] 1 3 5 7 9
   $`2`
   [1]  2  4  6  8 10

In Python terms, this groups the elements of one list according to the corresponding values of another list, with the groups ordered by factor level.

Is there anything handy like that in Python, apart from the itertools.groupby approach?

4 answers
ら.Afraid
#2 · 2019-05-14 17:56

From your example, it looks like each element of b gives the 1-based index of the list in which the corresponding element of a will be stored. Python lacks the automatically named list elements that R has, so we'll return a tuple of lists. Suppose you can use zero-indexed groups and only need two lists (i.e., where your R use case has the values 1 and 2, Python will have 0 and 1), and you want a call like this:

>>> a = range(1, 11)
>>> b = [0,1] * 5

>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])

Then you can use itertools.compress:

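import itertools
def split(x, f):
    # x and f are each iterated twice (truthy flags, then falsy), so pass sequences, not one-shot iterators
    return list(itertools.compress(x, f)), list(itertools.compress(x, (not i for i in f)))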

If you need more general input (multiple numbers), something like the following will return an n-tuple:

def split(x, f):
    # Groups are numbered 0..max(f); range replaces Python 2's xrange
    count = max(f) + 1
    return tuple(list(itertools.compress(x, (el == i for el in f))) for i in range(count))

>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
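
If the factor values are not consecutive integers starting at 0 (as with the 1 and 2 in the question), the same itertools.compress idea can be keyed on the distinct values instead. The following is only a minimal sketch, assuming the values are hashable and sortable; the name split_by_levels is made up for illustration and is not part of itertools:

```python
import itertools

def split_by_levels(x, f):
    # Hypothetical helper: one itertools.compress pass per distinct factor value.
    # x must be re-iterable (list, tuple, range); f is materialised for the same reason.
    f = list(f)
    levels = sorted(set(f))  # distinct factor values, in sorted order
    return {
        level: list(itertools.compress(x, (el == level for el in f)))
        for level in levels
    }

result = split_by_levels(range(1, 11), [1, 2] * 5)
print(result)
# {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]}
```

Like the tuple version above, this scans the input once per group, so it is best suited to a small number of distinct values.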
Evening l夕情丶
#3 · 2019-05-14 17:57

Edit: warning, this is a groupby solution, which is not what the OP asked for, but it may be of use to someone looking for a less specific way to split the R way in Python.


Here's one way with itertools.

import itertools
# make your sample data
a = range(1,11)
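b = list(itertools.islice(itertools.cycle((1, 2)), len(a)))

# zip() returns an iterator in Python 3, so unpack each group explicitly instead of indexing it
{k: tuple(v for _, v in g) for k, g in itertools.groupby(sorted(zip(b, a)), lambda x: x[0])}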
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}

This gives you a dictionary, which is analogous to the named list that you get from R's split.
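
One caveat: sorting the (b, a) pairs also sorts the values inside each group, whereas R's split keeps them in their original order. A possible variant sorts on the key alone and relies on Python's sort being stable. This is a sketch, not a drop-in replacement; split_groupby is an illustrative name:

```python
import itertools
from operator import itemgetter

def split_groupby(x, f):
    # Hypothetical helper. Sort on the factor only; the stable sort keeps
    # the original relative order of the values within each group.
    pairs = sorted(zip(f, x), key=itemgetter(0))
    return {
        k: [v for _, v in g]
        for k, g in itertools.groupby(pairs, key=itemgetter(0))
    }

print(split_groupby([30, 10, 20, 40], [1, 1, 2, 1]))
# {1: [30, 10, 40], 2: [20]}
```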

贪生不怕死
#4 · 2019-05-14 18:02

As a long-time R user, I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:

a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]

from collections import defaultdict
def split(x, f):
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res

>>> split(a, b)
defaultdict(<class 'list'>, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})
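
The returned defaultdict lists groups in order of first appearance and creates an empty list for any missing key that is looked up. If a plain dict ordered by factor value is preferred (closer to R's ordering by factor level), one possible wrapper is sketched below. It assumes the keys are sortable and Python 3.7+, where dicts preserve insertion order; split_sorted is a hypothetical name:

```python
from collections import defaultdict

def split_sorted(x, f):
    # Same single pass as split() above, collecting values per key.
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    # Rebuild as a plain dict with keys in sorted order.
    return {k: res[k] for k in sorted(res)}

groups = split_sorted([1, 2, 3, 4, 5, 6], ["b", "a", "b", "a", "c", "a"])
print(groups)
# {'a': [2, 4, 6], 'b': [1, 3], 'c': [5]}
```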
Animai°情兽
#5 · 2019-05-14 18:23

You could try:

a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]

split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]

This results in:

In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]

In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]

To generalise this, you can simply iterate over the unique elements in b:

splits = {}
for index in set(b):
    splits[index] = [a[k] for k in (i for i, j in enumerate(b) if j == index)]