Python equivalent of R “split”-function

In R, you could split a vector according to the factors of another vector:

> (a <- 1:10)
  [1]  1  2  3  4  5  6  7  8  9 10
> (b <- rep(1:2,5))
  [1] 1 2 1 2 1 2 1 2 1 2

> split(a,b)
  $`1`
  [1] 1 3 5 7 9
  $`2`
  [1]  2  4  6  8 10

That is, I want to group a list (in Python terms) according to the values of another list, in the order of the factor levels.

Is there anything handy like that in Python, apart from the itertools.groupby approach?

Tags: python r grouping

4 answers

ら.Afraid

Post #2 · 2019-05-14 17:56

From your example, it looks like each element of b gives the (1-indexed) number of the list in which the corresponding element of a will be stored. Python has no direct counterpart to the numbered names ($`1`, $`2`) in R's result, so we'll return a tuple of lists instead. If you can use zero-indexed labels and only need two lists (i.e., where your R use case has the values 1 and 2, in Python they'll be 0 and 1), the call should behave like this:

>>> a = range(1, 11)
>>> b = [0,1] * 5

>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])

You can get that with itertools.compress:

import itertools

def split(x, f):
    # first list: items labelled 0 (falsy), second list: items labelled 1 (truthy)
    return list(itertools.compress(x, (not i for i in f))), list(itertools.compress(x, f))
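A small self-contained check of the selector logic, assuming the labels are exactly 0 and 1. `itertools.compress(data, selectors)` keeps the items whose selector is truthy, so `compress(a, b)` on its own yields the label-1 group, and the label-0 group needs the negated selectors. This sketch also materialises `f` first (an addition made here, not part of the answer) so that a one-shot iterator can be passed as `f`. `x` must still be re-iterable because it is traversed twice.

```python
import itertools

def split(x, f):
    """Return (items of x labelled 0, items of x labelled 1).

    Assumes every label in f is 0 or 1; x must be re-iterable.
    """
    f = list(f)  # added here: lets f be a one-shot iterator
    zeros = list(itertools.compress(x, (not i for i in f)))
    ones = list(itertools.compress(x, f))
    return zeros, ones

a = range(1, 11)
b = [0, 1] * 5
print(list(itertools.compress(a, b)))  # [2, 4, 6, 8, 10]: truthy selectors only
print(split(a, b))  # ([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
```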

If you need more general input (more than two distinct labels), something like the following returns an n-tuple with one list per label from 0 to max(f):

import itertools

def split(x, f):
    count = max(f) + 1
    # one list per label 0..count-1, holding the values of x whose label matches
    return tuple(list(itertools.compress(x, (el == i for el in f))) for i in range(count))

>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
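The compress-based version scans f once per label, which costs O(n·k) for n items and k labels. Here is a single-pass sketch that gives the same output. It makes the same assumption that labels are non-negative integers, and `split_n` is a hypothetical name. As before, labels that never occur get an empty list.

```python
def split_n(x, f):
    """Split x into a tuple of lists, one per integer label 0..max(f).

    Assumes the labels in f are non-negative ints; labels that never
    occur get an empty list, as in the compress-based version.
    """
    f = list(f)
    groups = tuple([] for _ in range(max(f) + 1))
    for value, label in zip(x, f):
        groups[label].append(value)  # one pass over the data
    return groups

print(split_n([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [0, 1, 1, 0, 2, 3, 4, 0, 1, 2]))
# ([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
```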


Evening l夕情丶

Post #3 · 2019-05-14 17:57

Edit: warning, this is a groupby solution, which is not what the OP asked for, but it may be of use to someone looking for a less specific way to split a list the R way in Python.

Here's one way with itertools.

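import itertools
# make your sample data
a = range(1, 11)
b = list(itertools.islice(itertools.cycle((1, 2)), len(a)))

# sort on the label only (stable), then collect the values of each run of equal labels
{k: tuple(v for _, v in g) for k, g in itertools.groupby(sorted(zip(b, a), key=lambda x: x[0]), lambda x: x[0])}
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}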

This gives you a dictionary, which is analogous to the named list that you get from R's split.
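Two details matter for the groupby approach. First, `itertools.groupby` only merges consecutive items with equal keys, which is why the pairs have to be sorted first. Second, sorting on the label alone (a stable sort) keeps each group's values in their original order, as R's `split` does. Sorting the whole `(label, value)` pairs would also sort the values within each group. The sketch below wraps this up as a function; `groupby_split` is a hypothetical name.

```python
import itertools
from operator import itemgetter

def groupby_split(x, f):
    """Group x by labels f via sort + groupby, keeping original order within groups."""
    pairs = sorted(zip(f, x), key=itemgetter(0))  # stable sort on the label only
    return {k: [v for _, v in g] for k, g in itertools.groupby(pairs, key=itemgetter(0))}

a = list(range(1, 11))
b = [1, 2] * 5

# Without sorting, groupby only merges *consecutive* equal keys, so the
# alternating labels in b produce ten one-element runs instead of two groups.
print(len(list(itertools.groupby(zip(b, a), key=itemgetter(0)))))  # 10

print(groupby_split(a, b))  # {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]}
print(groupby_split([30, 10, 20], ["b", "a", "b"]))  # {'a': [10], 'b': [30, 20]}
```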


贪生不怕死

Post #4 · 2019-05-14 18:02

As a long-time R user, I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:

from collections import defaultdict

a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]

def split(x, f):
    # collect each value of x under its label in f
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res

>>> split(a, b)
defaultdict(<class 'list'>, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})
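Two properties of the `defaultdict` result are worth knowing. The sketch below demonstrates them using the same `split` as above.

* Keys appear in first-appearance order (dicts preserve insertion order from Python 3.7), not in R's sorted level order.
* Merely looking up an absent label inserts an empty list.

Converting to a plain dict with sorted keys avoids both. The sorted order is an assumption made here to mirror R.

```python
from collections import defaultdict

def split(x, f):
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res

groups = split([1, 2, 3, 4], [2, 1, 2, 1])
print(list(groups))   # [2, 1]: first-appearance order, not sorted
print(groups[3])      # []: reading an absent label inserts it
print(dict(groups))   # {2: [1, 3], 1: [2, 4], 3: []}

# Plain dict with keys sorted to mirror R's level order; absent labels
# now raise KeyError instead of being silently added.
plain = dict(sorted(split([1, 2, 3, 4], [2, 1, 2, 1]).items()))
print(plain)          # {1: [2, 4], 2: [1, 3]}
```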


Animai°情兽

Post #5 · 2019-05-14 18:23

You could try:

a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]

split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]

results in:

In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]

In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]

To generalise this, you can simply iterate over the unique elements of b:

splits = {}
for index in set(b):
    splits[index] = [a[k] for k in (i for i, j in enumerate(b) if j == index)]
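The loop above rescans `b` once per label, and iterating over a `set` gives no guaranteed order. Below is a minimal sketch of the same idea as a function. It assumes you want the groups ordered by sorted label (as R orders factor levels) and that the labels are mutually comparable; `split_by` is a hypothetical name.

```python
def split_by(a, b):
    """Return {label: [items of a with that label]} ordered by sorted label."""
    if len(a) != len(b):
        # Unlike R, which recycles a shorter f, this sketch rejects mismatches.
        raise ValueError("a and b must have the same length")
    splits = {label: [] for label in sorted(set(b))}
    for value, label in zip(a, b):
        splits[label].append(value)  # single pass over the data
    return splits

print(split_by([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]))
# {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]}
print(split_by(["x", "y", "z"], ["lo", "hi", "lo"]))
# {'hi': ['y'], 'lo': ['x', 'z']}
```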
