Random selection in pandas dataframe

Posted 2019-07-14 21:22

Question:

I'm trying to solve this more complicated question. Here's a smaller problem:

Given df

a    b
1    2
5    0
5    9
3    6
1    8

How can I create a column C that is a random selection between the two elements of df['a'] and df['b'] of the same row?

So, given this dummy df, the random operator would choose from the pair (1, 2) for row #1, from (5, 0) for row #2, and so on.

Thanks

Answer 1:

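import random

import numpy as np
import pandas as pd

n = 2  # target row position (0-based)
# In Python 3, random.sample requires a sequence, so convert the row to a list first.
random.sample(df.iloc[n, :2].tolist(), 1)  # Pick one number from this row (returns a one-element list).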

For the whole dataframe:

>>> df.loc[:, ['a', 'b']].apply(lambda row: random.sample(row.tolist(), 1), axis=1)
0    [1]
1    [5]
2    [9]
3    [3]
4    [8]
dtype: object

Cleaning it up to extract the single values:

>>> pd.Series([i[0] for i in df.loc[:, ['a', 'b']].apply(lambda row: random.sample(row.tolist(), 1), axis=1)], index=df.index)
0    1
1    5
2    9
3    3
4    8
dtype: int64

Or, taking advantage of the fact that column 'a' sits at position 0 (False) and column 'b' at position 1 (True), so a random boolean per row can serve as the column index:

>>> [df.iat[i, j] for i, j in enumerate(1 * (np.random.rand(len(df)) < .5))]
[1, 5, 5, 6, 8]
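
The same coin-flip idea can also be written without a Python-level loop. The following is only a sketch under a few assumptions: the frame is rebuilt from the question's sample data, the new column is named C as the question asks, and the seed passed to np.random.default_rng is an arbitrary choice made for repeatability.

```python
import numpy as np
import pandas as pd

# The question's example frame.
df = pd.DataFrame({"a": [1, 5, 5, 3, 1], "b": [2, 0, 9, 6, 8]})

# Seeded generator so the draw is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(0)

# One fair coin flip per row: True keeps 'a', False keeps 'b'.
pick_a = rng.random(len(df)) < 0.5

# np.where selects element-wise from the two columns in a single pass.
df["C"] = np.where(pick_a, df["a"], df["b"])
print(df)
```

Because the choice is made for all rows at once, this should scale better than a row-wise apply on large frames, at the cost of being tied to exactly two columns.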


Answer 2:

No need to use a separate random module:

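from io import StringIO

import pandas as pd

s = """a    b
1    2
5    0
5    9
3    6
1    8
"""

# Raw string avoids the invalid-escape warning for \s in newer Python versions.
df = pd.read_table(StringIO(s), sep=r'\s+', engine='python')
# For each row, sample one of its values and unwrap it to a scalar.
df.apply(lambda x: x.sample(n=1).iloc[0], axis=1)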
#output:
0    1
1    5
2    9
3    6
4    1
dtype: int64
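
If the result needs to be repeatable and stored as the column C from the question, one option is to share a single seeded random state across the rows. This is a sketch rather than a definitive version: pick_per_row is a hypothetical helper name, and the seed is arbitrary. Passing a plain integer as random_state inside the lambda would reuse the same seed for every row, which is why a shared np.random.RandomState object is used here instead.

```python
import numpy as np
import pandas as pd

# The question's example frame.
df = pd.DataFrame({"a": [1, 5, 5, 3, 1], "b": [2, 0, 9, 6, 8]})


def pick_per_row(frame, seed):
    """Hypothetical helper: draw one value per row from columns 'a' and 'b'."""
    # One shared RandomState; its state advances with every sample() call,
    # so each row gets its own draw while the whole run stays repeatable.
    rng = np.random.RandomState(seed)
    return frame[["a", "b"]].apply(
        lambda row: row.sample(n=1, random_state=rng).iloc[0], axis=1
    )


df["C"] = pick_per_row(df, seed=42)
print(df)
```

Calling pick_per_row again with the same seed reproduces the same column, which is handy when the selection feeds a test or a later analysis step.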