Selecting random values from a dataframe

Posted 2019-06-02 19:36

Question:

I have a pandas dataframe df that looks like the following:

Month   Day mnthShape
1      1    1.016754224
1      1    1.099451003
1      1    0.963911929
1      2    1.016754224
1      1    1.099451003
1      2    0.963911929
1      3    1.016754224
1      3    1.099451003
1      3    1.783775568

I want to get the following from df:

Month   Day mnthShape
1       1   1.016754224
1       2   1.016754224
1       3   1.099451003

where each mnthShape value is picked at random from the rows that share that (Month, Day) pair. That is, if the query is df.loc[(1, 1)], it should look at all mnthShape values for (1, 1) and randomly select one of them to display, as above.
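To make the lookup described above concrete, here is a minimal sketch that rebuilds the sample data, indexes it on (Month, Day), and draws one value for (1, 1). The names `indexed`, `candidates` and `picked` are illustrative only, and the fixed `random_state=0` is an assumption added purely for repeatability.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929, 1.016754224,
                  1.099451003, 0.963911929, 1.016754224, 1.099451003,
                  1.783775568],
})

# Index on (Month, Day); sorting the MultiIndex keeps .loc lookups efficient
indexed = df.set_index(["Month", "Day"]).sort_index()

# Every mnthShape value stored under (1, 1)
candidates = indexed.loc[(1, 1), "mnthShape"]

# Pick one of them at random (fixed seed only so the run is repeatable)
picked = candidates.sample(n=1, random_state=0).iloc[0]
print(picked)
```

Doing this once per (Month, Day) pair is exactly what the groupby-based answers automate.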

Answer 1:

Use groupby with apply to select a value at random from each group.

import numpy as np

np.random.seed(0)
df.groupby(['Month', 'Day'])['mnthShape'].apply(np.random.choice).reset_index()

   Month  Day  mnthShape
0      1    1   1.016754
1      1    2   0.963912
2      1    3   1.099451
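A variant of the same idea, sketched under the assumption that you prefer a local NumPy `Generator` (`np.random.default_rng`) over seeding the global RNG. It swaps `apply` for `agg`, since the function returns one scalar per group; `rng` and `res` are illustrative names, and the drawn values will generally differ from the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929, 1.016754224,
                  1.099451003, 0.963911929, 1.016754224, 1.099451003,
                  1.783775568],
})

# Local generator: reproducible without touching np.random's global state
rng = np.random.default_rng(0)

# agg calls the lambda once per (Month, Day) group and expects a scalar back
res = (df.groupby(["Month", "Day"])["mnthShape"]
         .agg(lambda s: rng.choice(s.to_numpy()))
         .reset_index())
print(res)
```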

If you also want to know which index each sampled row came from, use pd.Series.sample with n=1:

np.random.seed(0)
(df.groupby(['Month', 'Day'])['mnthShape']
   .apply(pd.Series.sample, n=1)
   .reset_index(level=[0, 1]))

   Month  Day  mnthShape
2      1    1   0.963912
3      1    2   1.016754
6      1    3   1.016754


Answer 2:

One way is to use Series.sample() to draw a random row from each group:

import numpy as np

# Seed NumPy's global RNG, which Series.sample uses when random_state is not given
np.random.seed(1)

res = df.groupby(['Month', 'Day'])['mnthShape'].apply(lambda x: x.sample()).reset_index(level=[0, 1])

res
   Month  Day  mnthShape
0      1    1   1.099451
1      1    2   1.016754
2      1    3   1.016754
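A different technique that avoids groupby altogether (not part of either answer): shuffle every row once, then keep the first row seen for each (Month, Day) pair. Because the shuffle is a uniform random permutation, that first row is a uniformly random pick from its group. A minimal sketch, with `random_state=1` chosen arbitrarily:

```python
import pandas as pd

df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.016754224, 1.099451003, 0.963911929, 1.016754224,
                  1.099451003, 0.963911929, 1.016754224, 1.099451003,
                  1.783775568],
})

res = (df.sample(frac=1, random_state=1)             # shuffle all rows
         .drop_duplicates(subset=["Month", "Day"])   # keep the first row per pair
         .sort_values(["Month", "Day"]))             # restore a readable order
print(res)
```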