Using groupby and loc to set up a new dataframe

Posted 2019-07-23 11:22

Question:

Hi, I have a data frame as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['Team1']   = ['A','B','C','D','E','F','A','B','C','D','E','F']
df['Score1']  = [1,2,3,1,2,4,1,2,3,1,2,4]
df['Team2']   = ['U','V','W','X','Y','Z','U','V','W','X','Y','Z']
df['Score2']  = [2,1,2,2,3,3,2,1,2,2,3,3]
df['Match']   = df['Team1']  + ' Vs '+ df['Team2']
df['Match_no']= [1,2,3,4,5,6,1,2,3,4,5,6]
df['model']  = ['ELO','ELO','ELO','ELO','ELO','ELO','xG','xG','xG','xG','xG','xG']
winner = df.Score1>df.Score2
df['winner']  = np.where(winner,df['Team1'],df['Team2'])

What I want to do is create another data frame for the next stage of the tournament. In the next stage there will be 3 matches for each model (ELO and xG), so I would like to group by model. Within each model, the winner of match number 1 plays the winner of match number 2, the winner of match number 3 plays the winner of match number 4, and so on (i.e. U vs B, C vs X, Y vs F). Can anyone advise me how to extract those teams?

My expected new dataframe is as follows:

df1 =pd.DataFrame()
df1['Team1']   = ['U','C','Y','U','C','Y']
df1['Team2']   = ['B','X','F','B','X','F']

df1['Match']   = df1['Team1']  + ' Vs '+ df1['Team2']
df1['Match_no']= [1,2,3,1,2,3]
df1['model']  = ['ELO','ELO','ELO','xG','xG','xG']

How can I set this up? Thanks,

Zep

Answer 1:

I will try to give you an answer, although I have a hard time understanding what you mean by "winner from odd match number and even match number will play".

If this means that the winners of matches 1 and 2 are paired, then 3 and 4, and so on, you can do something as simple as

df1 = pd.DataFrame()
# .values drops the index so both columns line up row by row
df1['Team1'] = df.loc[::2, 'winner'].values
df1['Team2'] = df.loc[1::2, 'winner'].values

provided that your data is sorted as shown in the question. You can get that order for one model with

df[df['model'] == 'ELO'].sort_values('Match_no')

and likewise for the other model. groupby does not seem to be needed, if I understood you correctly.
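
As a minimal sketch of the sorting step, assuming the rows may arrive in arbitrary order: sort by model and Match_no first, then take every other winner by position. The names shuffled, ordered, team1 and team2 are illustrative and not from the question; the frame is rebuilt from the question's data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (same data as in the question).
df = pd.DataFrame({
    'Team1': list('ABCDEF') * 2,
    'Score1': [1, 2, 3, 1, 2, 4] * 2,
    'Team2': list('UVWXYZ') * 2,
    'Score2': [2, 1, 2, 2, 3, 3] * 2,
    'Match_no': [1, 2, 3, 4, 5, 6] * 2,
    'model': ['ELO'] * 6 + ['xG'] * 6,
})
df['winner'] = np.where(df['Score1'] > df['Score2'], df['Team1'], df['Team2'])

# Shuffle to simulate unsorted input, then restore the order the slicing relies on.
shuffled = df.sample(frac=1, random_state=0)
ordered = shuffled.sort_values(['model', 'Match_no']).reset_index(drop=True)

# Positional slicing: even positions hold odd match numbers, odd positions even ones.
# to_numpy() drops the index so the two halves can be paired row by row.
team1 = ordered['winner'].iloc[::2].to_numpy()
team2 = ordered['winner'].iloc[1::2].to_numpy()
print(list(zip(team1, team2)))
```

Using iloc rather than loc makes the slicing purely positional, so it does not depend on what the index labels happen to be after sorting.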



Answer 2:

You can use GroupBy.cumcount to number the new matches within each model:

df1 = pd.DataFrame()
df1['Team1'] = df.loc[::2, 'winner'].values
df1['Team2'] = df.loc[1::2, 'winner'].values
df1['Match'] = df1['Team1']  + ' Vs '+ df1['Team2']
model = df.loc[::2, 'model'].values
df1['Match_no'] = df1.groupby(model).cumcount() + 1
df1['model'] = model
print(df1)
  Team1 Team2   Match  Match_no model
0     U     B  U Vs B         1   ELO
1     C     X  C Vs X         2   ELO
2     Y     F  Y Vs F         3   ELO
3     U     B  U Vs B         1    xG
4     C     X  C Vs X         2    xG
5     Y     F  Y Vs F         3    xG
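
If the rows are not guaranteed to alternate odd and even match numbers, a groupby on the model plus a derived next-round match number may be more robust. The following is only a sketch under assumptions: next_no is a hypothetical helper column (not in the question), and every next-round match is assumed to have exactly two feeder matches.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (same data as in the question).
df = pd.DataFrame({
    'Team1': list('ABCDEF') * 2,
    'Score1': [1, 2, 3, 1, 2, 4] * 2,
    'Team2': list('UVWXYZ') * 2,
    'Score2': [2, 1, 2, 2, 3, 3] * 2,
    'Match_no': [1, 2, 3, 4, 5, 6] * 2,
    'model': ['ELO'] * 6 + ['xG'] * 6,
})
df['winner'] = np.where(df['Score1'] > df['Score2'], df['Team1'], df['Team2'])

# Sort so that, inside each group, the odd-numbered match comes first.
ordered = df.sort_values(['model', 'Match_no'])
# Hypothetical helper: matches 1,2 -> 1; 3,4 -> 2; 5,6 -> 3.
ordered['next_no'] = (ordered['Match_no'] + 1) // 2

# One row per (model, next-round match): first winner is Team1, last is Team2.
df1 = (
    ordered.groupby(['model', 'next_no'])['winner']
           .agg(Team1='first', Team2='last')
           .reset_index()
           .rename(columns={'next_no': 'Match_no'})
)
df1['Match'] = df1['Team1'] + ' Vs ' + df1['Team2']
df1 = df1[['Team1', 'Team2', 'Match', 'Match_no', 'model']]
print(df1)
```

On the question's data this should give the same six rows as the expected frame. The pairing is driven by Match_no rather than by row position, which is the main difference from the slicing approach.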