How to convert rows values in dataframe to columns

Posted 2020-08-01 06:13

Question:

I have a specific case where I want to convert this DataFrame (output of print df):

   Schoolname  Attribute    Value
0  xyz School  Safe         3.44
1  xyz School  Cleanliness  2.34
2  xyz School  Money        4.65
3  abc School  Safe         4.40
4  abc School  Cleanliness  4.50
5  abc School  Money        4.90
6  lmn School  Safe         2.34
7  lmn School  Cleanliness  3.89
8  lmn School  Money        4.65

I need to get it into the following format so that I can convert it to a NumPy array for linear regression modelling:

required_df:
   Schoolname  Safe  Cleanliness  Money
0  xyz School  3.44  2.34         4.65
1  abc School  4.40  4.50         4.90
2  lmn School  2.34  3.89         4.65

I know I need to use groupby('Schoolname'), but I can't work out what to do after that so that the Attribute values become column labels and the corresponding Value entries fill the cells, as in required_df.

I need it in this format so that I can convert it to a NumPy array and pass it to a linear regression model as my X input.

Answer 1:

You could use df.pivot:

In [171]: df.pivot(index='Schoolname', columns='Attribute', values='Value')
Out[171]:
Attribute   Cleanliness  Money  Safe
Schoolname
abc School         4.50   4.90  4.40
lmn School         3.89   4.65  2.34
xyz School         2.34   4.65  3.44
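
Note that pivot sorts both the row labels and the column labels, so the result above is not yet in the layout of required_df. Here is a minimal sketch of getting from the pivoted frame to that layout and on to a NumPy array. It assumes the question's data is rebuilt by hand as shown; the names wide, schools and X are only illustrative.

```python
import pandas as pd

# Rebuild the long-format frame from the question.
df = pd.DataFrame({
    "Schoolname": ["xyz School"] * 3 + ["abc School"] * 3 + ["lmn School"] * 3,
    "Attribute": ["Safe", "Cleanliness", "Money"] * 3,
    "Value": [3.44, 2.34, 4.65, 4.40, 4.50, 4.90, 2.34, 3.89, 4.65],
})

# One row per school, one column per attribute.
wide = df.pivot(index="Schoolname", columns="Attribute", values="Value")

# pivot sorts rows and columns; restore the order used in required_df.
schools = df["Schoolname"].unique()  # schools in order of first appearance
wide = wide.loc[schools, ["Safe", "Cleanliness", "Money"]]

# Optional: move Schoolname back into a regular column and drop the
# leftover "Attribute" label on the column axis.
required_df = wide.reset_index()
required_df.columns.name = None

# Feature matrix for the regression: numeric columns only, shape (3, 3).
X = wide.to_numpy()
```

Keeping Schoolname in the index until after to_numpy() keeps the string column out of X, so the array stays numeric.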

Alternatively, the more expressive pd.pivot_table gives the same result:

In [172]: pd.pivot_table(df, values='Value', index='Schoolname', columns='Attribute')
Out[172]:
Attribute   Cleanliness  Money  Safe
Schoolname
abc School         4.50   4.90  4.40
lmn School         3.89   4.65  2.34
xyz School         2.34   4.65  3.44
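
The two calls agree on the question's data, but they differ when a Schoolname/Attribute pair occurs more than once: pivot refuses to reshape, while pivot_table aggregates the duplicates (by mean unless aggfunc says otherwise). The sketch below uses made-up data that is not from the question, in which "xyz School" has two "Safe" ratings.

```python
import pandas as pd

# Hypothetical data: "xyz School" has two "Safe" ratings.
dup = pd.DataFrame({
    "Schoolname": ["xyz School", "xyz School", "xyz School"],
    "Attribute": ["Safe", "Safe", "Money"],
    "Value": [3.0, 4.0, 4.65],
})

# pivot cannot decide which duplicate to keep and raises ValueError.
try:
    dup.pivot(index="Schoolname", columns="Attribute", values="Value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates duplicates; the default aggfunc is the mean.
averaged = pd.pivot_table(dup, values="Value", index="Schoolname", columns="Attribute")
safe_mean = averaged.loc["xyz School", "Safe"]  # mean of 3.0 and 4.0

# A different aggregation can be requested explicitly.
highest = pd.pivot_table(dup, values="Value", index="Schoolname",
                         columns="Attribute", aggfunc="max")
safe_max = highest.loc["xyz School", "Safe"]  # larger of 3.0 and 4.0
```

If every school has exactly one value per attribute, as in the question, pivot is enough and has the advantage of failing loudly on unexpected duplicates.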