Pandas groupby in combination with sklearn preprocessing

Posted 2020-07-10 06:35

I want to group my DataFrame by a specific column, apply sklearn's preprocessing MinMaxScaler to each group, and store the fitted scaler objects.

My current starting point:

import pandas as pd
from sklearn import preprocessing

scaler = {}
groups = df.groupby('ID')

for name, group in groups:
  scr = preprocessing.MinMaxScaler()
  scr.fit(group)
  scaler.update({name: scr})
  group = scr.transform(group)

Is this possible with df.groupby('ID').transform?

UPDATE

From my original DataFrame

pd.DataFrame( dict( ID=list('AAABBB'),
                    VL=(0,10,10,100,100,200)))

I want to scale all columns based on ID. In this example:

   A 0.0
   A 1.0
   A 1.0
   B 0.0
   B 0.0
   B 1.0

together with the fitted scaler object (initialized with fit)

preprocessing.MinMaxScaler().fit( ... )

Tags: pandas scipy
1 answer
chillily · 2020-07-10 07:25

You can do it in one direction (scaling only):

In [62]: from sklearn.preprocessing import minmax_scale

In [63]: df
Out[63]:
  ID   VL
0  A    0
1  A   10
2  A   10
3  B  100
4  B  100
5  B  200

In [64]: df['SC'] = df.groupby('ID').VL.transform(lambda x: minmax_scale(x.astype(float)))

In [65]: df
Out[65]:
  ID   VL  SC
0  A    0   0
1  A   10   1
2  A   10   1
3  B  100   0
4  B  100   0
5  B  200   1

But you will not be able to use inverse_transform, because minmax_scale fits a new scaler for each group (each ID) and then discards it, so the information about your original features is not kept.
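If the inverse direction is needed, one possible workaround is to go back to an explicit loop and keep one fitted MinMaxScaler per ID. The following is only a minimal sketch under a few assumptions: it uses the example DataFrame from the question, scales the single column VL, and the names value_col, scalers and restored_b are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(dict(ID=list('AAABBB'), VL=(0, 10, 10, 100, 100, 200)))

value_col = 'VL'   # column to scale (assumption: a single numeric column)
scalers = {}       # one fitted MinMaxScaler per ID
df['SC'] = 0.0     # float placeholder for the scaled values

for name, group in df.groupby('ID'):
    # MinMaxScaler expects a 2-D array, hence the double brackets
    values = group[[value_col]].to_numpy(dtype=float)
    scr = MinMaxScaler().fit(values)
    scalers[name] = scr
    # write back by index label so that df itself is updated,
    # instead of only rebinding the loop variable
    df.loc[group.index, 'SC'] = scr.transform(values).ravel()

# SC is now 0, 1, 1 for ID A and 0, 0, 1 for ID B

# the stored scaler for a group can undo the scaling for that group
scaled_b = df.loc[df['ID'] == 'B', ['SC']].to_numpy()
restored_b = scalers['B'].inverse_transform(scaled_b).ravel()
```

Here restored_b should hold the original VL values of group B again. The trade-off is that the loop is more verbose than groupby(...).transform, but each group's minimum and range stay available in the scalers dict.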
