Element-wise division by rows between dataframe an

Posted 2019-08-01 02:38

Question:

I started using pandas a few weeks ago and am now trying to perform an element-wise division by rows, but I couldn't figure out the proper way to do it. Here is my case and data:

          date  type    id     ...            1096        1097        1098
0   2014-06-13   cal     1     ...       17.949524   16.247619   15.465079
1   2014-06-13   cow    32     ...        0.523429   -0.854286   -1.520952
2   2014-06-13   cow    47     ...        7.676000    6.521714    5.892381
3   2014-06-13   cow   107     ...        4.161714    3.048571    2.419048
4   2014-06-13   cow   137     ...        3.781143    2.557143    1.931429
5   2014-06-13   cow   255     ...        3.847273    2.509091    1.804329
6   2014-06-13   cow   609     ...        6.097714    4.837714    4.249524
7   2014-06-13   cow   721     ...        3.653143    2.358286    1.633333
8   2014-06-13   cow   817     ...        6.044571    4.934286    4.373333
9   2014-06-13   cow   837     ...        9.649714    8.511429    7.884762
10  2014-06-13   cow   980     ...        1.817143    0.536571   -0.102857
11  2014-06-13   cow  1730     ...        8.512571    7.114286    6.319048
12  2014-06-13  dark     1     ...      168.725714  167.885715  167.600001

my_data.columns
Index(['date', 'type', 'id', '188', '189', '190', '191', '192', '193', '194',
       ...
       '1089', '1090', '1091', '1092', '1093', '1094', '1095', '1096', '1097',
       '1098'],
      dtype='object', length=914)

My goal is to divide every row by the row where "type" == "cal", but only across the columns from '188' to '1098' (911 columns).

These are the approaches I have tried:

Extracting the row of interest and using it with apply(), divide() and the '/' operator:

>>> cal_r = my_data[my_data["type"]=="cal"].iloc[:,3:]
>>> my_data.apply(lambda x: x.iloc[3:]/cal_r, axis=1)
0       188 189 190 191 192 193 194 195 ...  1091 10...
1          188      189      190    ...           10...
2           188      189      190    ...         109...
3           188      189      190   ...         1096...
4          188      189   190      191   ...        ...
5            188      189      190    ...         10...
6           188      189      190    ...         109...
7          188      189      190    ...         1096...
8          188      189      190    ...         1096...
9          188      189  190    ...         1096    ...
10          188      189      190     ...          1...
11          188      189      190    ...         109...
12         188      189      190      191   ...     ...
dtype: object

>>> my_data.apply(lambda x: x.iloc[3:].divide(cal_r,axis=1), axis=1)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 6014, in apply
    return op.get_result()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 142, in get_result
    return self.apply_standard()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 248, in apply_standard
    self.apply_series_generator()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 277, in apply_series_generator
    results[i] = self.f(v)
  File "<input>", line 1, in <lambda>
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/ops.py", line 1375, in flex_wrapper
    self._get_axis_number(axis)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 375, in _get_axis_number
    .format(axis, type(self)))
ValueError: ("No axis named 1 for object type <class 'pandas.core.series.Series'>", 'occurred at index 0')

Without using apply:

>>> my_data.iloc[:,3:].divide(cal_r)
    188  189  190  191  192  193  ...   1093  1094  1095  1096  1097  1098
0   1.0  1.0  1.0  1.0  1.0  1.0  ...    1.0   1.0   1.0   1.0   1.0   1.0
1   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
2   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
3   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
4   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
5   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
6   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
7   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
8   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
9   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
10  NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
11  NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
12  NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN

The commands my_data.iloc[:,3:].divide(cal_r, axis=1) and my_data.iloc[:,3:]/cal_r give the same result: only the first row is divided, and every other row is NaN.

If I select a single row, the division works as expected:

>>> my_data.iloc[5,3:]/cal_r
       188      189      190    ...         1096      1097      1098
0  48.8182  48.8274  22.4476    ...     0.214338  0.154428  0.116671

[1 rows x 911 columns]

Is there something basic I am missing? I suspect I will need to replicate the cal_r row so that it has as many rows as the whole data set.

Any hint or guidance is really appreciated.


Related: divide pandas dataframe elements by its line max

Answer 1:

I believe you need to convert the one-row DataFrame to a NumPy array, so that the division is broadcast across all rows instead of being aligned on the index:

cal_r = my_data.iloc[(my_data["type"]=="cal").values, 3:]
print (cal_r)
        1096       1097       1098
0  17.949524  16.247619  15.465079

my_data.iloc[:, 3:] /= cal_r.values
print (my_data)
          date  type    id      1096       1097       1098
0   2014-06-13   cal     1  1.000000   1.000000   1.000000
1   2014-06-13   cow    32  0.029161  -0.052579  -0.098348
2   2014-06-13   cow    47  0.427644   0.401395   0.381012
3   2014-06-13   cow   107  0.231857   0.187632   0.156420
4   2014-06-13   cow   137  0.210654   0.157386   0.124890
5   2014-06-13   cow   255  0.214338   0.154428   0.116671
6   2014-06-13   cow   609  0.339715   0.297749   0.274782
7   2014-06-13   cow   721  0.203523   0.145147   0.105614
8   2014-06-14   cow   817  0.336754   0.303693   0.282788
9   2014-06-14   cow   837  0.537603   0.523857   0.509843
10  2014-06-14   cow   980  0.101236   0.033025  -0.006651
11  2014-06-14   cow  1730  0.474251   0.437866   0.408601
12  2014-06-14  dark     1  9.400010  10.332943  10.837319
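
To make the difference concrete, here is a minimal sketch on a made-up three-row frame. The column names are borrowed from the question, but the values are purely illustrative and `num`, `aligned` and `broadcast` are hypothetical names. It contrasts label-aligned division, which produces the NaN rows seen in the question, with division by the bare NumPy array:

```python
import pandas as pd

# Hypothetical miniature of the question's frame (illustrative values only).
my_data = pd.DataFrame({
    "type": ["cal", "cow", "cow"],
    "id": [1, 32, 47],
    "1096": [2.0, 4.0, 6.0],
    "1097": [4.0, 8.0, 2.0],
})

num = my_data.iloc[:, 2:]                         # numeric columns only
cal_r = num[(my_data["type"] == "cal").values]    # one-row DataFrame, index label 0

# DataFrame / DataFrame aligns on row labels AND column labels,
# so only the row labelled 0 finds a partner; the rest become NaN.
aligned = num / cal_r

# A NumPy array carries no labels; its shape (1, 2) is broadcast over every row.
broadcast = num / cal_r.values

print(aligned)
print(broadcast)
```

Because the array has no index, this relies on the columns of `cal_r` being in the same order as the columns being divided, which holds here since both come from the same slice.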

Alternatively, convert the one-row DataFrame to a Series with DataFrame.squeeze, or select its first row by position, so that div aligns the Series index with the column labels:

my_data.iloc[:, 3:] = my_data.iloc[:, 3:].div(cal_r.squeeze())
# Alternative: select the first row by position, which also yields a Series
# my_data.iloc[:, 3:] = my_data.iloc[:, 3:].div(cal_r.iloc[0])
print(my_data)
          date  type    id      1096       1097       1098
0   2014-06-13   cal     1  1.000000   1.000000   1.000000
1   2014-06-13   cow    32  0.029161  -0.052579  -0.098348
2   2014-06-13   cow    47  0.427644   0.401395   0.381012
3   2014-06-13   cow   107  0.231857   0.187632   0.156420
4   2014-06-13   cow   137  0.210654   0.157386   0.124890
5   2014-06-13   cow   255  0.214338   0.154428   0.116671
6   2014-06-13   cow   609  0.339715   0.297749   0.274782
7   2014-06-13   cow   721  0.203523   0.145147   0.105614
8   2014-06-14   cow   817  0.336754   0.303693   0.282788
9   2014-06-14   cow   837  0.537603   0.523857   0.509843
10  2014-06-14   cow   980  0.101236   0.033025  -0.006651
11  2014-06-14   cow  1730  0.474251   0.437866   0.408601
12  2014-06-14  dark     1  9.400010  10.332943  10.837319
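
If you would rather keep the original frame intact, the same Series-based division can be written without in-place assignment. The following is only a sketch under assumptions: the data is made up, and `value_cols`, `cal_row` and `result` are hypothetical names that do not appear in the question.

```python
import pandas as pd

# Hypothetical miniature of the question's frame (illustrative values only).
my_data = pd.DataFrame({
    "date": ["2014-06-13"] * 3,
    "type": ["cal", "cow", "dark"],
    "id": [1, 32, 1],
    "1096": [2.0, 1.0, 20.0],
    "1097": [4.0, 2.0, 10.0],
})

# Everything after date/type/id is treated as a value column.
value_cols = my_data.columns[3:]

# Selecting the first matching row gives a Series indexed by column name.
cal_row = my_data.loc[my_data["type"] == "cal", value_cols].iloc[0]

# div matches the Series index against the column labels and divides
# every row by it; the result is written into a copy of the frame.
result = my_data.copy()
result[value_cols] = my_data[value_cols].div(cal_row, axis="columns")

print(result)
```

Since the Series is matched to the columns by label, this variant does not depend on column order, and `.iloc[0]` assumes there is at least one row with type "cal".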