Find the highest and lowest value locations within

2019-07-29 05:22发布

问题:

Given this pandas dataframe with two columns, 'Values' and 'Intervals'. How do I get a third column 'MinMax' indicating whether the value is a maximum or a minimum within that interval? The challenge for me is that the interval length and the distance between intervals are not fixed, therefore I post the question.

import pandas as pd
import numpy as np


data = pd.DataFrame([
        [1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
        [1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
        [1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
        [1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
        [1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
        [1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
        [1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
        [1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
        [1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
        [1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
        [1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
        [1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
        [1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
        [1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
        [1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
        [1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
        [1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
    ])

In the third column below you can see where the max and min is for each interval.

+-------+----------+-----------+---------+
| index |  Value   | Intervals | Min/Max |
+-------+----------+-----------+---------+
|     0 | 1879.289 | np.nan    |         |
|     1 | 1879.281 | np.nan    |         |
|     2 | 1879.292 | 1         |         |
|     3 | 1879.295 | 1         |         |
|     4 | 1879.481 | 1         |         |
|     5 | 1879.294 | 1         |         |
|     6 | 1879.268 | 1         | min     |
|     7 | 1879.293 | 1         |         |
|     8 | 1879.277 | 1         |         |
|     9 | 1879.285 | 1         |         |
|    10 | 1879.464 | 1         |         |
|    11 | 1879.475 | 1         |         |
|    12 | 1879.971 | 1         |         |
|    13 | 1879.779 | 1         |         |
|    17 | 1879.986 | 1         |         |
|    18 | 1880.791 | 1         | max     |
|    19 |  1880.29 | 1         |         |
|    55 | 1879.253 | np.nan    |         |
|    56 | 1878.268 | np.nan    |         |
|    57 |  1875.73 | 1         |         |
|    58 | 1876.792 | 1         |         |
|    59 | 1875.977 | 1         | min     |
|    60 | 1876.408 | 1         |         |
|    61 | 1877.159 | 1         |         |
|    62 | 1877.187 | 1         |         |
|    63 | 1883.164 | 1         |         |
|    64 | 1883.171 | 1         |         |
|    65 | 1883.495 | 1         |         |
|    66 | 1883.962 | 1         |         |
|    67 | 1885.158 | 1         |         |
|    68 | 1885.974 | 1         | max     |
|    69 | 1886.479 | np.nan    |         |
|    70 | 1885.969 | np.nan    |         |
|    71 | 1884.693 | 1         |         |
|    72 | 1884.977 | 1         |         |
|    73 | 1884.967 | 1         |         |
|    74 | 1884.691 | 1         | min     |
|    75 | 1886.171 | 1         | max     |
|    76 | 1886.166 | np.nan    |         |
|    77 | 1884.476 | np.nan    |         |
|    78 |  1884.66 | 1         | max     |
|    79 | 1882.962 | 1         |         |
|    80 | 1881.496 | 1         |         |
|    81 | 1871.163 | 1         | min     |
|    82 | 1874.985 | 1         |         |
|    83 | 1874.979 | 1         |         |
|    84 | 1871.173 | np.nan    |         |
|    85 | 1871.973 | np.nan    |         |
|    86 | 1871.682 | np.nan    |         |
|    87 | 1872.476 | np.nan    |         |
|    88 | 1882.361 | 1         | max     |
|    89 | 1880.869 | 1         |         |
|    90 | 1882.165 | 1         |         |
|    91 | 1881.857 | 1         |         |
|    92 | 1880.375 | 1         |         |
|    93 |  1880.66 | 1         |         |
|    94 | 1880.891 | 1         |         |
|    95 | 1880.377 | 1         |         |
|    96 | 1881.663 | 1         |         |
|    97 |  1881.66 | 1         |         |
|    98 | 1877.888 | 1         |         |
|    99 |  1875.69 | 1         |         |
|   100 | 1875.161 | 1         | min     |
|   101 | 1876.697 | np.nan    |         |
|   102 | 1876.671 | np.nan    |         |
|   103 | 1879.666 | np.nan    |         |
|   111 | 1877.182 | np.nan    |         |
|   112 | 1878.898 | 1         |         |
|   113 | 1878.668 | 1         |         |
|   114 | 1878.871 | 1         |         |
|   115 | 1878.882 | 1         |         |
|   116 | 1879.173 | 1         | max     |
|   117 | 1878.887 | 1         |         |
|   118 |  1878.68 | 1         |         |
|   119 | 1878.872 | 1         |         |
|   120 | 1878.677 | 1         |         |
|   121 | 1877.877 | 1         |         |
|   122 | 1877.669 | 1         |         |
|   123 |  1877.69 | 1         |         |
|   124 | 1877.684 | 1         |         |
|   125 |  1877.68 | 1         |         |
|   126 | 1877.885 | 1         |         |
|   127 | 1877.863 | 1         |         |
|   128 | 1877.674 | 1         |         |
|   129 | 1877.676 | 1         |         |
|   130 | 1877.687 | 1         |         |
|   131 | 1878.367 | 1         |         |
|   132 | 1878.179 | 1         |         |
|   133 | 1877.696 | 1         |         |
|   134 | 1877.665 | 1         | min     |
|   135 | 1877.667 | np.nan    |         |
|   136 | 1878.678 | np.nan    |         |
|   137 | 1878.661 | 1         | max     |
|   138 | 1878.171 | 1         |         |
|   139 | 1877.371 | 1         |         |
|   140 | 1877.359 | 1         |         |
|   141 | 1878.381 | 1         |         |
|   142 | 1875.185 | 1         | min     |
|   143 | 1875.367 | np.nan    |         |
|   144 | 1865.492 | np.nan    |         |
|   145 | 1865.495 | 1         | max     |
|   146 | 1866.995 | 1         |         |
|   147 | 1866.672 | 1         |         |
|   148 | 1867.465 | 1         |         |
|   149 | 1867.663 | 1         |         |
|   150 | 1867.186 | 1         |         |
|   151 | 1867.687 | 1         |         |
|   152 | 1867.459 | 1         |         |
|   153 | 1867.168 | 1         |         |
|   154 | 1869.689 | 1         |         |
|   155 | 1869.693 | 1         |         |
|   156 | 1871.676 | 1         |         |
|   157 | 1873.174 | 1         | min     |
|   158 | 1873.691 | np.nan    |         |
|   159 | 1873.685 | np.nan    |         |
+-------+----------+-----------+---------+

回答1:

isnull = data.iloc[:, 1].isnull()
minmax = data.groupby(isnull.cumsum()[~isnull])[0].agg(['idxmax', 'idxmin'])
data.loc[minmax['idxmax'], 'MinMax'] = 'max'
data.loc[minmax['idxmin'], 'MinMax'] = 'min'
data.MinMax = data.MinMax.fillna('')
print(data)

            0    1 MinMax
0    1879.289  NaN       
1    1879.281  NaN       
2    1879.292  1.0       
3    1879.295  1.0       
4    1879.481  1.0       
5    1879.294  1.0       
6    1879.268  1.0    min
7    1879.293  1.0       
8    1879.277  1.0       
9    1879.285  1.0       
10   1879.464  1.0       
11   1879.475  1.0       
12   1879.971  1.0       
13   1879.779  1.0       
14   1879.986  1.0       
15   1880.791  1.0    max
16   1880.290  1.0       
17   1879.253  NaN       
18   1878.268  NaN       
19   1875.730  1.0    min
20   1876.792  1.0       
21   1875.977  1.0       
22   1876.408  1.0       
23   1877.159  1.0       
24   1877.187  1.0       
25   1883.164  1.0       
26   1883.171  1.0       
27   1883.495  1.0       
28   1883.962  1.0       
29   1885.158  1.0       
..        ...  ...    ...
85   1877.687  1.0       
86   1878.367  1.0       
87   1878.179  1.0       
88   1877.696  1.0       
89   1877.665  1.0    min
90   1877.667  NaN       
91   1878.678  NaN       
92   1878.661  1.0    max
93   1878.171  1.0       
94   1877.371  1.0       
95   1877.359  1.0       
96   1878.381  1.0       
97   1875.185  1.0    min
98   1875.367  NaN       
99   1865.492  NaN       
100  1865.495  1.0    min
101  1866.995  1.0       
102  1866.672  1.0       
103  1867.465  1.0       
104  1867.663  1.0       
105  1867.186  1.0       
106  1867.687  1.0       
107  1867.459  1.0       
108  1867.168  1.0       
109  1869.689  1.0       
110  1869.693  1.0       
111  1871.676  1.0       
112  1873.174  1.0    max
113  1873.691  NaN       
114  1873.685  NaN       

[115 rows x 3 columns]


回答2:

data.columns=['Value','Interval']

data['Ingroup'] = (data['Interval'].notnull() + 0)

Use data['Interval'].notnull() to separate the groups...
Use cumsum() to number them with `groupno`...
Use groupby(groupno)..

Finally you want something using apply/idxmax/idxmin to label the max/min

But of course a for-loop as you suggested is the non-Pythonic but possibly simpler hack.