Return a DataFrame subset based on a list of booleans

Posted 2020-06-09 05:18

I'm trying to slice a DataFrame based on a list of values. How would I go about this?

Say I have an expression or a list: l = [0,1,0,0,1,1,0,0,0,1]

How do I return the rows of a DataFrame, df, for which the corresponding value in the expression/list is 1? In this example, I would include the rows with index 1, 4, 5, and 9.

6 answers
仙女界的扛把子
#2 · 2020-06-09 05:25

You can use masking here:

df[np.array([0,1,0,0,1,1,0,0,0,1],dtype=bool)]

This constructs a boolean array of True and False values. Every position where the array is True is a row we select.

Note that this does not filter in place. To keep the result, assign it to a (possibly different) variable:

df2 = df[np.array([0,1,0,0,1,1,0,0,0,1],dtype=bool)]
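
As a minimal, self-contained sketch of the masking idea above: the sample DataFrame and the names flags and mask are assumptions made for illustration, and the explicit length check is an addition, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; any DataFrame with 10 rows would do.
df = pd.DataFrame({"value": range(10, 20)})
flags = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# Convert the 0/1 flags into a boolean mask.
mask = np.array(flags, dtype=bool)

# A boolean mask needs exactly one entry per row.
if len(mask) != len(df):
    raise ValueError("mask length must match the number of rows")

# Indexing returns a new object; df itself is left unchanged.
df2 = df[mask]
print(df2.index.tolist())
print(len(df))
```

Here df2 holds only the flagged rows, while df still has all ten.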
smile是对你的礼貌
#3 · 2020-06-09 05:27

Setup
Borrowed @ayhan's setup

df = pd.DataFrame(np.random.randint(10, size=(10, 3)))

Without numpy
Not the fastest, but it holds its own and is definitely the shortest (lst is the list from the question).

df[list(map(bool, lst))]

   0  1  2
1  3  5  6
4  6  3  2
5  5  7  6
9  0  0  1
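
A small sketch of the numpy-free variant, written out with its inputs. The sample DataFrame is hypothetical. As far as I know, the conversion to bool matters because a list of plain integers passed to df[...] is read as column labels rather than as a row mask.

```python
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]   # the list from the question
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})  # hypothetical data

# Turn each 0/1 into a Python bool: [False, True, False, ...]
bool_list = list(map(bool, lst))

# A list of bools with one entry per row is treated as a row mask.
subset = df[bool_list]

# The same selection spelled with .loc.
same = df.loc[bool_list]
```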

Timing
Each row is divided by its fastest time, so 1.00 marks the best method for that N.

results.div(results.min(1), 0).round(2).pipe(lambda d: d.assign(Best=d.idxmin(1)))

         ayh   wvo   pir   mxu   wen Best
N                                        
1       1.53  1.00  1.02  4.95  2.61  wvo
3       1.06  1.00  1.04  5.46  2.84  wvo
10      1.00  1.00  1.00  4.30  2.73  ayh
30      1.00  1.05  1.24  4.06  3.76  ayh
100     1.16  1.00  1.19  3.90  3.53  wvo
300     1.29  1.00  1.32  2.50  2.38  wvo
1000    1.54  1.00  2.19  2.24  3.85  wvo
3000    1.39  1.00  2.17  1.81  4.55  wvo
10000   1.22  1.00  2.21  1.35  4.36  wvo
30000   1.19  1.00  2.26  1.39  5.36  wvo
100000  1.19  1.00  2.19  1.31  4.82  wvo

fig, (a1, a2) = plt.subplots(2, 1, figsize=(6, 6))
results.plot(loglog=True, lw=3, ax=a1)
results.div(results.min(1), 0).round(2).plot.bar(logy=True, ax=a2)
fig.tight_layout()

[Figure: timings on log-log axes (top) and relative timings as a bar chart (bottom)]

Testing Code

from timeit import timeit  # used by the timing loop below

ayh = lambda d, l: d[np.array(l).astype(bool)]
wvo = lambda d, l: d[np.array(l, dtype=bool)]
pir = lambda d, l: d[list(map(bool, l))]
wen = lambda d, l: d.loc[[i for i, x in enumerate(l) if x == 1], :]

def mxu(d, l):
    a = np.array(l)
    return d.query('@a != 0')

results = pd.DataFrame(
    index=pd.Index([1, 3, 10, 30, 100, 300,
                    1000, 3000, 10000, 30000, 100000], name='N'),
    columns='ayh wvo pir mxu wen'.split(),
    dtype=float
)

for i in results.index:
    d = pd.concat([df] * i, ignore_index=True)
    l = lst * i
    for j in results.columns:
        stmt = '{}(d, l)'.format(j)
        setp = 'from __main__ import d, l, {}'.format(j)
        # DataFrame.set_value was removed in pandas 1.0; .at is the replacement
        results.at[i, j] = timeit(stmt, setp, number=10)
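
The harness above relies on names defined elsewhere in the thread (df, lst, np, pd). A reduced, self-contained sketch of the same measurement follows. It is an assumption-laden variant, not the original benchmark: it covers only three of the methods, uses fewer sizes, and passes a callable to timeit instead of statement strings.

```python
from timeit import timeit

import numpy as np
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
df = pd.DataFrame(np.random.randint(10, size=(10, 3)))

# Three of the candidates from the thread, keyed by the same short names.
candidates = {
    "ayh": lambda d, l: d[np.array(l).astype(bool)],
    "wvo": lambda d, l: d[np.array(l, dtype=bool)],
    "pir": lambda d, l: d[list(map(bool, l))],
}

sizes = [1, 10, 100]  # reduced set of multipliers, chosen for a quick run
timings = pd.DataFrame(
    index=pd.Index(sizes, name="N"),
    columns=list(candidates),
    dtype=float,
)

for n in sizes:
    d = pd.concat([df] * n, ignore_index=True)  # n stacked copies of df
    l = lst * n                                 # matching mask length
    for name, func in candidates.items():
        # timeit accepts a zero-argument callable as the statement.
        timings.at[n, name] = timeit(lambda: func(d, l), number=10)

# Scale each row by its fastest method so the best entry reads 1.0.
relative = timings.div(timings.min(axis=1), axis=0).round(2)
print(relative)
```

Absolute numbers will vary by machine and pandas version, so only the relative column ordering is worth reading.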
该账号已被封号
#4 · 2020-06-09 05:28

Selecting using a list of Booleans is something itertools.compress does well.

Given

>>> df = pd.DataFrame(np.random.randint(10, size=(10, 2)))
>>> selectors = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

Code

>>> selected_idxs = list(itertools.compress(df.index, selectors))   # [1, 4, 5, 9]
>>> df.loc[selected_idxs, :]    # compress over df.index yields labels, so select with .loc
   0  1
1  1  9
4  3  4
5  4  1
9  8  9
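
One detail worth spelling out: itertools.compress over df.index yields index labels, while compress over range(len(df)) yields positions. The two coincide only for the default integer index. A short sketch with a hypothetical string-labelled DataFrame shows both routes:

```python
import itertools

import pandas as pd

# Hypothetical data with non-integer labels, to separate labels from positions.
df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=list("abcd"))
selectors = [0, 1, 0, 1]

# Labels of the selected rows: pair with .loc
labels = list(itertools.compress(df.index, selectors))

# Positions of the selected rows: pair with .iloc
positions = list(itertools.compress(range(len(df)), selectors))

by_label = df.loc[labels]
by_position = df.iloc[positions]
```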
Bombasti
#5 · 2020-06-09 05:32

Yet another "creative" approach:

In [181]: a = np.array(lst)

In [182]: df.query("index * @a > 0")
Out[182]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5

Or a much better variant from @ayhan:

In [183]: df.query("@a != 0")
Out[183]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5

PS: I've also borrowed @ayhan's setup.
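
One reason the second variant is preferable: multiplying by the index zeroes out the row whose index is 0, so that row can never be selected. The sketch below reproduces the two expressions as plain NumPy masks outside of query (an assumption made to keep the example simple), with hypothetical data in which row 0 is flagged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": range(4)})   # default index 0..3
a = np.array([1, 0, 1, 0])           # rows 0 and 2 are flagged

# Mirrors "index * @a > 0": index 0 times anything is 0, so row 0 drops out.
index_product_mask = np.asarray(df.index) * a > 0

# Mirrors "@a != 0": depends only on the flags.
nonzero_mask = a != 0

picked_by_product = df[index_product_mask].index.tolist()
picked_by_nonzero = df[nonzero_mask].index.tolist()
print(picked_by_product, picked_by_nonzero)
```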

够拽才男人
#6 · 2020-06-09 05:44

Or find the positions of the 1s in your list and select those rows from the DataFrame:

df.loc[[i for i,x in enumerate(lst) if x == 1],:]
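
The .loc call above treats the positions as labels, which works when the DataFrame has the default 0..n-1 index. A hedged sketch of a position-based alternative, using np.flatnonzero and .iloc on a hypothetical DataFrame whose labels start at 100:

```python
import numpy as np
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# Hypothetical data whose labels (100..109) differ from the row positions.
df = pd.DataFrame({"x": range(10)}, index=range(100, 110))

# Positions at which the list equals 1.
positions = np.flatnonzero(np.array(lst) == 1)

# .iloc selects by position, independent of the index labels.
subset = df.iloc[positions]
print(subset.index.tolist())
```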
Juvenile、少年°
#7 · 2020-06-09 05:47

Convert the list to a boolean array and then use boolean indexing:

df = pd.DataFrame(np.random.randint(10, size=(10, 3)))

df[np.array(lst).astype(bool)]
Out: 
   0  1  2
1  8  6  3
4  2  7  3
5  7  2  3
9  1  3  4
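
Note that astype(bool) maps every nonzero value to True, not just 1. If the list may hold values other than 0 and 1, an explicit comparison gives a stricter mask. The values below are made up for illustration:

```python
import numpy as np

values = [0, 1, 2, -1, 0]  # hypothetical flags that are not strictly 0/1

# Truthiness: any nonzero entry becomes True.
truthy_mask = np.array(values).astype(bool).tolist()

# Strict match: only entries equal to 1 become True.
equals_one_mask = (np.array(values) == 1).tolist()

print(truthy_mask)
print(equals_one_mask)
```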