Return dataframe subset based on a list of boolean

I'm trying to slice a dataframe based on a list of values. How would I go about this?

Say I have an expression or a list l = [0,1,0,0,1,1,0,0,0,1]

How do I return the rows of a dataframe, df, for which the corresponding value in the expression/list is 1? In this example, I would include the rows where the index is 1, 4, 5, and 9.

Tags: python pandas dataframe

6 Answers

仙女界的扛把子

#2 · 2020-06-09 05:25

You can use masking here:

df[np.array([0,1,0,0,1,1,0,0,0,1],dtype=bool)]

This builds a boolean array of True and False values. Every position where the array is True marks a row we select.

Note that this does not filter in place. To keep the result, assign it to a variable (the same one or a different one):

df2 = df[np.array([0,1,0,0,1,1,0,0,0,1],dtype=bool)]
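
To make the masking idea concrete, here is a minimal, self-contained sketch. The frame contents and the names mask and subset are made up for illustration; only the indexing pattern comes from the answer above.

```python
import numpy as np
import pandas as pd

# Illustrative data: 10 rows, so the mask from the question lines up.
df = pd.DataFrame({"value": range(10)})

# Convert the 0/1 list to a boolean array: 1 -> True, 0 -> False.
mask = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1], dtype=bool)

# Boolean indexing returns a new DataFrame; df itself is unchanged.
subset = df[mask]

print(subset.index.tolist())  # [1, 4, 5, 9]
print(len(df))                # 10
```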

smile是对你的礼貌

#3 · 2020-06-09 05:27

Setup
Borrowed @ayhan's setup

df = pd.DataFrame(np.random.randint(10, size=(10, 3)))
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]  # the list from the question

Without numpy
Not the fastest, but it holds its own and is definitely the shortest.

df[list(map(bool, lst))]

   0  1  2
1  3  5  6
4  6  3  2
5  5  7  6
9  0  0  1
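
The conversion to bool matters: as far as I can tell, pandas treats a plain list of integers as column labels, not as a row mask. The sketch below, with a made-up frame and the hypothetical names by_label and by_mask, shows the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(30).reshape(10, 3))  # columns 0, 1, 2
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# A list of ints is read as column labels, so this picks columns
# 0 and 1 repeatedly instead of filtering rows.
by_label = df[lst]

# A list of real booleans is read as a row mask.
by_mask = df[list(map(bool, lst))]

print(by_label.shape)          # (10, 10)
print(by_mask.index.tolist())  # [1, 4, 5, 9]
```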

Timing

results.div(results.min(1), 0).round(2).pipe(lambda d: d.assign(Best=d.idxmin(1)))

         ayh   wvo   pir   mxu   wen Best
N                                        
1       1.53  1.00  1.02  4.95  2.61  wvo
3       1.06  1.00  1.04  5.46  2.84  wvo
10      1.00  1.00  1.00  4.30  2.73  ayh
30      1.00  1.05  1.24  4.06  3.76  ayh
100     1.16  1.00  1.19  3.90  3.53  wvo
300     1.29  1.00  1.32  2.50  2.38  wvo
1000    1.54  1.00  2.19  2.24  3.85  wvo
3000    1.39  1.00  2.17  1.81  4.55  wvo
10000   1.22  1.00  2.21  1.35  4.36  wvo
30000   1.19  1.00  2.26  1.39  5.36  wvo
100000  1.19  1.00  2.19  1.31  4.82  wvo
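
The table is each row of raw timings divided by that row's minimum, so 1.00 marks the fastest method for a given N. A small sketch of that normalisation, using made-up timings and two hypothetical method names a and b:

```python
import pandas as pd

# Made-up timings in seconds, for illustration only: rows are input
# sizes, columns are two hypothetical methods "a" and "b".
timings = pd.DataFrame(
    {"a": [2.0, 3.0], "b": [1.0, 6.0]},
    index=pd.Index([10, 100], name="N"),
)

# Divide each row by its own minimum, so the fastest method is 1.0.
relative = timings.div(timings.min(axis=1), axis=0)

# Label each row with the column that holds the minimum.
relative = relative.assign(Best=relative.idxmin(axis=1))

print(relative)
```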

fig, (a1, a2) = plt.subplots(2, 1, figsize=(6, 6))
results.plot(loglog=True, lw=3, ax=a1)
results.div(results.min(1), 0).round(2).plot.bar(logy=True, ax=a2)
fig.tight_layout()

Testing Code

from timeit import timeit

import numpy as np
import pandas as pd

ayh = lambda d, l: d[np.array(l).astype(bool)]
wvo = lambda d, l: d[np.array(l, dtype=bool)]
pir = lambda d, l: d[list(map(bool, l))]
wen = lambda d, l: d.loc[[i for i, x in enumerate(l) if x == 1], :]

def mxu(d, l):
    a = np.array(l)
    return d.query('@a != 0')

results = pd.DataFrame(
    index=pd.Index([1, 3, 10, 30, 100, 300,
                    1000, 3000, 10000, 30000, 100000], name='N'),
    columns='ayh wvo pir mxu wen'.split(),
    dtype=float
)

for i in results.index:
    d = pd.concat([df] * i, ignore_index=True)
    l = lst * i
    for j in results.columns:
        stmt = '{}(d, l)'.format(j)
        setp = 'from __main__ import d, l, {}'.format(j)
        # DataFrame.set_value was removed in pandas 1.0; .at replaces it
        results.at[i, j] = timeit(stmt, setp, number=10)

(This account has been banned)

#4 · 2020-06-09 05:28

Selecting using a list of Booleans is something itertools.compress does well.

Given

>>> df = pd.DataFrame(np.random.randint(10, size=(10, 2)))
>>> selectors = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

Code

>>> selected_idxs = list(itertools.compress(df.index, selectors))   # [1, 4, 5, 9]
>>> df.loc[selected_idxs, :]   # these are labels from df.index, so use .loc
   0  1
1  1  9
4  3  4
5  4  1
9  8  9
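
Because compress is applied to df.index, it yields index labels, not positions. The sketch below uses a made-up frame with string labels to show this; the names selected_labels and subset are illustrative.

```python
import itertools

import pandas as pd

# Illustrative frame with string labels instead of 0..9.
df = pd.DataFrame({"value": range(10)}, index=list("abcdefghij"))
selectors = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# compress keeps the items of df.index whose selector is truthy,
# so the result is a list of labels.
selected_labels = list(itertools.compress(df.index, selectors))

# Labels go with .loc.
subset = df.loc[selected_labels]

print(selected_labels)           # ['b', 'e', 'f', 'j']
print(subset["value"].tolist())  # [1, 4, 5, 9]
```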

Bombasti

#5 · 2020-06-09 05:32

Yet another "creative" approach:

In [181]: a = np.array(lst)

In [182]: df.query("index * @a > 0")
Out[182]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5

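Or a much better variant from @ayhan. Unlike the first version, it does not depend on the index values; with index * @a, a selected row 0 would be dropped because 0 * 1 is not greater than 0: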

In [183]: df.query("@a != 0")
Out[183]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5
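
A self-contained sketch of the @a != 0 variant, assuming a made-up single-column frame. The @ prefix makes query look up the local variable a, and the comparison produces the boolean row mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": range(10)})  # illustrative data
a = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1])

# @a refers to the local variable a; the expression evaluates to a
# boolean array that query then uses as a row mask.
subset = df.query("@a != 0")

print(subset.index.tolist())  # [1, 4, 5, 9]
```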

PS: I've also borrowed @ayhan's setup.

够拽才男人

#6 · 2020-06-09 05:44

Or find the positions of the 1s in your list and select those rows from the dataframe (with the default integer index, positions and labels coincide):

df.loc[[i for i,x in enumerate(lst) if x == 1],:]
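
If the index is not the default 0..n-1, the positions from enumerate no longer match the labels. The sketch below swaps .loc for .iloc, which selects by position. It is a variation on the line above, not the answer's own code, and the frame is made up for illustration.

```python
import pandas as pd

# Illustrative frame whose labels are not 0..9.
df = pd.DataFrame({"value": range(10)}, index=range(100, 110))
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# enumerate yields positions, not labels.
positions = [i for i, x in enumerate(lst) if x == 1]

# .iloc selects by position, so it works for any index.
subset = df.iloc[positions]

print(positions)              # [1, 4, 5, 9]
print(subset.index.tolist())  # [101, 104, 105, 109]
```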

Juvenile、少年°

#7 · 2020-06-09 05:47

Convert the list to a boolean array and then use boolean indexing:

df = pd.DataFrame(np.random.randint(10, size=(10, 3)))

df[np.array(lst).astype(bool)]
Out: 
   0  1  2
1  8  6  3
4  2  7  3
5  7  2  3
9  1  3  4
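
Two details of this conversion are worth seeing in a runnable form: any nonzero entry becomes True, and the mask needs one entry per row. This is a sketch with made-up data; length_checked is a hypothetical flag used only to record the outcome.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": range(10)})  # illustrative data
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

mask = np.array(lst).astype(bool)

# Any nonzero entry becomes True, not just 1.
print(np.array([0, 2, -1]).astype(bool).tolist())  # [False, True, True]

# The mask must have one entry per row; a shorter mask is rejected.
try:
    df[mask[:5]]
    length_checked = False
except ValueError:
    length_checked = True

print(df[mask].index.tolist())  # [1, 4, 5, 9]
print(length_checked)           # True
```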
