
Question:

I find myself coding this sort of pattern a lot:

tmp = <some operation>
result = tmp[<boolean expression>]
del tmp

...where <boolean expression> is to be understood as a boolean expression involving tmp. (For the time being, tmp is always a pandas dataframe, but I suppose that the same pattern would show up if I were working with numpy ndarrays--not sure.)

For example:

tmp = df.xs('A')['II'] - df.xs('B')['II']
result = tmp[tmp < 0]
del tmp

As one can guess from the del tmp at the end, the only reason for creating tmp at all is so that I can use a boolean expression involving it inside an indexing expression applied to it.

I would love to eliminate the need for this (otherwise useless) intermediate, but I don't know of any efficient¹ way to do this. (Please, correct me if I'm wrong!)

As second best, I'd like to push off this pattern to some helper function. The problem is finding a decent way to pass the <boolean expression> to it. I can only think of indecent ones. E.g.:

def filterobj(obj, criterion):
    return obj[eval(criterion % 'obj')]

This actually works²:

filterobj(df.xs('A')['II'] - df.xs('B')['II'], '%s < 0')

# Int
# 0     -1.650107
# 2     -0.718555
# 3     -1.725498
# 4     -0.306617
# Name: II

...but using eval always leaves me feeling all yukky 'n' stuff... Please let me know if there's some other way.

¹E.g., any approach I can think of involving the filter built-in is probably inefficient, since it would apply the criterion (some lambda function) by iterating, "in Python", over the pandas (or numpy) object...
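To make the footnote's concern concrete, here is a small sketch comparing the filter built-in with a boolean mask. The Series s and the names via_filter and via_mask are made up for illustration. No timings are claimed here; the point is only that filter walks the values one at a time in Python and drops the index, while the mask is a single vectorized comparison that keeps it.

```python
import pandas as pd

# Illustrative data (not from the question)
s = pd.Series([1.5, -0.5, -2.0, 3.0], index=list('abcd'))

# Built-in filter: calls the lambda once per element, in Python,
# and yields bare scalars, so the index labels are lost.
via_filter = list(filter(lambda x: x < 0, s))

# Boolean mask: one vectorized comparison, index preserved.
via_mask = s[s < 0]
```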

²The definition of df used in the last expression above would be something like this:

import itertools
import pandas as pd
import numpy as np
a = ('A', 'B')
i = range(5)
ix = pd.MultiIndex.from_tuples(list(itertools.product(a, i)),
                               names=('Alpha', 'Int'))
c = ('I', 'II', 'III')
df = pd.DataFrame(np.random.randn(len(ix), len(c)), index=ix, columns=c)

Answer 1:

Because of the way Python works, I think this one's going to be tough. I can only think of hacks which only get you part of the way there. Something like

def filterobj(obj, fn):
    return obj[fn(obj)]

filterobj(df.xs('A')['II'] - df.xs('B')['II'], lambda x: x < 0)

should work, unless I've missed something. Using lambdas this way is one of the usual tricks for delaying evaluation.
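For what it's worth, newer pandas releases (0.18.1 and later, as far as I know) accept a callable directly inside .loc, so the same lambda trick works without a helper. A minimal sketch, assuming such a version is installed; the fixed seed and the names expected and result are my own choices, and df is built like the one in the question:

```python
import itertools

import numpy as np
import pandas as pd

# Same shape of frame as in the question, with a fixed seed for repeatability
ix = pd.MultiIndex.from_tuples(list(itertools.product(('A', 'B'), range(5))),
                               names=('Alpha', 'Int'))
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(len(ix), 3), index=ix, columns=('I', 'II', 'III'))


def filterobj(obj, fn):
    # Helper from this answer: fn receives obj and returns a boolean mask
    return obj[fn(obj)]


expected = filterobj(df.xs('A')['II'] - df.xs('B')['II'], lambda x: x < 0)

# .loc calls the lambda with the Series it is indexing,
# so no named intermediate is needed
result = (df.xs('A')['II'] - df.xs('B')['II']).loc[lambda s: s < 0]
```

Both spellings should give the same Series; the .loc form just moves the helper into pandas itself.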

Thinking out loud: one could make a this object which isn't evaluated but just sticks around as an expression, something like

>>> this
this
>>> this < 3
this < 3
>>> df[this < 3]
Traceback (most recent call last):
  File "<ipython-input-34-d5f1e0baecf9>", line 1, in <module>
    df[this < 3]
[...]
KeyError: u'no item named this < 3'

and then either special-case the treatment of this into pandas or still have a function like

def filterobj(obj, criterion):
    return obj[eval(str(criterion.subs({"this": "obj"})))]

(with enough work we could lose the eval, this is simply proof of concept) after which something like

>>> tmp = df["I"] + df["II"]
>>> tmp[tmp < 0]
Alpha  Int
A      4     -0.464487
B      3     -1.352535
       4     -1.678836
dtype: float64
>>> filterobj(df["I"] + df["II"], this < 0)
Alpha  Int
A      4     -0.464487
B      3     -1.352535
       4     -1.678836
dtype: float64

would work. I'm not sure any of this is worth the headache, though: Python simply isn't very conducive to this style.
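As a rough illustration of how the eval could be dropped, here is one way such a placeholder might look. Everything in it (the class This, its _binary method, the instance this) is hypothetical and not part of pandas. Each comparison returns a new placeholder wrapping a function, and the function is only called once the real object is supplied. Only a handful of operators are covered.

```python
import operator

import pandas as pd


class This:
    """Hypothetical placeholder that records an operation for later."""

    def __init__(self, fn=lambda obj: obj):
        self._fn = fn  # maps the real object to a value or mask

    def _binary(self, op, other):
        def evaluate(obj):
            # The right-hand side may itself be a deferred expression
            rhs = other(obj) if isinstance(other, This) else other
            return op(self._fn(obj), rhs)
        return This(evaluate)

    def __lt__(self, other): return self._binary(operator.lt, other)
    def __le__(self, other): return self._binary(operator.le, other)
    def __gt__(self, other): return self._binary(operator.gt, other)
    def __ge__(self, other): return self._binary(operator.ge, other)
    def __and__(self, other): return self._binary(operator.and_, other)

    def __call__(self, obj):
        # Evaluate the recorded expression against the real object
        return self._fn(obj)


this = This()


def filterobj(obj, criterion):
    return obj[criterion(obj)]


s = pd.Series([1.5, -0.5, -2.0, 3.0])
negatives = filterobj(s, this < 0)
in_range = filterobj(s, (this > -1) & (this < 2))
```

Since the placeholder is callable, s.loc[this < 0] should also work on pandas versions that accept callables in .loc, which would remove the need for filterobj altogether.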

Answer 2:

This is as concise as I could get:

(df.xs('A')['II'] - df.xs('B')['II']).apply(lambda x: x if (x<0) else np.nan).dropna()

Int
0     -4.488312
1     -0.666710
2     -1.995535
Name: II
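Note that apply runs the lambda once per element in Python, which is the kind of iteration the question's first footnote worries about. A vectorized variant of the same mask-then-dropna idea can be sketched with Series.where, which also accepts a callable condition in reasonably recent pandas versions. The data and the names via_apply and via_where below are illustrative only:

```python
import numpy as np
import pandas as pd

# Illustrative data (not the random frame from the question)
s = pd.Series([0.7, -4.5, -0.6, 1.2, -2.0], name='II')

# Element-wise version from this answer
via_apply = s.apply(lambda x: x if (x < 0) else np.nan).dropna()

# Vectorized version: where() keeps entries satisfying the condition and
# replaces the rest with NaN; the callable receives the whole Series.
via_where = s.where(lambda v: v < 0).dropna()
```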
