I find myself coding this sort of pattern a lot:
tmp = <some operation>
result = tmp[<boolean expression>]
del tmp
...where <boolean expression> is to be understood as a boolean expression involving tmp. (For the time being, tmp is always a pandas DataFrame, but I suppose that the same pattern would show up if I were working with NumPy ndarrays; not sure.)
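For what it's worth, NumPy supports the same boolean-mask indexing, so the pattern does carry over to ndarrays. A minimal sketch, with made-up values standing in for <some operation>:

```python
import numpy as np

# Made-up values standing in for <some operation>
tmp = np.array([-1.5, 0.3, -0.7, 2.0])

# The boolean mask tmp < 0 selects the negative entries, exactly as with pandas
result = tmp[tmp < 0]
del tmp

print(result)  # [-1.5 -0.7]
```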
For example:
tmp = df.xs('A')['II'] - df.xs('B')['II']
result = tmp[tmp < 0]
del tmp
As one can guess from the del tmp at the end, the only reason for creating tmp at all is so that I can use a boolean expression involving it inside an indexing expression applied to it.
I would love to eliminate the need for this (otherwise useless) intermediate, but I don't know of any efficient[1] way to do this. (Please, correct me if I'm wrong!)
As a second best, I'd like to push off this pattern to some helper function. The problem is finding a decent way to pass the <boolean expression> to it. I can only think of indecent ones. E.g.:
def filterobj(obj, criterion):
    # criterion is a format string such as '%s < 0'; '%s' is replaced by 'obj'
    return obj[eval(criterion % 'obj')]
This actually works[2]:
filterobj(df.xs('A')['II'] - df.xs('B')['II'], '%s < 0')
# Int
# 0 -1.650107
# 2 -0.718555
# 3 -1.725498
# 4 -0.306617
# Name: II
...but using eval always leaves me feeling all yukky 'n' stuff... Please let me know if there's some other way.
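One possible alternative to eval, sketched here as a variant of the filterobj helper above (an illustration, not a benchmarked recommendation): pass a callable instead of a format string. The callable is invoked once on the whole object, so a comparison like x < 0 is still evaluated vectorized rather than element by element. More recent pandas releases also accept a callable directly inside [] (older versions do not), which removes the need for both tmp and the helper. The series a and b below are made-up stand-ins for df.xs('A')['II'] and df.xs('B')['II'].

```python
import pandas as pd

def filterobj(obj, criterion):
    # criterion is a callable (e.g. a lambda); it is called once with the
    # whole object, so the comparison inside it stays vectorized
    return obj[criterion(obj)]

# Made-up stand-ins for df.xs('A')['II'] and df.xs('B')['II']
a = pd.Series([0.10, 0.92, 0.28, -0.73, 0.19])
b = pd.Series([1.75, 0.50, 1.00, 1.00, 0.50])

neg = filterobj(a - b, lambda x: x < 0)

# Newer pandas: [] calls the lambda with the Series produced by a - b,
# so no named intermediate is needed at all
neg_inline = (a - b)[lambda x: x < 0]

print(neg.index.tolist())  # [0, 2, 3, 4]
```

The lambda receives the object itself, so it can use any vectorized Series or ndarray operation, and no string is ever evaluated.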
[1] E.g., any approach I can think of involving the filter built-in is probably inefficient, since it would apply the criterion (some lambda function) by iterating, "in Python", over the pandas (or NumPy) object...
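To make the contrast in this footnote concrete, here is a small sketch with made-up data: the built-in filter calls its lambda once per element in Python and yields bare values, whereas a boolean mask is built with a single vectorized comparison and keeps the index labels.

```python
import pandas as pd

# Made-up data with non-default index labels
s = pd.Series([-1.65, 0.42, -0.72], index=[10, 11, 12])

# Built-in filter: the lambda runs once per element in Python, and
# iterating a Series yields values only, so the index labels are lost
via_filter = list(filter(lambda v: v < 0, s))

# Boolean mask: one vectorized comparison, index labels preserved
via_mask = s[s < 0]

print(via_mask.index.tolist())  # [10, 12]
```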
[2] The definition of df used in the last expression above would be something like this:
import itertools
import pandas as pd
import numpy as np
a = ('A', 'B')
i = range(5)
# Two-level index: every (Alpha, Int) combination
ix = pd.MultiIndex.from_tuples(list(itertools.product(a, i)),
                               names=('Alpha', 'Int'))
c = ('I', 'II', 'III')
df = pd.DataFrame(np.random.randn(len(ix), len(c)), index=ix, columns=c)