Most operations in pandas can be accomplished with operator chaining (groupby, aggregate, apply, etc.), but the only way I've found to filter rows is via normal bracket indexing:
df_filtered = df[df['column'] == value]
This is unappealing as it requires that I assign df to a variable before being able to filter on its values. Is there something more like the following?
df_filtered = df.mask(lambda x: x['column'] == value)
pandas provides two alternatives to Wouter Overmeire's answer which do not require any overriding. One is .loc[.] with a callable; the other is .pipe().

This solution is more hackish in terms of implementation, but I find it much cleaner in terms of usage, and it is certainly more general than the others proposed.
https://github.com/toobaz/generic_utils/blob/master/generic_utils/pandas/where.py
You don't need to download the entire repo: saving the file and doing
should suffice. Then you use it like this:
A slightly less stupid usage example:
By the way: even in the case in which you are just using boolean cols,
can be much more efficient than
because it evaluates cond2 only where cond1 is True.

DISCLAIMER: I first gave this answer elsewhere because I hadn't seen this.
Filters can be chained using a Pandas query:
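As a minimal sketch of what chaining looks like (the DataFrame and the column names a and b are made up for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Each .query() call returns a new DataFrame, so the filters chain
# without assigning any intermediate result to a variable.
chained = df.query("a > 1").query("b < 40")
```

Each call sees only the rows that survived the previous one, so the order of the calls can matter for performance but not for the result here.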
Filters can also be combined in a single query:
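A rough sketch of the single-query form, again on made-up data; the variable name threshold is only an example of referencing a Python variable from inside the query string with the @ prefix:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

threshold = 1  # ordinary Python variable, referenced below as @threshold

# Both conditions are combined with "and" inside one query string.
combined = df.query("a > @threshold and b < 40")
```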
If you set the columns you want to search on as the index, then you can use DataFrame.xs() to take a cross section. This is not as versatile as the query answers, but it might be useful in some situations.

Since version 0.18.1 the .loc indexer accepts a callable for selection. Together with lambda functions you can create very flexible chainable filters. If all you're doing is filtering, you can also omit the .loc.

My answer is similar to the others. If you do not want to create a new function you can use what pandas has defined for you already. Use the pipe method.
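For reference, here is a minimal sketch of the built-in options mentioned in the answers above: .loc with a callable, plain brackets with a callable, and .pipe. The DataFrame, the extra column n, and the assign step are made up for illustration; only 'column' and value come from the question.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"column": ["x", "y", "x"], "n": [1, 2, 3]})
value = "x"

# .loc with a callable: the lambda receives the DataFrame as it exists
# at that point in the chain, so it can use the freshly assigned column.
via_loc = (
    df.assign(double=lambda d: d["n"] * 2)
      .loc[lambda d: d["column"] == value]
)

# Plain bracket indexing also accepts a callable.
via_brackets = df[lambda d: d["column"] == value]

# .pipe hands the DataFrame to any function and returns its result.
via_pipe = df.pipe(lambda d: d[d["column"] == value])
```

All three avoid naming the intermediate DataFrame, which is what the question was after; which one reads best is mostly a matter of taste.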