Most operations in pandas can be accomplished with operator chaining (groupby, aggregate, apply, etc.), but the only way I've found to filter rows is via normal bracket indexing:
df_filtered = df[df['column'] == value]
This is unappealing as it requires that I assign df to a variable before being able to filter on its values. Is there something more like the following?
df_filtered = df.mask(lambda x: x['column'] == value)
I'm not entirely sure what you want, and your last line of code does not help either, but anyway:
"Chained" filtering is done by "chaining" the criteria in the boolean index.
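As a minimal sketch of what chaining the criteria inside the boolean index can look like (the DataFrame and its column names A to D are made up for illustration):

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions.
df = pd.DataFrame({
    "A": [1, 4, 5, 1],
    "B": [4, 5, 5, 3],
    "C": [9, 0, 1, 9],
    "D": [1, 2, 0, 6],
})

# Each comparison is parenthesised because & binds more tightly than ==.
df_filtered = df[(df["A"] == 1) & (df["D"] == 6)]
print(df_filtered)
```

Only the row where both conditions hold survives the filter.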
If you want to chain methods, you can add your own mask method and use that one.
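A sketch of how such a method could be attached, assuming an equality filter is all that is needed. The name mask_eq is my own choice: current pandas already ships a DataFrame.mask that replaces values rather than dropping rows, so reusing the name mask would shadow it.

```python
import pandas as pd

def mask_eq(df, key, value):
    """Return the rows of df where column `key` equals `value`."""
    return df[df[key] == value]

# Hypothetical method name, chosen to avoid overriding the built-in DataFrame.mask.
pd.DataFrame.mask_eq = mask_eq

# Made-up example data.
df = pd.DataFrame({
    "A": [1, 4, 5, 1],
    "B": [4, 5, 5, 3],
    "C": [9, 0, 1, 9],
    "D": [1, 2, 0, 6],
})

chained = df.mask_eq("A", 1).mask_eq("D", 6)
print(chained)
```

If patching the class feels too invasive, df.pipe(mask_eq, "A", 1).pipe(mask_eq, "D", 6) should give the same result without touching pd.DataFrame.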
I offer this for additional examples. This is the same answer as https://stackoverflow.com/a/28159296/
I'll add other edits to make this post more useful.
pandas.DataFrame.query
query was made for exactly this purpose. Consider the dataframe df
Let's use query to filter all rows where D > B
Which we chain
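The original data is not shown here, so the following is a sketch with made-up values; the second condition (C > D) is likewise only an assumption used to demonstrate chaining.

```python
import pandas as pd

# Made-up data standing in for the dataframe df.
df = pd.DataFrame({
    "A": [5, 0, 3, 3, 7],
    "B": [9, 3, 5, 2, 4],
    "C": [7, 6, 8, 8, 1],
    "D": [6, 7, 7, 8, 1],
})

# Single filter: keep the rows where D > B.
d_gt_b = df.query("D > B")

# Chained filters: each query call returns a new DataFrame to filter again.
chained = df.query("D > B").query("C > D")

print(d_gt_b)
print(chained)
```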
Just want to add a demonstration using loc to filter not only by rows but also by columns, and to show some merits of the chained operation.
The code below can filter the rows by value.
By modifying it a bit you can filter the columns as well.
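Roughly, the two variants might look like this. The station/year/TEMP/RF columns are invented for the example, and passing a callable to loc is one way to keep the expression chainable without naming the frame first.

```python
import pandas as pd

# Invented weather-style data; all column names are assumptions.
df = pd.DataFrame({
    "station": ["USA", "USA", "CAN", "USA"],
    "year": [2000, 2001, 2000, 2001],
    "TEMP": [10.0, 12.0, 3.0, 14.0],
    "RF": [1.0, 2.0, 5.0, 4.0],
})

# Filter rows only: loc calls the lambda with the DataFrame being indexed.
rows_only = df.loc[lambda d: d["station"] == "USA"]

# Filter rows and select columns in the same step.
rows_and_cols = df.loc[lambda d: d["station"] == "USA", ["year", "TEMP"]]

print(rows_only)
print(rows_and_cols)
```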
So why do we want a chained method? The answer is that it is simple to read if you have many operations. For example,
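Something along these lines, again with invented data: filter, select columns, group and aggregate in one expression that reads top to bottom.

```python
import pandas as pd

# Invented data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "station": ["USA", "USA", "CAN", "USA"],
    "year": [2000, 2001, 2000, 2001],
    "TEMP": [10.0, 12.0, 3.0, 14.0],
    "RF": [1.0, 2.0, 5.0, 4.0],
})

result = (
    df
    .loc[lambda d: d["station"] == "USA", ["year", "TEMP", "RF"]]  # filter rows, pick columns
    .groupby("year")                                               # group the remaining rows
    .mean()                                                        # average TEMP and RF per year
)
print(result)
```

No intermediate variable is needed between the steps, which is the main readability gain.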
I had the same question except that I wanted to combine the criteria into an OR condition. The format given by Wouter Overmeire combines the criteria into an AND condition such that both must be satisfied:
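For reference, a sketch of the AND form with its OR counterpart alongside (made-up data). As far as I can tell, the parentheses around each comparison are what make either form work, because & and | bind more tightly than ==.

```python
import pandas as pd

# Made-up data; the column names are assumptions.
df = pd.DataFrame({
    "A": [1, 4, 5, 1],
    "D": [1, 6, 0, 6],
})

# AND: both conditions must hold.
both = df[(df["A"] == 1) & (df["D"] == 6)]

# OR: at least one condition must hold.
either = df[(df["A"] == 1) | (df["D"] == 6)]

# Appending == True to a boolean condition leaves the mask unchanged.
either_verbose = df[((df["A"] == 1) == True) | ((df["D"] == 6) == True)]

print(both)
print(either)
```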
But I found that, if you wrap each condition in (... == True) and join the criteria with a pipe, the criteria are combined in an OR condition, satisfied whenever either of them is true:
The answer from @lodagro is great. I would extend it by generalizing the mask function as:
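A sketch of what the generalized version could look like: the helper takes any callable that maps the DataFrame to a boolean mask. The name mask_by is an assumption on my part, used so the built-in DataFrame.mask is left alone.

```python
import pandas as pd

def mask_by(df, f):
    """Return the rows of df for which the boolean mask f(df) is True."""
    return df[f(df)]

# Hypothetical method name; attaching it as `mask` would shadow the built-in method.
pd.DataFrame.mask_by = mask_by

# Tiny demonstration frame.
demo = pd.DataFrame({"x": [-1, 2]}).mask_by(lambda d: d["x"] < 0)
print(demo)
```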
Then you can do stuff like:
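For instance, with invented data and the hypothetical mask_by helper re-declared so the snippet runs on its own:

```python
import pandas as pd

# Hypothetical helper, repeated here so this snippet is self-contained.
pd.DataFrame.mask_by = lambda df, f: df[f(df)]

# Invented data.
df = pd.DataFrame({"x": [-2, -1, 3, 4], "y": [5, -6, 7, -8]})

# Each call receives the already-filtered frame, so the conditions compose.
result = df.mask_by(lambda d: d["x"] < 0).mask_by(lambda d: d["y"] > 0)
print(result)
```

Recent pandas versions also accept a callable directly in loc, so df.loc[lambda d: d["x"] < 0] should behave the same way without any patching.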
If you would like to apply all of the common boolean masks as well as a general purpose mask you can chuck the following in a file and then simply assign them all as follows:
Usage:
It's a little bit hacky but it can make things a little bit cleaner if you're continuously chopping and changing datasets according to filters. There's also a general purpose filter adapted from Daniel Velkov above in the gen_mask function which you can use with lambda functions or otherwise if desired.
File to be saved (I use masks.py):
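The original file contents are not shown, so this is only a sketch of what such a module might contain. Apart from gen_mask, every name here (the *_mask methods, MASKS, apply_masks) is an assumption, and the assignment and usage steps are included at the bottom so the block runs as a single script.

```python
"""masks.py: chainable filter helpers (a sketch; names other than gen_mask are assumed)."""
import operator

import pandas as pd

# Map each hypothetical method name to the comparison it applies.
MASKS = {
    "eq_mask": operator.eq,
    "ne_mask": operator.ne,
    "gt_mask": operator.gt,
    "ge_mask": operator.ge,
    "lt_mask": operator.lt,
    "le_mask": operator.le,
}

def _make_mask(op):
    """Build a method that keeps rows where op(df[key], value) is True."""
    def _mask(df, key, value):
        return df[op(df[key], value)]
    return _mask

def gen_mask(df, f):
    """General-purpose filter: keep rows where the callable f(df) is True."""
    return df[f(df)]

def apply_masks():
    """Attach all helpers to pd.DataFrame (monkey-patching, hence a bit hacky)."""
    for name, op in MASKS.items():
        setattr(pd.DataFrame, name, _make_mask(op))
    pd.DataFrame.gen_mask = gen_mask
    return pd.DataFrame

# Assignment and usage with invented data.
apply_masks()
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})
result = (
    df.gt_mask("A", 1)
      .le_mask("B", 30)
      .gen_mask(lambda d: d["A"] % 2 == 1)
)
print(result)
```

Building the comparison methods from the operator module keeps the file short; writing each one out by hand would work just as well.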