I have a DataFrame with 4 columns, of which 2 contain string values. I was wondering if there is a way to select rows based on a partial string match against a particular column?
In other words, a function or lambda function that would do something like re.search(pattern, cell_in_question), returning a boolean. I am familiar with the syntax of df[df['A'] == "hello world"], but can't seem to find a way to do the same with a partial string match, say 'hello'.
Would someone be able to point me in the right direction?
How would you filter out "liberty" together with more criteria, such as "legacy", "ulic", etc.?
Say you have the following DataFrame:
You can always use the in operator in a lambda expression to create your filter.
The trick here is to use the axis=1 option in apply to pass elements to the lambda function row by row, as opposed to column by column.
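A minimal sketch of the row-by-row approach described above. The sample data, the column names A and B, and the search term are assumptions made up for illustration; they are not taken from the original answer.

```python
import pandas as pd

# Hypothetical sample data: column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["hello world", "goodbye", "say hello", "nothing here"],
    "B": [1, 2, 3, 4],
})

# axis=1 passes each row (as a Series) to the lambda, so row["A"] is a
# single cell and the `in` operator does a plain substring test on it.
mask = df.apply(lambda row: "hello" in row["A"], axis=1)

# The resulting boolean Series selects the matching rows.
matches = df[mask]
print(matches)
```

Note that `in` raises a TypeError if a cell is not a string (for example NaN), so this sketch assumes column A holds only strings.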
Based on GitHub issue #620, it looks like you'll soon be able to do the following:
Update: vectorized string methods (i.e., Series.str) are available in pandas 0.8.1 and up.
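As a rough illustration of the vectorized string methods, here is a sketch using Series.str.contains on a recent pandas release. The data, the column names, and the list of terms are assumptions for the example; the second half shows one possible way to handle several terms at once by joining them into a single regular expression.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
df = pd.DataFrame({
    "A": ["hello world", "goodbye", "say hello", None],
    "B": [1, 2, 3, 4],
})

# Partial match on one column; na=False counts missing values as non-matches.
mask = df["A"].str.contains("hello", na=False)
hello_rows = df[mask]

# Several terms at once: join them with "|" (regex alternation).
# These terms contain no regex metacharacters; use re.escape otherwise.
terms = ["liberty", "legacy", "ulic"]
pattern = "|".join(terms)

names = pd.Series(["Liberty Mutual", "Legacy Health", "Acme", "ULIC Group"])

# case=False ignores case; ~ negates the mask to filter the matches OUT.
kept = names[~names.str.contains(pattern, case=False, regex=True)]
print(hello_rows)
print(kept)
```

Unlike the lambda version, this does not loop in Python over each row, and it tolerates missing values through the na argument.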
Here's what I ended up doing for partial string matches. If anyone has a more efficient way of doing this please let me know.
If anyone is wondering how to solve a related problem, "Select column by partial string", use:
And to select rows by partial string matching against the index labels (not the cell values), pass axis=0 to filter:
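A short sketch of both uses of DataFrame.filter with its like parameter. The column names and index labels are invented for the example. Keep in mind that filter looks only at labels, so it cannot match against the contents of a column.

```python
import pandas as pd

# Hypothetical frame with string column names and string index labels.
df = pd.DataFrame(
    {"hello_x": [1, 2, 3], "other": [4, 5, 6], "say_hello": [7, 8, 9]},
    index=["hello_row", "plain", "row_hello"],
)

# Default axis for a DataFrame is the columns: keep columns whose
# name contains the substring "hello".
cols = df.filter(like="hello")

# axis=0 applies the same substring test to the index labels instead.
rows = df.filter(like="hello", axis=0)

print(cols)
print(rows)
```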