pandas + dataframe - select by partial string

Posted 2019-01-01 07:42

I have a DataFrame with 4 columns of which 2 contain string values. I was wondering if there was a way to select rows based on a partial string match against a particular column?

In other words, a function or lambda function that would do something like

re.search(pattern, cell_in_question) 

returning a boolean. I am familiar with the syntax of df[df['A'] == "hello world"] but can't seem to find a way to do the same with a partial string match, say 'hello'.

Would someone be able to point me in the right direction?

Tags: python pandas
8 answers
无与为乐者.
#2 · 2019-01-01 08:16

How would you filter out "liberty" along with more criteria such as "legacy", "ulic", etc.?

   df_Fixed[~df_Fixed["Busler Group"].map(lambda x: x.startswith('Liberty'))]
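
One possible way to extend this to several prefixes, sketched under the assumption that the column holds strings (possibly with missing values): combine the prefixes into one anchored regular expression and negate the result of str.contains. The sample rows and the prefixes list below are made up for illustration; only the column name comes from the snippet above.

```python
import re
import pandas as pd

# Hypothetical sample data; only the column name comes from the post above
df_Fixed = pd.DataFrame({
    "Busler Group": ["Liberty Mutual", "Legacy Health", "ULIC Life", "Acme Corp", None],
})

# Prefixes to exclude (illustrative values)
prefixes = ["Liberty", "Legacy", "ULIC"]

# Anchor at the start of the string and escape each prefix so it is matched literally
pattern = "^(?:" + "|".join(re.escape(p) for p in prefixes) + ")"

# na=False treats missing values as non-matches, so the negated mask keeps them
mask = df_Fixed["Busler Group"].str.contains(pattern, case=False, na=False)
kept = df_Fixed[~mask]
```

Here case=False is a choice, not a requirement; drop it if the match should be case-sensitive like the original startswith call.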
荒废的爱情
#3 · 2019-01-01 08:17

Say you have the following DataFrame:

>>> df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a','b'])
>>> df
       a            b
0  hello  hello world
1   abcd         defg

You can always use the in operator in a lambda expression to create your filter.

>>> df.apply(lambda x: x['a'] in x['b'], axis=1)
0     True
1    False
dtype: bool

The trick here is to use the axis=1 option in the apply to pass elements to the lambda function row by row, as opposed to column by column.
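
As a small follow-up sketch, the boolean Series produced by apply can be used directly to select the matching rows. The variable names mask and matches are illustrative.

```python
import pandas as pd

df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a', 'b'])

# Row-wise test: is the value in column 'a' a substring of the value in column 'b'?
mask = df.apply(lambda row: row['a'] in row['b'], axis=1)

# Indexing with the boolean Series keeps only the rows where the mask is True
matches = df[mask]
```

Because apply with axis=1 calls the lambda once per row in Python, this is likely to be slower than the vectorized string methods when the substring is a fixed value rather than another column.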

何处买醉
#4 · 2019-01-01 08:20
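import pandas as pd

k = pd.DataFrame(['hello', 'doubt', 'hero', 'help'])
k.columns = ['some_thing']
# Keep the rows whose value contains the substring "hel"
t = k[k['some_thing'].str.contains("hel")]
# Replace every matched value with 'CS'
d = k.replace(t['some_thing'].tolist(), 'CS')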

:::OUTPUT:::

k
Out[95]: 
  some_thing
0      hello
1      doubt
2       hero
3       help

t
Out[99]: 
  some_thing
0      hello
3       help

d
Out[96]: 
  some_thing
0         CS
1      doubt
2       hero
3         CS
无色无味的生活
#5 · 2019-01-01 08:28

Based on GitHub issue #620, it looks like you'll soon be able to do the following:

df[df['A'].str.contains("hello")]

Update: vectorized string methods (i.e., Series.str) are available in pandas 0.8.1 and up.
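
A short sketch of the options that tend to matter in practice with str.contains: case sensitivity, missing values, and literal versus regex matching. The sample data is made up; case, na and regex are parameters of Series.str.contains.

```python
import pandas as pd

# Hypothetical sample column with mixed case, a missing value and a literal dot
df = pd.DataFrame({'A': ['hello world', 'Hello there', 'goodbye', None, 'a.b']})

# Default: case-sensitive regex search; na=False makes missing values count as no match
exact_case = df[df['A'].str.contains('hello', na=False)]

# case=False ignores letter case
any_case = df[df['A'].str.contains('hello', case=False, na=False)]

# regex=False treats the pattern as a literal string, so '.' matches only a dot
literal_dot = df[df['A'].str.contains('.', regex=False, na=False)]
```

Passing na=False is worth considering whenever the column may contain missing values, since a mask that itself contains missing values generally cannot be used for boolean indexing.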

刘海飞了
#6 · 2019-01-01 08:28

Here's what I ended up doing for partial string matches. If anyone has a more efficient way of doing this please let me know.

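import re
import pandas as pd

def stringSearchColumn_DataFrame(df, colName, regex):
    newdf = pd.DataFrame()
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement
    for idx, record in df[colName].items():
        if re.search(regex, record):
            # Prepend all rows whose value equals the matching record
            newdf = pd.concat([df[df[colName] == record], newdf], ignore_index=True)

    return newdf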
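
A possibly more efficient alternative, sketched with the vectorized string methods instead of a Python loop. The helper name string_search_column and the sample data are hypothetical. Unlike the loop above, this version keeps the original row order and index, and does not duplicate rows when the same value appears more than once.

```python
import pandas as pd

def string_search_column(df, col_name, regex):
    # Hypothetical helper: build a boolean mask with the vectorized string method
    mask = df[col_name].str.contains(regex, regex=True, na=False)
    return df[mask]

# Made-up example data
df = pd.DataFrame({'A': ['hello world', 'say hello', 'goodbye'], 'B': [1, 2, 3]})
result = string_search_column(df, 'A', r'^hello')
```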
呛了眼睛熬了心
#7 · 2019-01-01 08:29

If anyone wonders how to solve a related problem, "Select column by partial string":

Use:

df.filter(like='hello')  # select columns which contain the word hello

And to select rows whose index label partially matches a string, pass axis=0 to filter:

# selects rows which contain the word hello in their index label
df.filter(like='hello', axis=0)  
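
A minimal sketch of both calls on a made-up frame. Note that DataFrame.filter looks only at labels (column names or index labels), not at cell values; the labels below are illustrative.

```python
import pandas as pd

# Hypothetical frame whose column names and index labels are strings
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=['hello_row', 'other_row'],
    columns=['hello_a', 'b', 'say_hello'],
)

cols = df.filter(like='hello')           # columns whose name contains 'hello'
rows = df.filter(like='hello', axis=0)   # rows whose index label contains 'hello'
```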