pandas + dataframe - select by partial string

I have a DataFrame with 4 columns of which 2 contain string values. I was wondering if there was a way to select rows based on a partial string match against a particular column?

In other words, a function or lambda function that would do something like

re.search(pattern, cell_in_question)

returning a boolean. I am familiar with the syntax of df[df['A'] == "hello world"] but can't seem to find a way to do the same with a partial string match say 'hello'.

Would someone be able to point me in the right direction?

Tags: python pandas

8 answers

无与为乐者.

#2 · 2019-01-01 08:16

How would you filter out "liberty", but with additional criteria such as "legacy", "ulic", etc.?

   df_Fixed[~df_Fixed["Busler Group"].map(lambda x: x.startswith('Liberty'))]
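
One possible way to extend this to several prefixes is to build a single anchored regular expression and pass it to Series.str.contains. This is only a sketch: the sample rows and the prefixes list below are made up for illustration, and the column name is copied from the snippet above.

```python
import re
import pandas as pd

# Hypothetical sample data; only the column name comes from the post above.
df_fixed = pd.DataFrame(
    {"Busler Group": ["Liberty Mutual", "Legacy Health", "ULIC Holdings", "Acme Corp"]}
)

# Prefixes to exclude (assumed values taken from the question text).
prefixes = ["liberty", "legacy", "ulic"]

# "^" anchors the match at the start of the string; re.escape guards against
# prefixes that contain regex metacharacters.
pattern = "^(?:" + "|".join(re.escape(p) for p in prefixes) + ")"

# case=False ignores case; na=False treats missing values as non-matches.
mask = df_fixed["Busler Group"].str.contains(pattern, case=False, na=False)

# Keep only the rows that do NOT start with any of the prefixes.
kept = df_fixed[~mask]
```

Joining the alternatives into one pattern keeps the filter to a single pass over the column instead of one pass per prefix.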

荒废的爱情

#3 · 2019-01-01 08:17

Say you have the following DataFrame:

>>> df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a','b'])
>>> df
       a            b
0  hello  hello world
1   abcd         defg

You can always use the in operator in a lambda expression to create your filter.

>>> df.apply(lambda x: x['a'] in x['b'], axis=1)
0     True
1    False
dtype: bool

The trick here is to use the axis=1 option in the apply to pass elements to the lambda function row by row, as opposed to column by column.
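
For larger frames, a row-wise apply can be slow. A minimal sketch of the same check with a plain list comprehension over the two columns, assuming both columns contain only strings:

```python
import pandas as pd

df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a', 'b'])

# Pair up the two columns and test substring membership element by element,
# then wrap the result in a Series aligned with the original index.
mask = pd.Series([a in b for a, b in zip(df['a'], df['b'])], index=df.index)

# Use the boolean Series to select the matching rows.
matches = df[mask]
```

This yields the same boolean result as the apply call above without building a Series object for every row.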

何处买醉

#4 · 2019-01-01 08:20

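import pandas as pd

k = pd.DataFrame(['hello', 'doubt', 'hero', 'help'])
k.columns = ['some_thing']
# keep only the rows whose value contains the substring "hel"
t = k[k['some_thing'].str.contains("hel")]
# replace every matched value with 'CS'
d = k.replace(t['some_thing'].tolist(), 'CS')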

:::OUTPUT:::

k
Out[95]: 
  some_thing
0      hello
1      doubt
2       hero
3       help

t
Out[99]: 
  some_thing
0      hello
3       help

d
Out[96]: 
  some_thing
0         CS
1      doubt
2       hero
3         CS

无色无味的生活

#5 · 2019-01-01 08:28

Based on github issue #620, it looks like you'll soon be able to do the following:

df[df['A'].str.contains("hello")]

Update: vectorized string methods (i.e., Series.str) are available in pandas 0.8.1 and up.
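
A short sketch of how str.contains behaves with its most commonly needed options (case, na and regex). The sample values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample column, including a missing value and a literal dot.
df = pd.DataFrame({'A': ['hello world', 'Hello there', 'goodbye', None, 'a.b']})

# na=False treats missing values as non-matches, so the mask stays boolean.
lower_only = df[df['A'].str.contains('hello', na=False)]

# case=False makes the match case-insensitive.
any_case = df[df['A'].str.contains('hello', case=False, na=False)]

# regex=False treats the pattern as a literal string, so '.' matches only a dot.
literal_dot = df[df['A'].str.contains('.', regex=False, na=False)]
```

Without na=False, a missing value in the column produces a missing entry in the mask, and boolean indexing with that mask can fail.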

刘海飞了

#6 · 2019-01-01 08:28

Here's what I ended up doing for partial string matches. If anyone has a more efficient way of doing this please let me know.

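import re
import pandas as pd

def stringSearchColumn_DataFrame(df, colName, regex):
    newdf = pd.DataFrame()
    # iterate over distinct non-missing values so repeated values are not added twice
    for record in df[colName].dropna().unique():
        if re.search(regex, record):
            # prepend every row whose cell equals the matching value
            newdf = pd.concat([df[df[colName] == record], newdf], ignore_index=True)
    return newdf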
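
As for a more efficient way: one option is to let pandas evaluate the regular expression over the whole column at once. The sketch below assumes a pandas version that has Series.str; the function name search_column and the sample data are hypothetical, chosen only for illustration.

```python
import pandas as pd

def search_column(df, col_name, regex):
    """Return the rows of df whose col_name value matches regex anywhere."""
    # str.contains applies the regex to each cell with search semantics;
    # na=False treats missing values as non-matches.
    mask = df[col_name].str.contains(regex, regex=True, na=False)
    return df[mask]

# Hypothetical sample data with a repeated value.
df = pd.DataFrame({'A': ['hello world', 'help me', 'goodbye', 'hello world'],
                   'B': [1, 2, 3, 4]})

result = search_column(df, 'A', r'^hel+o')
```

Unlike the loop version, this keeps the original row order and index labels and does not concatenate frames repeatedly.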

呛了眼睛熬了心

#7 · 2019-01-01 08:29

In case anyone is wondering how to solve a related problem, "Select column by partial string":

Use:

df.filter(like='hello')  # select columns which contain the word hello

To select rows whose index labels partially match a string, pass axis=0 to filter:

# selects rows which contain the word hello in their index label
df.filter(like='hello', axis=0)
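
A small sketch of both calls on a made-up frame, plus the regex parameter of DataFrame.filter for anchored matches. The column names and index labels here are hypothetical:

```python
import pandas as pd

# Hypothetical frame: some column names and index labels contain "hello".
df = pd.DataFrame(
    {'hello_count': [1, 2], 'other': [3, 4], 'say_hello': [5, 6]},
    index=['hello_a', 'bye_b'],
)

# Columns whose name contains "hello".
cols = df.filter(like='hello')

# Rows whose index label contains "hello".
rows = df.filter(like='hello', axis=0)

# Columns whose name starts with "hello", using a regular expression.
regex_cols = df.filter(regex=r'^hello')
```

Note that filter only looks at labels (column names or index labels), never at cell values; to match on cell contents, use str.contains on the column instead.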
