modification of skipping empty list and continuing

Background

The following code is slightly modified from my earlier question, "skipping empty list and continuing with function".

import pandas as pd

Names = [['Jon', 'Smith', 'jon', 'John'],
         [],
         ['Bob', 'bobby', 'Bobs'],
         [],
         []]
df = pd.DataFrame({'Text': ['Jon J Smith is Here and jon John from ',
                            'get nothing from here',
                            'I like Bob and bobby and also Bobs diner ',
                            'nothing here too',
                            'same here'],
                   'P_ID': [1, 2, 3, 4, 5],
                   'P_Name': Names})

# rearrange columns
df = df[['Text', 'P_ID', 'P_Name']]
df

                                 Text         P_ID  P_Name
0   Jon J Smith is Here and jon John from       1   [Jon, Smith, jon, John]
1   get nothing from here                       2   []
2   I like Bob and bobby and also Bobs diner    3   [Bob, bobby, Bobs]
3   nothing here too                            4   []
4   same here                                   5   []

Working code

The following bit of code, taken from "skipping empty list and continuing with function", works:

m = df['P_Name'].str.len().ne(0)
df.loc[m, 'New'] = df.loc[m, 'Text'].replace(df.loc[m].P_Name,'**BLOCK**',regex=True)

It produces the following New column in df:

            Text   P_ID  P_Name   New
0                                 **BLOCK** J **BLOCK** is Here and **BLOCK** **BLOCK** ...
1                                 NaN
2                                 I like **BLOCK** and **BLOCK** and also **BLOCK** d..
3                                 NaN 
4                                 NaN

Desired Output

However, instead of NaN in rows 1, 3, and 4, I would like to keep the original text (e.g. "get nothing from here"), as seen below:

            Text   P_ID  P_Name   New
0                                 **BLOCK** J **BLOCK** is Here and **BLOCK** **BLOCK** ...
1                                 get nothing from here
2                                 I like **BLOCK** and **BLOCK** and also **BLOCK** d..
3                                 nothing here too 
4                                 same here

Question

How do I tweak the code below to achieve my desired output?

m = df['P_Name'].str.len().ne(0)
df.loc[m, 'New'] = df.loc[m, 'Text'].replace(df.loc[m].P_Name,'**BLOCK**',regex=True)

Tags: python-3.x string pandas text empty-list

2 Answers

叼着烟拽天下

#2 · 2020-05-01 04:56

Just add this fillna line at the end:

df['New'] = df['New'].fillna(df['Text'])
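An equivalent approach, sketched here under some assumptions, is to seed New with a copy of Text before the masked assignment, so the rows with an empty list never become NaN in the first place. The two-row frame and the hard-coded "Jon" replacement are illustrative stand-ins, not the question's actual data or regex call.

```python
import pandas as pd

# Toy stand-in for the question's frame (illustrative data only)
df = pd.DataFrame({
    "Text": ["Jon is here", "nothing here"],
    "P_Name": [["Jon"], []],
})

# True where the name list is non-empty
m = df["P_Name"].str.len().ne(0)

# Seed the new column with the original text first ...
df["New"] = df["Text"]

# ... then overwrite only the masked rows (a plain literal replace stands in
# for the question's regex replace)
df.loc[m, "New"] = df.loc[m, "Text"].str.replace("Jon", "**BLOCK**", regex=False)

print(df["New"].tolist())
```

With this ordering no fillna step is needed afterwards.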


疯言疯语

#3 · 2020-05-01 05:13

@tawab_shakeel is close. Just add:

df['New'] = df['New'].fillna(df['Text'])

fillna replaces each NaN in New with the value of df['Text'] from the same row, because the fill Series is aligned on the index.
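To see the index alignment in isolation, here is a minimal sketch on a made-up three-row frame (the data is illustrative, not the question's). Only the NaN entries of New are filled, and existing values are left alone.

```python
import numpy as np
import pandas as pd

# Illustrative frame: New already holds a replaced value in row 0 and NaN elsewhere
df = pd.DataFrame({
    "Text": ["Jon is here", "get nothing from here", "same here"],
    "New": ["**BLOCK** is here", np.nan, np.nan],
})

# Passing a Series to fillna fills each NaN with the value at the same index label
df["New"] = df["New"].fillna(df["Text"])

print(df["New"].tolist())
# ['**BLOCK** is here', 'get nothing from here', 'same here']
```

Assigning the result back to the column, rather than calling fillna with inplace=True on df['New'], avoids relying on chained in-place modification of a column selection.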

I can also propose an alternative solution using the re module for regex.

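import re

def replacing(x):
    # rows with an empty P_Name list keep their original text
    if len(x['P_Name']) > 0:
        # join the names into one alternation pattern, e.g. 'Jon|Smith|jon|John'
        return re.sub('|'.join(x['P_Name']), '**BLOCK**', x['Text'])
    else:
        return x['Text']

df['New'] = df.apply(replacing, axis=1)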

The apply method applies the replacing function to each row, and substitution is done by the re.sub function.
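One caveat with joining the names as-is: regex alternation tries the alternatives from left to right, so 'Bob|bobby|Bobs' matches "Bob" inside "Bobs" and leaves "**BLOCK**s" behind. The sketch below is a possible hardening, assuming the names are meant as literal strings rather than regex patterns. block_names is a hypothetical helper name, and the frame is a shortened copy of the question's data.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Text": ["Jon J Smith is Here and jon John from ",
             "get nothing from here",
             "I like Bob and bobby and also Bobs diner "],
    "P_Name": [["Jon", "Smith", "jon", "John"], [], ["Bob", "bobby", "Bobs"]],
})

def block_names(text, names):
    """Replace every listed name in text with **BLOCK** (hypothetical helper)."""
    if not names:
        # nothing to block: keep the original text
        return text
    # Longest names first so "Bobs" is tried before "Bob";
    # re.escape treats each name as a literal string.
    ordered = sorted(names, key=len, reverse=True)
    pattern = "|".join(re.escape(name) for name in ordered)
    return re.sub(pattern, "**BLOCK**", text)

# Row-wise application without DataFrame.apply
df["New"] = [block_names(t, n) for t, n in zip(df["Text"], df["P_Name"])]

print(df["New"].tolist())
```

Whether partial matches inside longer words should also be blocked depends on the use case. Wrapping the pattern in word boundaries (\b) would be one way to restrict it to whole words.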
