I am working with dataframe that contains column named "raw_parameter_name". In this column i have different string values. Several values are like following pattern "ABCD;MEAN". What i am trying to do is to replace each value "ABCD;MEAN" with "ABCD;X-BAR". Sub string "ABCD" may vary but pattern ";MEAN" is constant i want to replace. Looked into different options using "replace" method but don't know how to replace sub string only and not whole string. Please advise. Thank you in advance
可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
回答1:
You can use regex module re
for example:
import pandas as pd
import re
df = pd.DataFrame({"row_parameter_name":['abcd;MEAN','Dogg11;MEAN',';MEAN']})
Out[126]:
row_parameter_name
0 abcd;MEAN
1 Dogg11;MEAN
2 ;MEAN
df['row_parameter_name'] = df['row_parameter_name'].apply(lambda x: re.sub(';MEAN$',';X-BAR',x))
In [128]: df
Out[128]:
row_parameter_name
0 abcd;X-BAR
1 Dogg11;X-BAR
2 ;X-BAR
回答2:
use str.contains
to create a boolean index to mask the series and then str.replace
to replace your substring:
In [172]:
df = pd.DataFrame({'raw_parameter_name':['ABCD;MEAN', 'EFGH;MEAN', '1234;MEAN', 'sdasd;MEAT']})
df
Out[172]:
raw_parameter_name
0 ABCD;MEAN
1 EFGH;MEAN
2 1234;MEAN
3 sdasd;MEAT
In [173]:
df.loc[df['raw_parameter_name'].str.contains(';MEAN$'), 'raw_parameter_name'] = df['raw_parameter_name'].str.replace('MEAN', 'X-BAR')
df
Out[173]:
raw_parameter_name
0 ABCD;X-BAR
1 EFGH;X-BAR
2 1234;X-BAR
3 sdasd;MEAT
Here it matches where the substrin ';MEAN'
exists the $
is a terminating symbol
The boolean mask looks like the following:
In [176]:
df['raw_parameter_name'].str.contains(';MEAN$')
Out[176]:
0 True
1 True
2 True
3 False
Name: raw_parameter_name, dtype: bool
Timings
For a 40,0000 row df using str.replace
is faster than using apply
:
In [183]:
import re
%timeit df['raw_parameter_name'].apply(lambda x: re.sub(';MEAN$',';X-BAR',x))
%timeit df['raw_parameter_name'].str.replace('MEAN', 'X-BAR')
1 loops, best of 3: 1.01 s per loop
1 loops, best of 3: 687 ms per loop
回答3:
You do not have to use re
like in the example that was marked correct above. It may have been necessary at one point in time, but this is not the best answer to this anymore.
Nor do you need to use str.contains()
first.
Instead just use .str.replace()
with the appropriate match and replacement.
In [2]: df = pd.DataFrame({"row_parameter_name":['abcd;MEAN','Nothing;NICE','Dogg11;MEAN',';MEAN','MEANY-MEANY;MEAN']})
In [3]: df
Out[3]: row_parameter_name
0 abcd;MEAN
1 Nothing;NICE
2 Dogg11;MEAN
3 ;MEAN
4 MEANY-MEANY;MEAN
In [4]: df.row_parameter_name.str.replace("MEAN$","X-BAR")
Out[4]: 0 abcd;X-BAR
1 Nothing;NICE
2 Dogg11;X-BAR
3 ;X-BAR
4 MEANY-MEANY;X-BAR
Name: row_parameter_name, dtype: object