I am running the following function but am struggling to get it to take the length condition (the if part) into account. It seems to run only the first part of the expression:
stringDataFrame.apply(lambda x: x.str.replace(r'[^0-9]', '') if (len(x) >= 7) else x)
It somehow only runs the x.str.replace(r'[^0-9]', '') part, regardless of the length. What am I doing wrong here? I have been stuck on this.
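You can use applymap when you need to work on each value separately, because apply works on a whole column (a Series) at a time, so len(x) there is the number of rows in the column, not the length of each string. Then, instead of str.replace, use re.sub, which operates on a single string and so fits the per-value function: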
print (stringDataFrame.applymap(lambda x: re.sub(r'[^0-9]', '', x) if (len(x) >= 7) else x))
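To see why the original condition never looks at the string lengths, here is a minimal sketch. The frame df is a hypothetical stand-in for stringDataFrame; it only illustrates what apply hands to the lambda.

```python
import pandas as pd

# Hypothetical 3-row frame, used only to illustrate the point.
df = pd.DataFrame({'A': ['gdgdg454dgd', '147ooo2', '123ss45678'],
                   'B': ['gdgdg454dgd', 'x142', '12345678a']})

# apply passes each column as a Series, so len(x) is the row count of the column.
lengths_seen_by_apply = df.apply(lambda x: len(x))
print(lengths_seen_by_apply)

# The length of each individual string needs the .str accessor instead.
per_value_lengths = df.apply(lambda x: x.str.len())
print(per_value_lengths)
```

With 7 or more rows, len(x) >= 7 is true for every column, which would explain why the replace branch always runs.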
Sample:
import pandas as pd
import re
stringDataFrame = pd.DataFrame({'A':['gdgdg454dgd','147ooo2', '123ss45678'],
                                'B':['gdgdg454dgd','x142', '12345678a'],
                                'C':['gdgdg454dgd','xx142', '12567dd8']})
print (stringDataFrame)
             A            B            C
0  gdgdg454dgd  gdgdg454dgd  gdgdg454dgd
1      147ooo2         x142        xx142
2   123ss45678    12345678a     12567dd8
print (stringDataFrame.applymap(lambda x: re.sub(r'[^0-9]', '', x) if (len(x) >= 7) else x))
          A         B       C
0       454       454     454
1      1472      x142   xx142
2  12345678  12345678  125678
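As an alternative to the per-value approach, the same result can probably be reached column-wise with str.len and where. This is only a sketch: the helper name strip_non_digits_if_long and its min_len parameter are made up for illustration, and regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

df = pd.DataFrame({'A': ['gdgdg454dgd', '147ooo2', '123ss45678'],
                   'B': ['gdgdg454dgd', 'x142', '12345678a'],
                   'C': ['gdgdg454dgd', 'xx142', '12567dd8']})

def strip_non_digits_if_long(col, min_len=7):
    # Hypothetical helper: col is one column (a Series of strings).
    # Remove every non-digit character from all values first ...
    cleaned = col.str.replace(r'[^0-9]', '', regex=True)
    # ... then keep the cleaned value only where the original string
    # has at least min_len characters, otherwise keep the original.
    return cleaned.where(col.str.len() >= min_len, col)

result = df.apply(strip_non_digits_if_long)
print(result)
```

Note that in pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, which takes the same per-value function.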