Problem: Polluted Dataframe.
Details: Frame consists of NaNs string values which i know the meaning of and numeric values.
Task: Replaceing the numeric values with NaNs
Example
import numpy as np
import pandas as pd
df = pd.DataFrame([['abc', 'cdf', 1], ['k', 'sum', 'some'], [1000, np.nan, 'nothing']])
out:
0 1 2
0 abc cdf 1
1 k sum some
2 1000 NaN nothing
Attempt 1 (Does not work, because regex only looks at string cells)
df.replace({'\d+': np.nan}, regex=True)
out:
0 1 2
0 abc cdf 1
1 k sum some
2 1000 NaN nothing
Preliminary Solution
val_set = set()
[val_set.update(i) for i in df.values]
def dis_nums(myset):
str_s = set()
num_replace_dict = {}
for i in range(len(myset)):
val = myset.pop()
if type(val) == str:
str_s.update([val])
else:
num_replace_dict.update({val:np.nan})
return str_s, num_replace_dict
strs, rpl_dict = dis_nums(val_set)
df.replace(rpl_dict, inplace=True)
out:
0 1 2
0 abc cdf NaN
1 k sum some
2 NaN NaN nothing
Question Is there any easier/ more pleasant solution?