How to use regex to update dataframe using python

2019-07-07 08:11发布

问题:

I have 2 issues, First one, that I want to update some values in xlsm file depends on python object, as this object is just a extracted values from specific column in csv file, as to update values in the xlsm file dataframe through this exacted object index then I create a conditions contains loc str.contains

So now let's go in deep

Here's in this code I am just reading the csv and xlsm file

#This Gov_Tracker master sheet dataframe
gov_tracker_sheet =pd.read_excel(r'' + mydir + 'Governance_Tracker - Copy - 
Copy.xlsm', encoding='latin-1', sheet_name='Gov_Tracker',
                             header = 1)

myOutlook_inBox = pd.read_csv(r'' + mydir + 'test.CSV', usecols=['Subject', 
'Body', 'From: (Name)', 'To: (Name)'],
                          encoding='latin-1')

This object extract 5 chars and 5 numbers from specific column in csv

replaced_sbj_value = myOutlook_inBox['Subject']
.str.extract(pat='(L(?:DEL|CAI|SIN).\d{5})').dropna()

This column I want to filter in the xlsm

myOutlook_inBox["Subject"] = replaced_sbj_value

This is one of my simple condition I am working with

frm_mwfy_to_te = myOutlook_inBox.loc[myOutlook_inBox['From: 
(Name)'].str.contains("mowafy", na=False) \
                                 & myOutlook_inBox['To: 
(Name)'].str.contains("te", na=False)] \
                 .drop_duplicates(keep=False)

This object is just filter in gov_tracker_sheet Where frm_mwfy_to_te.Subject using join and str.contains

filtered_data_cond1 = gov_tracker_sheet.loc[gov_tracker_sheet['Site 
Name'].str.contains('|'.join(frm_mwfy_to_te.Subject))]

After that I created a tuple object and put in it the filtered_data_cond1['SiteCode']

values = tuple(filtered_data_cond1['SiteCode'].values.tolist())

Finally I used regex function as to replace the values I need according to the condition I created

updated_gov_tracker = gov_tracker_sheet.replace(to_replace = 
gov_tracker_sheet['Pending  '].values, value = 'TETE', regex=True)\
.where((values))

After I run project I find this error in the RUN

Traceback (most recent call last):
File "C:/Users/DELL/PycharmProjects/GovTrackerExcelTool
/GovTrackerUpdatingScript.py", 
line 55, in <module>
updated_gov_tracker = gov_tracker_sheet.replace(to_replace = 
gov_tracker_sheet['Pending  '].values, value = 'TETE', 
regex=True).where((values))
File "C:\Users\DELL\PycharmProjects\GovTrackerExcelTool\venv\lib\site- 
packages\pandas\core\frame.py", line 4042, in replace
method=method)
File "C:\Users\DELL\PycharmProjects\GovTrackerExcelTool\venv\lib\site- 
packages\pandas\core\generic.py", line 6552, in replace
regex=regex)
File "C:\Users\DELL\PycharmProjects\GovTrackerExcelTool\venv\lib\site- 
packages\pandas\core\internals\managers.py", line 537, in replace
return self.apply('replace', **kwargs)
File "C:\Users\DELL\PycharmProjects\GovTrackerExcelTool\venv\lib\site- 
packages\pandas\core\internals\managers.py", line 395, in apply
applied = getattr(b, f)(**kwargs)
File "C:\Users\DELL\PycharmProjects\GovTrackerExcelTool\venv\lib\site- 
packages\pandas\core\internals\blocks.py", line 2781, in replace
convert=convert)
File "C:\Users\DELL\PycharmProjects\GovTrackerExcelTool\venv\lib\site- 
packages\pandas\core\internals\blocks.py", line 2880, in _replace_single
new_values[filt] = f(new_values[filt])
File "C:\Users\DELL\PycharmProjects\GovTrackerExcelTool\venv\lib\site- 
packages\numpy\lib\function_base.py", line 2091, in __call__
return self._vectorize_call(func=func, args=vargs)
File "C:\Users\DELL\PycharmProjects\GovTrackerExcelTool\venv\lib\site- 
packages\numpy\lib\function_base.py", line 2167, in _vectorize_call
outputs = ufunc(*inputs)
File "C:\Users\DELL\PycharmProjects\GovTrackerExcelTool\venv\lib\site- 
packages\pandas\core\internals\blocks.py", line 2868, in re_replacer
return rx.sub(value, s)
MemoryError

This is my first issue

So my second issue was, after I update this dataframe gov_tracker_sheet I want to replace the old sheet in the xlsm with the new updated dataframe

I hope this could be clear enough