What I am trying to do: There is a large excel sheet with a lot haphazard customer information. I want to sort the email address and other data in a set format in a new excel file.
I can't quite figure out how to match the cell text(which will have some format like Address Email squished togethe and similar) with the regex and to keep only the regex data in a list.
Would really appreciate some help. Thanks
import sys, os, openpyxl
def sort_email_from_xl():
sheet = sheet_select() #Opens the worksheet
emailRegex = re.compile(r'''([a-zA-Z0-9._%+-]+@+[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4}))''',re.VERBOSE)
customeremails = []
for row in range(0, max_row):
if cell.text == emailRegex:
mail = cell.text
customeremails.append(mail)
return customeremails
print(customeremails)
This code should work (I could only test the regex part though):
There were many problems with your code. First about the regex:
.*?
at start and endre.VERBOSE
part as you'd only need it if you want to add inline comments to your regex, see doc@
in betweenNow, the email regex works for basic usage, but I'd definitively recommend to take a proven email regex from other answers on Stackoverflow.
Then: with
emailRegex.match(cell.text)
you can check if thecell.text
matches your regex and withemailRegex.match(cell.text).groups()[0]
you extract only the matching part. You had onereturn
statement too much as well.You need to correct the looping through the rows e.g. like documented here