I have the plain text of a Cc header field that looks like so:
friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>
Are there any battle-tested modules for parsing this properly?
(Bonus if it's in Python! The email module just returns the raw text without any methods for splitting it, AFAIK.) (Also a bonus if it splits name and address into two fields.)
I haven't used it myself, but it looks to me like you could use the csv package quite easily to parse the data.
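A minimal sketch of what the csv idea might look like, assuming the header is on a single line and names containing commas are always double-quoted. The variable names are only illustrative. Note that the csv reader consumes the double quotes, so the quoted name comes back unquoted:

```python
import csv

# The Cc header from the question, as one plain string.
cc = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'

# csv.reader accepts any iterable of lines; skipinitialspace drops the
# blank that follows each comma so fields do not start with a space.
fields = next(csv.reader([cc], skipinitialspace=True))

for field in fields:
    print(field)
```

This only splits the header into entries. It does not separate name from address, and because the quotes are dropped, the last entry still contains a bare comma.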
There are a bunch of functions available in the standard Python library, but I think you're looking for email.utils.parseaddr() or email.utils.getaddresses().
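For example, a short sketch using email.utils.getaddresses() on the header from the question. It takes a list of header strings and returns (name, address) tuples; the variable names here are just for illustration:

```python
from email.utils import getaddresses

# The Cc header from the question, as one plain string.
cc = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'

# getaddresses() expects a list of header values, so wrap the string in a list.
pairs = getaddresses([cc])

for name, address in pairs:
    # name is an empty string when the entry has no display name.
    print(repr(name), address)
```

The comma inside the quoted name "Smith, Jane" is kept as part of the name rather than treated as a separator.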
Convert a string containing multiple e-mail addresses (with names) into a dictionary.
Split the string by comma:
email_list = emailstring.split(',')
Then build the dictionary, using the name as the key and the e-mail address as the value.
The result looks like this:
Note:
If the same name appears with different e-mail addresses, only one of those records is kept.
"Friends" appears twice.
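A rough sketch of the steps described in this answer, not the answer's original code. The sample string and the use of email.utils.parseaddr() to separate name from address are assumptions made for illustration:

```python
from email.utils import parseaddr

# Hypothetical input: "Friends" appears twice with different addresses.
emailstring = ('Friends <a@example.com>, '
               'John Smith <john.smith@email.com>, '
               'Friends <b@example.com>')

result = {}
for part in emailstring.split(','):
    # parseaddr() returns a (name, address) tuple for a single entry.
    name, address = parseaddr(part.strip())
    # A repeated name overwrites the earlier entry, so only one survives.
    result[name] = address

print(result)
```

Because the split is on every comma, this approach breaks for a quoted name such as "Smith, Jane", and a dictionary keyed by name cannot hold two addresses for the same name.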
The below is completely unnecessary. I wrote it before realising that you could pass getaddresses() a list containing a single string containing multiple addresses.
I haven't had a chance to look at the specifications for addresses in email headers, but based on the string you provided, this code should do the job of splitting it into a list, making sure to ignore commas if they are within quotes (and therefore part of a name).
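One way such a quote-aware split could be written, as a sketch only. The function name split_addresses is made up here, and it assumes plain double quotes with no backslash escapes or parenthesised comments:

```python
def split_addresses(header):
    """Split a header on commas that are not inside double quotes."""
    parts = []
    current = []
    in_quotes = False
    for ch in header:
        if ch == '"':
            # Toggle quoted state; keep the quote so the name stays quoted.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ',' and not in_quotes:
            # An unquoted comma ends the current entry.
            entry = ''.join(current).strip()
            if entry:
                parts.append(entry)
            current = []
        else:
            current.append(ch)
    # Flush whatever follows the last comma.
    entry = ''.join(current).strip()
    if entry:
        parts.append(entry)
    return parts


cc = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'
entries = split_addresses(cc)
```

Each resulting entry can then be handed to email.utils.parseaddr() to separate the name from the address.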
Gives:
I'd be interested to see how other people would go about this problem!