Method for parsing text Cc field of email header?

Posted 2019-04-08 15:27

I have the plain text of a Cc header field that looks like so:

friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>

Are there any battle tested modules for parsing this properly?

(Bonus if it's in Python! As far as I know, the email module just returns the raw text without any methods for splitting it. Also a bonus if it splits the name and address into two fields.)

4 answers
可以哭但决不认输i
#2 · 2019-04-08 15:43

I haven't used it myself, but it looks to me like you could use the csv package quite easily to parse the data.

爱情/是我丢掉的垃圾
#3 · 2019-04-08 15:49

Several functions for this are available in the standard Python library, but I think you're looking for email.utils.parseaddr() or email.utils.getaddresses():

>>> import email.utils
>>> addresses = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'
>>> email.utils.getaddresses([addresses])
[('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'), ('Smith, Jane', 'jane.smith@uconn.edu')]
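
If you start from a parsed message rather than a bare string, the same call can be fed the header values directly. A minimal sketch, assuming a made-up message in `raw` (the `To` address and body are placeholders, not from the question):

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical raw message; only the Cc line comes from the question.
raw = (
    'To: someone@example.com\n'
    'Cc: friend@email.com, John Smith <john.smith@email.com>,'
    '"Smith, Jane" <jane.smith@uconn.edu>\n'
    '\n'
    'body\n'
)

msg = message_from_string(raw)

# get_all() returns every Cc header as a list of strings (or the
# default [] if there is none), which is the shape getaddresses() expects.
cc_pairs = getaddresses(msg.get_all('Cc', []))

for name, addr in cc_pairs:
    print(repr(name), addr)
```

Each entry in `cc_pairs` is a `(name, address)` tuple, so the name and address are already split into two fields.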
太酷不给撩
#4 · 2019-04-08 15:52

To convert a single string holding several named email addresses into a dictionary:

import email.utils

emailstring = 'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>'

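Split the string on commas (this only works when no display name contains a comma; a name such as "Smith, Jane" would be cut in two):

email_list = emailstring.split(',')

Then build a dictionary with the name as the key and the email address as the value:

email_dict = dict(map(email.utils.parseaddr, email_list))

The result looks like this: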

{'John Smith': 'john.smith@email.com', 'Friends': 'friend@email.com', 'Smith': 'jane.smith@uconn.edu'}

Note:

If the same name appears with different email addresses, only one of them is kept. For example, with

'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>, Friends <friend_co@email.com>'

the name "Friends" appears twice, so one of its two addresses is dropped.
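
One way around the duplicate-name problem is to keep a list of addresses per name instead of a single value. This is only a sketch of that idea, not part of the approach above: it swaps the manual split for email.utils.getaddresses() so that quoted names containing commas survive, and the variable `by_name` is a name chosen here for illustration.

```python
from collections import defaultdict
from email.utils import getaddresses

emailstring = (
    'Friends <friend@email.com>, John Smith <john.smith@email.com>,'
    '"Smith, Jane" <jane.smith@uconn.edu>, Friends <friend_co@email.com>'
)

# Map each display name to every address seen under that name.
by_name = defaultdict(list)
for name, addr in getaddresses([emailstring]):
    by_name[name].append(addr)

for name, addrs in by_name.items():
    print(repr(name), addrs)
```

Addresses without a display name all end up under the empty-string key, so keying by address instead may suit some uses better.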

来,给爷笑一个
#5 · 2019-04-08 16:04

The code below is completely unnecessary. I wrote it before realising that you can pass getaddresses() a list containing a single string that holds multiple addresses.

I haven't had a chance to look at the specifications for addresses in email headers, but based on the string you provided, this code should do the job of splitting it into a list, making sure to ignore commas that are within quotes (and therefore part of a name).

from email.utils import getaddresses

addrstring = ',friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>,'

def addrparser(addrstring):
    addrlist = ['']
    quoted = False

    # ignore comma at beginning or end
    addrstring = addrstring.strip(',')

    for char in addrstring:
        if char == '"':
            # toggle quoted mode
            quoted = not quoted
            addrlist[-1] += char
        # a comma outside of quotes means a new address
        elif char == ',' and not quoted:
            addrlist.append('')
        # anything else is the next letter of the current address
        else:
            addrlist[-1] += char

    return getaddresses(addrlist)

print(addrparser(addrstring))

Gives:

[('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'),
 ('Smith, Jane', 'jane.smith@uconn.edu')]

I'd be interested to see how other people would go about this problem!
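
For comparison, here is one other way to go about it on Python 3, offered as a sketch rather than a recommendation: parsing the message with email.policy.default makes the Cc header expose an `addresses` attribute, so no manual splitting is needed. The `raw` message is a stand-in built from the question's Cc line.

```python
from email import message_from_string
from email.policy import default

# Hypothetical minimal message containing only the Cc header.
raw = (
    'Cc: friend@email.com, John Smith <john.smith@email.com>,'
    '"Smith, Jane" <jane.smith@uconn.edu>\n'
    '\n'
)

# With policy=default, address headers are parsed into structured objects.
msg = message_from_string(raw, policy=default)

# Each item is an Address with display_name and addr_spec attributes.
cc = [(a.display_name, a.addr_spec) for a in msg['Cc'].addresses]
print(cc)
```

The quote-tracking loop above and this approach both treat the comma inside "Smith, Jane" as part of the name; the difference is that here the standard library's header parser does that work.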
