Method for parsing text Cc field of email header?

I have the plain text of a Cc header field that looks like so:

friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>

Are there any battle-tested modules for parsing this properly?

(Bonus if it's in Python! The email module just returns the raw text without any methods for splitting it, AFAIK. Also a bonus if it splits the name and address into two fields.)

Tags: python parsing email email-headers

4 answers

可以哭但决不认输i

Reply #2 · 2019-04-08 15:43

I haven't used it myself, but it looks to me like you could use the csv package quite easily to parse the data.
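For what it's worth, here is a minimal sketch of the csv idea, assuming the whole header fits on one line. The skipinitialspace=True setting is my own choice, so that a space after a comma does not stop a following quoted name from being recognised. Note that the reader consumes the quote characters, so the quoting around "Smith, Jane" is lost and you still have to separate name and address yourself; treat this as an illustration rather than a recommendation.

```python
import csv

cc = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'

# csv.reader accepts any iterable of lines, so wrap the single header line in a list.
fields = next(csv.reader([cc], skipinitialspace=True))

# Each field is still "Name <address>" text; csv only does the comma splitting.
for field in fields:
    print(field)
```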

爱情/是我丢掉的垃圾

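Reply #3 · 2019-04-08 15:49

There are a number of functions available in the standard Python library, but I think you're looking for email.utils.parseaddr() (for a single address) or email.utils.getaddresses() (for a list of header values, each of which may contain several addresses).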

>>> import email.utils
>>> addresses = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'
>>> email.utils.getaddresses([addresses])
[('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'), ('Smith, Jane', 'jane.smith@uconn.edu')]
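If you start from a full message rather than a bare string, the same call can be fed straight from the parsed headers. This is only a sketch: the raw message below is made up for illustration (including the sender address and the second Cc line), and it relies on Message.get_all() returning every Cc value as a list of strings.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical message, invented for this example; two Cc headers on purpose.
raw = (
    "From: sender@email.com\n"
    "Cc: friend@email.com, John Smith <john.smith@email.com>\n"
    'Cc: "Smith, Jane" <jane.smith@uconn.edu>\n'
    "\n"
    "body\n"
)

msg = message_from_string(raw)

# get_all() returns a list of header values, or the default [] if there is no Cc.
cc_pairs = getaddresses(msg.get_all("Cc", []))

for name, address in cc_pairs:
    print(repr(name), address)
```

Each entry is a (name, address) tuple, so the name and the address already come back as two separate fields.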

太酷不给撩

Reply #4 · 2019-04-08 15:52

Convert a string holding several e-mail addresses (with display names) into a dictionary.

import email.utils

emailstring = 'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>'

Split the string on commas (this only works as long as no display name contains a comma):

email_list = emailstring.split(',')

Build a dictionary with the name as the key and the e-mail address as the value:

email_dict = dict(map(email.utils.parseaddr, email_list))

The result looks like this:

{'John Smith': 'john.smith@email.com', 'Friends': 'friend@email.com', 'Smith': 'jane.smith@uconn.edu'}

Note:

If the same name appears with different e-mail addresses, only one entry is kept, because a later pair overwrites an earlier one with the same key. For example, in

'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>, Friends <friend_co@email.com>'

"Friends" appears twice.
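One way around the duplicate-name problem, sketched here under my own assumptions (the addresses_by_name variable and the use of defaultdict are not part of the answer above), is to collect a list of addresses per name. Using getaddresses() instead of split(',') also keeps a quoted name such as "Smith, Jane" in one piece.

```python
from collections import defaultdict
from email.utils import getaddresses

emailstring = (
    'Friends <friend@email.com>, John Smith <john.smith@email.com>,'
    '"Smith, Jane" <jane.smith@uconn.edu>, Friends <friend_co@email.com>'
)

# Map each display name to every address seen for it, so nothing is overwritten.
addresses_by_name = defaultdict(list)
for name, address in getaddresses([emailstring]):
    addresses_by_name[name].append(address)

for name, found in addresses_by_name.items():
    print(name, found)
```

Keying on the address instead of the name would be another option, since addresses are less likely to repeat than names.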

来，给爷笑一个

Reply #5 · 2019-04-08 16:04

The code below is completely unnecessary. I wrote it before realising that you can pass getaddresses() a list containing a single string that holds multiple addresses.

I haven't had a chance to look at the specification for addresses in email headers, but based on the string you provided, this code should do the job of splitting it into a list, making sure to ignore commas that are within quotes (and therefore part of a name).

from email.utils import getaddresses

addrstring = ',friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>,'

def addrparser(addrstring):
    addrlist = ['']
    quoted = False

    # ignore comma at beginning or end
    addrstring = addrstring.strip(',')

    for char in addrstring:
        if char == '"':
            # toggle quoted mode
            quoted = not quoted
            addrlist[-1] += char
        # a comma outside of quotes means a new address
        elif char == ',' and not quoted:
            addrlist.append('')
        # anything else is the next letter of the current address
        else:
            addrlist[-1] += char

    return getaddresses(addrlist)

print(addrparser(addrstring))

Gives:

[('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'),
 ('Smith, Jane', 'jane.smith@uconn.edu')]

I'd be interested to see how other people would go about this problem!
