problem with email parsing with python and multipl

Posted 2020-04-01 06:37

Question:

I am trying to parse emails with Python's email.parser. When my email contains multiple Received records, email.parser seems to ignore some of them.

For example, for the input:

...
Received: from localhost (jalapeno [127.0.0.1])
    by jmason.org (Postfix) with ESMTP id 5C4E816F6D
    for <jm@localhost>; Sun,  6 Oct 2002 22:54:39 +0100 (IST)
Received: from jalapeno [127.0.0.1]
    by localhost with IMAP (fetchmail-5.9.0)
    for jm@localhost (single-drop); Sun, 06 Oct 2002 22:54:39 +0100 (IST)
...

the output is:

...
Received ::: from localhost (jalapeno [127.0.0.1])
    by jmason.org (Postfix) with ESMTP id 5C4E816F6D
    for <jm@localhost>; Sun,  6 Oct 2002 22:54:39 +0100 (IST)
Received ::: from localhost (jalapeno [127.0.0.1])
    by jmason.org (Postfix) with ESMTP id 5C4E816F6D
    for <jm@localhost>; Sun,  6 Oct 2002 22:54:39 +0100 (IST)
...

I am using the following Python code:

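import email

# Read the raw message text from disk.
with open('email.txt', 'r') as f:
    data = f.read()

e = email.message_from_string(data)
# Print every header name together with the value looked up by that name.
for i in e.keys():
    print(i, ':::', e[i])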

Is this a bug in email.parser?

Can you suggest another Python library for parsing email?

Answer 1:

The Python documentation for email.message.Message.__getitem__() says:

Note that if the named field appears more than once in the message’s headers, exactly which of those field values will be returned is undefined. Use the get_all() method to get the values of all the extant named headers.

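So this is not a bug. e.keys() lists Received once per occurrence, but e[i] looks the header up by name, so each lookup can return the same value. Use e.get_all(i) instead of e[i] to get all values of the Received: header; it returns them as a list.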
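
A minimal sketch of that suggestion, assuming Python 3 and the default parser policy. The RAW sample reuses the two Received headers from the question; the Subject line and the body are made up for illustration. It also shows items(), which is another way to walk all headers without losing duplicates.

```python
import email

# Sample message built from the two Received headers in the question.
# The Subject line and the body are invented for this example.
RAW = (
    "Received: from localhost (jalapeno [127.0.0.1])\n"
    "    by jmason.org (Postfix) with ESMTP id 5C4E816F6D\n"
    "    for <jm@localhost>; Sun,  6 Oct 2002 22:54:39 +0100 (IST)\n"
    "Received: from jalapeno [127.0.0.1]\n"
    "    by localhost with IMAP (fetchmail-5.9.0)\n"
    "    for jm@localhost (single-drop); Sun, 06 Oct 2002 22:54:39 +0100 (IST)\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(RAW)

# get_all() returns a list with one entry per occurrence of the header.
received = msg.get_all('Received')
for value in received:
    print('Received', ':::', value)

# items() returns every (name, value) pair in message order,
# duplicates included, so no per-name lookup is needed.
pairs = msg.items()
for name, value in pairs:
    print(name, ':::', value)
```

If you only need one header, get_all() is the direct fix. If you want to dump every header as in the question, iterating over items() avoids calling get_all() once per duplicate key.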