Parsing date with timezone from an email?

Posted 2019-01-01 14:22

Question:

I am trying to retrieve the date from an email. At first it's easy:

import email.parser

message = email.parser.Parser().parse(file)  # file is an open file object
date = message['Date']
print(date)

and I receive:

'Mon, 16 Nov 2009 13:32:02 +0100'

But I need a nice datetime object, so I use:

datetime.strptime('Mon, 16 Nov 2009 13:32:02 +0100', '%a, %d %b %Y %H:%M:%S %Z')

which raises ValueError, since %Z isn't the right directive for +0100. But I can't find the proper directive for a numeric UTC offset in the documentation; there is only %Z for the zone name. Can someone help me with that?

Answer 1:

email.utils has a parsedate() function for the RFC 2822 format, which as far as I know is not deprecated.

>>> import email.utils
>>> import time
>>> import datetime
>>> email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')
(2009, 11, 16, 13, 32, 2, 0, 1, -1)
>>> time.mktime((2009, 11, 16, 13, 32, 2, 0, 1, -1))
1258378322.0
>>> datetime.datetime.fromtimestamp(1258378322.0)
datetime.datetime(2009, 11, 16, 13, 32, 2)

Note, however, that parsedate() ignores the time zone offset, and time.mktime() always expects a local time tuple, so two dates that differ only in their offset produce the same timestamp:

>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) ==
... time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')))
True

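So you'll still need to parse out the time zone and take the local time difference into account, too:

>>> REMOTE_TIME_ZONE_OFFSET = +9 * 60 * 60
>>> # mktime() assumed local time: undo that (time.timezone is seconds west of UTC),
>>> # then remove the sender's offset. This ignores local DST.
>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) -
... time.timezone - REMOTE_TIME_ZONE_OFFSET)
1258345922.0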
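
As a rough sketch of the "parse out the time zone" step (assuming Python 3 and only the standard library), email.utils.parsedate_tz() returns the same fields as parsedate() plus the UTC offset in seconds as a tenth element, and calendar.timegm() avoids the dependence on the machine's local zone that time.mktime() has. The variable names are illustrative only.

```python
import calendar
import email.utils

date_str = 'Mon, 16 Nov 2009 13:32:02 +0900'

# parsedate_tz() keeps the offset as the tenth element (seconds east of UTC).
fields = email.utils.parsedate_tz(date_str)
offset_seconds = fields[9]

# timegm() treats the date fields as UTC, so subtracting the sender's offset
# gives the real UTC timestamp regardless of the local time zone.
utc_timestamp = calendar.timegm(fields[:9]) - offset_seconds
print(offset_seconds, utc_timestamp)
```

If the date string carries no offset at all, the tenth element is None, so real code would need to handle that case before subtracting.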


Answer 2:

Use email.utils.parsedate_tz(date):

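import datetime
import email
import email.utils

with open(file_name) as fp:
    msg = email.message_from_file(fp)
date = None
date_str = msg.get('date')
if date_str:
    date_tuple = email.utils.parsedate_tz(date_str)
    if date_tuple:
        # mktime_tz() returns a UTC timestamp; fromtimestamp() turns it into naive local time
        date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
if date:
    ...  # valid date found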
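
Note that fromtimestamp() without a tz argument yields a naive datetime in the machine's local zone. A minimal variation, assuming Python 3, passes tz=datetime.timezone.utc to get an aware UTC datetime instead. The literal date string stands in for the header value read from the message.

```python
import datetime
import email.utils

date_str = 'Mon, 16 Nov 2009 13:32:02 +0100'
date_tuple = email.utils.parsedate_tz(date_str)  # None if the string cannot be parsed

utc_date = None
if date_tuple:
    timestamp = email.utils.mktime_tz(date_tuple)  # seconds since the epoch, UTC
    # Passing tz= produces an aware datetime rather than naive local time.
    utc_date = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
print(utc_date)
```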


Answer 3:

In Python 3.3+, the email package can parse the headers for you:

import email
import email.policy

headers = email.message_from_file(file, policy=email.policy.default)
print(headers.get('date').datetime)
# -> 2009-11-16 13:32:02+01:00
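
For a self-contained version of the same idea, the sketch below parses a message from a string instead of a file. The raw message text is made up for illustration; the only thing that matters is that policy=email.policy.default is passed, since the .datetime attribute is provided by the header objects that policy creates.

```python
import email
import email.policy

# Hypothetical minimal message, used only to exercise the Date header.
raw_message = "Date: Mon, 16 Nov 2009 13:32:02 +0100\nSubject: test\n\nbody\n"
msg = email.message_from_string(raw_message, policy=email.policy.default)

date_header = msg['date']        # header lookup is case-insensitive
sent_at = date_header.datetime   # aware datetime parsed from the header
print(sent_at.isoformat())
```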

Since Python 3.2, it works if you replace %Z with %z:

>>> from datetime import datetime
>>> datetime.strptime("Mon, 16 Nov 2009 13:32:02 +0100",
...                   "%a, %d %b %Y %H:%M:%S %z")
datetime.datetime(2009, 11, 16, 13, 32, 2,
                  tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))

Or using the email package (Python 3.3+):

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 +0100")
datetime.datetime(2009, 11, 16, 13, 32, 2,
                  tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))

If the UTC offset is specified as -0000, it returns a naive datetime object that represents the time in UTC; otherwise it returns an aware datetime object with the corresponding tzinfo set.
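
The -0000 case is easy to miss, so here is a small sketch of it (Python 3.3+). The final replace() call is just one possible way to normalise the naive result and is an assumption of this example, not something the answer prescribes.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

naive = parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 -0000")
aware = parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 +0000")

print(naive.tzinfo)       # no tzinfo: the value is naive but represents UTC
print(aware.utcoffset())  # aware, with a zero offset

# Assumption for illustration: treat the naive value as UTC explicitly
# so that it can be compared with aware datetimes.
normalised = naive.replace(tzinfo=timezone.utc)
```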

To parse an RFC 5322 date-time string on earlier Python versions (2.6+):

from calendar import timegm
from datetime import datetime, timedelta, tzinfo
from email.utils import parsedate_tz

ZERO = timedelta(0)
time_string = 'Mon, 16 Nov 2009 13:32:02 +0100'
tt = parsedate_tz(time_string)
# NOTE: mktime_tz is broken on Python < 2.7.4,
#   see https://bugs.python.org/issue21267
timestamp = timegm(tt) - tt[9]  # local time - utc offset == utc time
naive_utc_dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
aware_utc_dt = naive_utc_dt.replace(tzinfo=FixedOffset(ZERO, 'UTC'))
aware_dt = aware_utc_dt.astimezone(FixedOffset(timedelta(seconds=tt[9])))
print(aware_utc_dt)
print(aware_dt)
# -> 2009-11-16 12:32:02+00:00
# -> 2009-11-16 13:32:02+01:00

where FixedOffset is based on the tzinfo subclass from the datetime documentation:

class FixedOffset(tzinfo):
    """Fixed UTC offset: `time = utc_time + utc_offset`."""
    def __init__(self, offset, name=None):
        self.__offset = offset
        if name is None:
            seconds = abs(offset).seconds
            assert abs(offset).days == 0
            hours, seconds = divmod(seconds, 3600)
            if offset < ZERO:
                hours = -hours
            minutes, seconds = divmod(seconds, 60)
            assert seconds == 0
            # NOTE: the last part is a reminder of the deprecated POSIX
            #   GMT+h timezones, which have the opposite sign in the
            #   name; the corresponding numeric value is not used, e.g.,
            #   no minutes
            self.__name = '<%+03d%02d>GMT%+d' % (hours, minutes, -hours)
        else:
            self.__name = name
    def utcoffset(self, dt=None):
        return self.__offset
    def tzname(self, dt=None):
        return self.__name
    def dst(self, dt=None):
        return ZERO
    def __repr__(self):
        return 'FixedOffset(%r, %r)' % (self.utcoffset(), self.tzname())
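
On Python 3.2+, the standard library's datetime.timezone already provides fixed offsets, so a hand-written tzinfo subclass is only needed on older versions. A minimal sketch of the same conversion using it:

```python
from calendar import timegm
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_tz

tt = parsedate_tz('Mon, 16 Nov 2009 13:32:02 +0100')
timestamp = timegm(tt) - tt[9]  # wall-clock fields read as UTC, minus the offset

aware_utc_dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
# timezone(timedelta(...)) plays the role of FixedOffset(timedelta(...)).
aware_dt = aware_utc_dt.astimezone(timezone(timedelta(seconds=tt[9])))
print(aware_utc_dt.isoformat())
print(aware_dt.isoformat())
```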


Answer 4:

In Python 3 (3.3+), you can use the parsedate_to_datetime function:

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 +0100')
datetime.datetime(2009, 11, 16, 13, 32, 2, tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
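
Because the result is an aware datetime, values with different offsets compare by the instant they represent, and astimezone() converts between zones. A short sketch; the second date string is invented here to denote the same moment with a +0900 offset.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

first = parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 +0100')
second = parsedate_to_datetime('Mon, 16 Nov 2009 21:32:02 +0900')  # same instant, other offset

same_instant = (first == second)          # aware datetimes compare in UTC
as_utc = first.astimezone(timezone.utc)   # normalise to UTC for storage or sorting
print(same_instant, as_utc.isoformat())
```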


Answer 5:

Have you tried

rfc822.parsedate_tz(date) # ?

More on RFC822, http://docs.python.org/library/rfc822.html

The rfc822 module is deprecated, though (parsedate_tz is now available as email.utils.parsedate_tz).

But maybe these answers help:

  • How to parse dates with -0400 timezone string in python?

  • python time to age part 2, timezones



Answer 6:

# Parses Nginx's format of "01/Jan/1999:13:59:59 +0400".
# Unfortunately, strptime in Python 2 doesn't support %z for the UTC offset
# (despite what the docs say), hence the need for this function.
# Returns a naive datetime in UTC.
import datetime

def parseDate(dateStr):
    date = datetime.datetime.strptime(dateStr[:-6], "%d/%b/%Y:%H:%M:%S")
    offsetDir = dateStr[-5]
    offsetHours = int(dateStr[-4:-2])
    offsetMins = int(dateStr[-2:])
    if offsetDir == "-":
        offsetHours = -offsetHours
        offsetMins = -offsetMins
    # local time = UTC + offset, so subtract the offset to get UTC
    return date - datetime.timedelta(hours=offsetHours, minutes=offsetMins)
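
On Python 3.2+, where strptime does accept %z, the manual slicing is not needed. A minimal sketch for the same Nginx format, assuming the default C/English locale (which %b depends on):

```python
from datetime import datetime, timezone

log_time = "01/Jan/1999:13:59:59 +0400"

# %z consumes the numeric offset and yields an aware datetime.
aware = datetime.strptime(log_time, "%d/%b/%Y:%H:%M:%S %z")
utc = aware.astimezone(timezone.utc)
print(utc.isoformat())
```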


Answer 7:

For those who want to get the correct local time, here is what I did:

from datetime import datetime
from email.utils import parsedate_to_datetime

mail_time_str = 'Mon, 16 Nov 2009 13:32:02 +0100'

# Round-trip through a POSIX timestamp to get the naive local time.
local_time_str = datetime.fromtimestamp(parsedate_to_datetime(mail_time_str).timestamp()).strftime('%Y-%m-%d %H:%M:%S')

print(local_time_str)
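
An alternative sketch that keeps the result aware: calling astimezone() with no argument (Python 3.3+) converts to the system's local zone. The printed value depends on the machine's time zone, so no expected output is shown.

```python
from email.utils import parsedate_to_datetime

mail_time_str = 'Mon, 16 Nov 2009 13:32:02 +0100'

sent_at = parsedate_to_datetime(mail_time_str)
local_dt = sent_at.astimezone()  # aware datetime in the system's local zone
print(local_dt.strftime('%Y-%m-%d %H:%M:%S %z'))
```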


Answer 8:

ValueError: 'z' is a bad directive in format...

(Note: I have to stick to Python 2.7 in my case.)

I have had a similar problem parsing commit dates from the output of git log --date=iso8601, which actually isn't the ISO 8601 format (hence the addition of --date=iso8601-strict in a later version).

Since I am using Django, I can leverage the utilities there:

https://github.com/django/django/blob/master/django/utils/dateparse.py

>>> from django.utils.dateparse import parse_datetime
>>> parse_datetime('2013-07-23T15:10:59.342107+01:00')
datetime.datetime(2013, 7, 23, 15, 10, 59, 342107, tzinfo=+0100)

Instead of strptime you could use your own regular expression.
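
A rough sketch of the regular-expression route, written to avoid %z entirely. The names OFFSET_RE and parse_with_offset are hypothetical helpers made up for this example; the function returns a naive datetime in UTC and assumes the string ends with a +HHMM or +HH:MM offset.

```python
import re
from datetime import datetime, timedelta

# Trailing numeric offset such as +0100, -0400 or +01:00.
OFFSET_RE = re.compile(
    r'^(?P<stamp>.*?)\s*(?P<sign>[+-])(?P<hours>\d{2}):?(?P<minutes>\d{2})$')

def parse_with_offset(text, fmt):
    """Return a naive UTC datetime for text that ends with a UTC offset."""
    match = OFFSET_RE.match(text)
    if match is None:
        raise ValueError('no UTC offset found in %r' % (text,))
    local = datetime.strptime(match.group('stamp'), fmt)
    offset = timedelta(hours=int(match.group('hours')),
                       minutes=int(match.group('minutes')))
    if match.group('sign') == '-':
        offset = -offset
    # local time = UTC + offset, so subtract the offset to get UTC
    return local - offset

mail_dt = parse_with_offset('Mon, 16 Nov 2009 13:32:02 +0100',
                            '%a, %d %b %Y %H:%M:%S')
git_dt = parse_with_offset('2013-07-23 15:10:59 +0100', '%Y-%m-%d %H:%M:%S')
print(mail_dt, git_dt)
```

Returning naive UTC keeps the sketch free of any tzinfo class; on Python 3 the offset could instead be attached with datetime.timezone.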