How do I parse an HTTP date-string in Python?

2019-01-21 16:55发布

问题:

Is there an easy way to parse HTTP date-strings in Python? According to the standard, there are several ways to format HTTP date strings; the method should be able to handle this.

In other words, I want to convert a string like "Wed, 23 Sep 2009 22:15:29 GMT" to a python time-structure.

回答1:

>>> import email.utils as eut
>>> eut.parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, -1)

If you want a datetime.datetime object, you can do:

def my_parsedate(text):
    return datetime.datetime(*eut.parsedate(text)[:6])


回答2:

>>> import datetime
>>> datetime.datetime.strptime('Wed, 23 Sep 2009 22:15:29 GMT', '%a, %d %b %Y %H:%M:%S GMT')
datetime.datetime(2009, 9, 23, 22, 15, 29)


回答3:

httplib.HTTPMessage(filehandle).getdate(headername)
httplib.HTTPMessage(filehandle).getdate_tz(headername)
mimetools.Message(filehandle).getdate()
rfc822.parsedate(datestr)
rfc822.parsedate_tz(datestr)
  • if you have a raw data stream, you can build an HTTPMessage or a mimetools.Message from it. it may offer additional help while querying the response object for infos
  • if you are using urllib2, you already have an HTTPMessage object hidden in the filehandler returned by urlopen
  • it can probably parse many date formats
  • httplib is in the core

NOTE:

  • had a look at implementation, HTTPMessage inherits from mimetools.Message which inherits from rfc822.Message. two floating defs are of your interest maybe, parsedate and parsedate_tz (in the latter)
  • parsedate(_tz) from email.utils has a different implementation, although it looks kind of the same.

you can do this, if you only have that piece of string and you want to parse it:

>>> from rfc822 import parsedate, parsedate_tz
>>> parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
>>> 

but let me exemplify through mime messages:

import mimetools
import StringIO
message = mimetools.Message(
    StringIO.StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> m
<mimetools.Message instance at 0x7fc259146710>
>>> m.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)

or via http messages (responses)

>>> from httplib import HTTPMessage
>>> from StringIO import StringIO
>>> http_response = HTTPMessage(StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> #http_response can be grabbed via urllib2.urlopen(url).info(), right?
>>> http_response.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)

right?

>>> import urllib2
>>> urllib2.urlopen('https://fw.io/').info().getdate('Date')
(2014, 2, 19, 18, 53, 26, 0, 1, 0)

there, now we now more about date formats, mime messages, mime tools and their pythonic implementation ;-)

whatever the case, looks better than using email.utils for parsing http headers.