Parse human-format date ranges in Python

Posted 2020-03-14 03:23

Question:

I have some human-style date ranges, in strings, like the following:

22-24th April 2012
14-23 July
20th June - 5th July

I want to parse these in Python so that I can end up with two datetime objects: one for the start, one for the end.

Is there any module that will let me do this? I've tried parsedatetime, and it looks like the evalRange function within that may do it (see http://code-bear.com/code/parsedatetime/docs/index.html for documentation), but it doesn't seem to parse anything at all, and just returns the current date/time, twice.

Any ideas?

Answer 1:

I ended up writing a Python module to do this, which I have now open-sourced. It is available on GitHub, has documentation, and can be installed from PyPI using:

pip install daterangeparser

For those who are interested, the module works by creating a full parser using PyParsing, a great (and remarkably easy-to-use) tool.



Answer 2:

You could use dateutil.parser, but it does not handle date ranges, so you may need to split the range into two dates with a regular expression first.

import dateutil.parser
dateutil.parser.parse("20th June")

returns datetime.datetime(2012, 6, 20, 0, 0) (the missing year defaults to the current year, which was 2012 when this was written)
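A minimal sketch of that regular-expression step, using only the standard library. The helper name split_date_range is hypothetical, and the sketch assumes day-first dates separated by a hyphen or en dash, where a month or year written only on the end date also applies to the start date. Each returned string can then be passed to dateutil.parser.parse.

```python
import re

# Ordinal suffix directly after a day number, e.g. "24th" or "1st".
# The lookbehind keeps words such as "August" intact.
_ORDINAL = re.compile(r'(?<=\d)(st|nd|rd|th)\b', re.IGNORECASE)
# Hyphen or en dash, with optional spaces, between the two dates.
_SEPARATOR = re.compile(r'\s*[-\u2013]\s*')


def split_date_range(text):
    """Split a human-style range into (start, end) date strings.

    Hypothetical helper: assumes day-first dates, and copies any
    month or year that appears only on the end date onto the start.
    """
    cleaned = _ORDINAL.sub('', text.strip())
    parts = _SEPARATOR.split(cleaned, maxsplit=1)
    if len(parts) != 2:
        raise ValueError('not a date range: %r' % text)
    start_tokens = parts[0].split()
    end_tokens = parts[1].split()
    # Trailing end tokens that the start lacks, e.g. ["April", "2012"].
    missing = end_tokens[len(start_tokens):]
    return ' '.join(start_tokens + missing), ' '.join(end_tokens)


for example in ('22-24th April 2012', '14-23 July', '20th June - 5th July'):
    print(split_date_range(example))
```

With the three examples from the question, the start of '22-24th April 2012' picks up 'April 2012' from the end date, while '20th June - 5th July' keeps its own month on each side.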

Regards



Answer 3:

Based on previous answers, what you could do is:

  1. Preprocess your input to extract the start and end dates (for instance: 20th June and 5th July). In your first example (date_range == '22-24th April 2012') you can do that with date_range.split(' ')[0].split('-'), which returns ['22', '24th']; then drop the th and similar suffixes.
  2. Get datetime objects from those dates using dateutil.parser: dateutil.parser.parse('22 April 2012')

Here's an implementation of what was previously said:

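import re
import dateutil.parser

date_range = '20-22th July 2013'
# Strip ordinal suffixes only when they follow a digit, so "August" keeps its "st".
# str.replace returns a new string, so the result must be assigned.
date_range = re.sub(r'(\d)(st|nd|rd|th)\b', r'\1', date_range.lower())
# Split off the "20-22" day part; everything after the first space is the month and year.
day_part, month_year = date_range.split(' ', 1)
days = day_part.split('-')
begin = days[0] + ' ' + month_year
end = days[1] + ' ' + month_year
begin_date = dateutil.parser.parse(begin)  # datetime.datetime(2013, 7, 20, 0, 0)
end_date = dateutil.parser.parse(end)      # datetime.datetime(2013, 7, 22, 0, 0)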