dateutils rrule returns dates that 2 months apart

2020-07-23 02:06发布

问题:

I am new to Python and also dateutil module. I am passing the following arguments:

disclosure_start_date = resultsDict['fd_disclosure_start_date']
disclosure_end_date = datetime.datetime.now()
disclosure_dates = [dt for dt in rrule(MONTHLY, dtstart=disclosure_start_date, until=disclosure_end_date)]

Here disclosure_start_date = 2012-10-31 00:00:00 which converted to datetime is datetime.datetime(2012, 10, 31, 0, 0)

End date is as of now.

When I use:

disclosure_dates = [dt for dt in rrule(MONTHLY, dtstart=disclosure_start_date, until=disclosure_end_date)]

I get the dates for every other month or 2 months apart. The result is:

>>> list(disclosure_dates)
[datetime.datetime(2012, 10, 31, 0, 0), 
 datetime.datetime(2012, 12, 31, 0, 0), 
 datetime.datetime(2013, 1, 31, 0, 0), 
 datetime.datetime(2013, 3, 31, 0, 0), 
 datetime.datetime(2013, 5, 31, 0, 0), 
 datetime.datetime(2013, 7, 31, 0, 0), 
 datetime.datetime(2013, 8, 31, 0, 0), 
 datetime.datetime(2013, 10, 31, 0, 0), 
 datetime.datetime(2013, 12, 31, 0, 0), 
 datetime.datetime(2014, 1, 31, 0, 0), 
 datetime.datetime(2014, 3, 31, 0, 0), 
 datetime.datetime(2014, 5, 31, 0, 0), 
 datetime.datetime(2014, 7, 31, 0, 0), 
 datetime.datetime(2014, 8, 31, 0, 0), 
 datetime.datetime(2014, 10, 31, 0, 0), 
 datetime.datetime(2014, 12, 31, 0, 0), 
 datetime.datetime(2015, 1, 31, 0, 0), 
 datetime.datetime(2015, 3, 31, 0, 0), 
 datetime.datetime(2015, 5, 31, 0, 0), 
 datetime.datetime(2015, 7, 31, 0, 0), 
 datetime.datetime(2015, 8, 31, 0, 0), 
 datetime.datetime(2015, 10, 31, 0, 0), 
 datetime.datetime(2015, 12, 31, 0, 0), 
 datetime.datetime(2016, 1, 31, 0, 0), 
 datetime.datetime(2016, 3, 31, 0, 0), 
 datetime.datetime(2016, 5, 31, 0, 0)]

I am not sure what I am doing wrong. Can someone please point out the mistake here?

回答1:

The issue you are coming up against comes from the fact that datetime.datetime(2012, 10, 31, 0, 0) is the 31st of the month, and not all months have a 31st. Since the rrule module is an implementation of RFC 2445. Per RFC 3.3.10:

Recurrence rules may generate recurrence instances with an invalid date (e.g., February 30) or nonexistent local time (e.g., 1:30 AM on a day where the local time is moved forward by an hour at 1:00 AM). Such recurrence instances MUST be ignored and MUST NOT be counted as part of the recurrence set.

Since you have a monthly rule that generates the 31st of a month, it will skip all months with 30 or fewer days. You can see this bug report in dateutil about this issue.

If you just want the last day of the month, you should use the bymonthday=-1 argument:

from dateutil.rrule import rrule, MONTHLY
from datetime import datetime

disclosure_start_date = datetime(2012, 10, 31, 0, 0)

rr = rrule(freq=MONTHLY, dtstart=disclosure_start_date, bymonthday=-1)
# >>>rr.between(datetime(2013, 1, 1), datetime(2013, 5, 1))
# [datetime.datetime(2013, 1, 31, 0, 0),
#  datetime.datetime(2013, 2, 28, 0, 0),
#  datetime.datetime(2013, 3, 31, 0, 0),
#  datetime.datetime(2013, 4, 30, 0, 0)]

Unfortunately, I don't think there's an RFC-compliant way to generate a simple RRULE that just falls back to the end of the month if-and-only-if it's necessary (e.g. what do you do with January 30th - you need fallback for February, but you don't want to use bymonthday=-2 because that will give you Feb. 27th, etc).

Alternatively, for a simple monthly rule like this, a better option is probably to just use relativedelta, which does fall back to the end of the month:

from dateutil.relativedelta import relativedelta
from datetime import datetime

def disclosure_dates(dtstart, rd, dtend=None):
    ii = 0
    while True:
        cdate = dtstart + ii*rd
        ii += 1

        yield cdate
        if dtend is not None and cdate >= dtend:
            break


dtstart = datetime(2013, 1, 31, 0, 0)
rd = relativedelta(months=1)
rr = disclosure_dates(dtstart, rd, dtend=datetime(2013, 5, 1))

# >>> list(rr)
# [datetime.datetime(2013, 1, 31, 0, 0),
#  datetime.datetime(2013, 2, 28, 0, 0),
#  datetime.datetime(2013, 3, 31, 0, 0),
#  datetime.datetime(2013, 4, 30, 0, 0),
#  datetime.datetime(2013, 5, 31, 0, 0)]

Note that I specifically used cdate = dtstart + ii * rd, you do not want to just keep a "running tally", as that will pin to the shortest month the tally has seen:

dt_base = datetime(2013, 1, 31)
dt = dt_base
for ii in range(5):
    cdt = dt_base + ii*rd
    print('{} | {}'.format(dt, cdt))
    dt += rd

Result:

2013-01-31 00:00:00 | 2013-01-31 00:00:00
2013-02-28 00:00:00 | 2013-02-28 00:00:00
2013-03-28 00:00:00 | 2013-03-31 00:00:00
2013-04-28 00:00:00 | 2013-04-30 00:00:00
2013-05-28 00:00:00 | 2013-05-31 00:00:00