Python / Pandas / Numpy - Direct calculation of nu

2020-06-16 05:19发布

问题:

Is there a better / more direct way to calculate this than the following?

# 1. Set up the start and end date for which you want to calculate the      
# number of business days excluding holidays.

start_date = '01JAN1986'
end_date = '31DEC1987'
start_date = datetime.datetime.strptime(start_date, '%d%b%Y')
end_date = datetime.datetime.strptime(end_date, '%d%b%Y')

# 2. Generate a list of holidays over this period
from pandas.tseries.holiday import USFederalHolidayCalendar
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start_date, end_date)
holidays

Which gives a pandas.tseries.index.DatetimeIndex

DatetimeIndex(['1986-01-01', '1986-01-20', '1986-02-17', '1986-05-26',
           '1986-07-04', '1986-09-01', '1986-10-13', '1986-11-11',
           '1986-11-27', '1986-12-25', '1987-01-01', '1987-01-19',
           '1987-02-16', '1987-05-25', '1987-07-03', '1987-09-07',
           '1987-10-12', '1987-11-11', '1987-11-26', '1987-12-25'],
          dtype='datetime64[ns]', freq=None, tz=None)

But you need a list for numpy busday_count

holiday_date_list = holidays.date.tolist()

Then with and without the holidays you get:

np.busday_count(start_date.date(), end_date.date()) 
>>> 521

np.busday_count(start_date.date(), end_date.date(), holidays = holiday_date_list)
>>> 501

There are some other questions slightly similar but generally working with pandas Series or Dataframes (Get business days between start and end date using pandas, Counting the business days between two series)

回答1:

Your answer's perfectly fine. This library looks very similar:

https://github.com/seatgeek/businesstime

, which does not necessitate pandas dataframes. (I saw this info here: Is there a function in Python/Pandas to get business time Delta between two date times?)



回答2:

If you put the index you created in a dataframe, you can use resample to fill in the gaps. The offset passed to .resample() can include things like business days and even (custom) calendars:

from pandas.tseries.holiday import USFederalHolidayCalendar

C = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())

start_date = '01JAN1986'
end_date = '31DEC1987'

(
pd.DataFrame(index=pd.to_datetime([start_date, end_date]))
    .resample(C, closed='right') 
    .asfreq()
    .index  
    .size
) - 1

The size of the index - 1 then gives us the amount of days.