For each row in the input table, I need to generate multiple rows by splitting the date range at month boundaries (please refer to the sample output below).
A simple iterative approach that converts row by row works, but it is very slow on large DataFrames.
Could anyone suggest a vectorized approach, for example using apply() or map(), to achieve this?
The output should be a new table.
Input:
ID, START_DATE, END_DATE
1, 2010-12-08, 2011-03-01
2, 2010-12-10, 2011-01-12
3, 2010-12-16, 2011-03-07
Output:
ID, START_DATE, END_DATE, NUMBER_DAYS, ACTION_DATE
1, 2010-12-08, 2010-12-31, 23, 201012
1, 2010-12-08, 2011-01-31, 54, 201101
1, 2010-12-08, 2011-02-28, 82, 201102
1, 2010-12-08, 2011-03-01, 83, 201103
2, 2010-12-10, 2010-12-31, 21, 201012
2, 2010-12-10, 2011-01-12, 33, 201101
3, 2010-12-16, 2010-12-31, 15, 201012
4, 2010-12-16, 2011-01-31, 46, 201101
5, 2010-12-16, 2011-02-28, 74, 201102
6, 2010-12-16, 2011-03-07, 81, 201103
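To pin down the rule implied by the sample output, here is a minimal plain-Python sketch of the row-by-row version. The helper name split_by_month is hypothetical, and the sketch assumes that NUMBER_DAYS is the day difference between the capped END_DATE and the original START_DATE, which is what the sample figures suggest.

```python
import calendar
from datetime import date


def split_by_month(row_id, start, end):
    """Yield one output row per calendar month touched by [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # Cap each slice at the month's last day, or at the real END_DATE.
        last_day = date(year, month, calendar.monthrange(year, month)[1])
        slice_end = min(last_day, end)
        number_days = (slice_end - start).days
        yield (row_id, start, slice_end, number_days, f"{year}{month:02d}")
        # Advance to the next month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


rows = list(split_by_month(1, date(2010, 12, 8), date(2011, 3, 1)))
for row in rows:
    print(row)
```

This is the slow baseline the question refers to; it is only meant to make the expected arithmetic explicit.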
I think you can use:
There is a problem with the last day of Month, because resample adds the last day of Month, so first create period columns and then merge on them. With combine_first, add the missing values from column Date, and with bfill, add the missing values of column START_DATE. Last, add the new columns from the difference with dt.days and with dt.strftime.
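The code that accompanied this answer is not preserved here, so the following is only a sketch of a related route, not the original solution. It does not use resample, merge, combine_first or bfill; instead it builds the monthly periods with pd.period_range, expands them with DataFrame.explode, and caps each month end at the real END_DATE before applying dt.days and dt.strftime. The column name period and the variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "START_DATE": pd.to_datetime(["2010-12-08", "2010-12-10", "2010-12-16"]),
    "END_DATE": pd.to_datetime(["2011-03-01", "2011-01-12", "2011-03-07"]),
})

# One monthly Period per calendar month touched by each row (assumed column name).
df["period"] = pd.Series(
    [list(pd.period_range(s, e, freq="M"))
     for s, e in zip(df["START_DATE"], df["END_DATE"])],
    index=df.index,
)

# One output row per (input row, month) pair.
out = df.explode("period").reset_index(drop=True)

# Last calendar day of each period, truncated to midnight.
periods = pd.PeriodIndex(out["period"], freq="M")
month_end = pd.Series(periods.to_timestamp(how="end").normalize(), index=out.index)

# Keep the month end unless it would overshoot the row's real END_DATE.
out["END_DATE"] = month_end.where(month_end < out["END_DATE"], out["END_DATE"])

out["NUMBER_DAYS"] = (out["END_DATE"] - out["START_DATE"]).dt.days
out["ACTION_DATE"] = out["END_DATE"].dt.strftime("%Y%m")
out = out.drop(columns="period")
print(out)
```

Building the period lists still loops over the rows in Python, but everything after explode runs on whole columns.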
You can also try this, using the pandas date_range function together with DataFrame.apply.
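The original code for this answer is not preserved, so what follows is a minimal sketch of how date_range and apply could be combined, not the author's exact solution. The helper name slice_ends is hypothetical. It passes a pd.offsets.MonthEnd() object as the frequency, which is an assumption made to avoid depending on a particular frequency alias.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "START_DATE": pd.to_datetime(["2010-12-08", "2010-12-10", "2010-12-16"]),
    "END_DATE": pd.to_datetime(["2011-03-01", "2011-01-12", "2011-03-07"]),
})


def slice_ends(row):
    """Month-end dates inside the range, followed by the real END_DATE."""
    ends = list(pd.date_range(row["START_DATE"], row["END_DATE"],
                              freq=pd.offsets.MonthEnd()))
    # Append END_DATE unless it already is the last month end.
    if not ends or ends[-1] != row["END_DATE"]:
        ends.append(row["END_DATE"])
    return ends


# Replace END_DATE by the list of slice ends, then expand to one row per slice.
out = (
    df.assign(END_DATE=df.apply(slice_ends, axis=1))
      .explode("END_DATE")
      .reset_index(drop=True)
)
out["END_DATE"] = pd.to_datetime(out["END_DATE"])
out["NUMBER_DAYS"] = (out["END_DATE"] - out["START_DATE"]).dt.days
out["ACTION_DATE"] = out["END_DATE"].dt.strftime("%Y%m")
print(out)
```

Note that apply with axis=1 still calls the helper once per row, so this is more compact than an explicit loop but not necessarily much faster on very large inputs.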
In your output, the rows after the first ID 3 row are labelled 4, 5 and 6. I believe they should all be 3. Please check.
Output