I have data on spells (hospital stays), each with a start and end date, but I want to count the number of days spent in hospital for calendar months. Of course, this number can be zero for months not appearing in a spell. But I cannot just attribute the length of each spell to the starting month, as longer spells run over to the following month (or more).
Basically, it would suffice for me if I could cut spells at turn-of-month datetimes, getting from the data in the first example to the data in the second:
id start end
1 2011-01-01 10:00:00 2011-01-08 16:03:00
2 2011-01-28 03:45:00 2011-02-04 15:22:00
3 2011-03-02 11:04:00 2011-03-05 05:24:00
id start end month stay
1 2011-01-01 10:00:00 2011-01-08 16:03:00 2011-01 7
2 2011-01-28 03:45:00 2011-01-31 23:59:59 2011-01 4
2 2011-02-01 00:00:00 2011-02-04 15:22:00 2011-02 4
3 2011-03-02 11:04:00 2011-03-05 05:24:00 2011-03 3
I read up on the Time Series / Date functionality of pandas, but I do not see a straightforward solution to this. How can one accomplish the slicing?