Cut time spells into calendar months in pandas

Posted 2019-08-04 14:01

I have data on spells (hospital stays), each with a start and end date, and I want to count the number of days spent in hospital per calendar month. This number can of course be zero for months that no spell touches. I cannot simply attribute each spell's full length to its starting month, because longer spells run over into the following month (or several).

Basically, it would suffice for me if I could cut spells at turn-of-month datetimes, getting from the data in the first example to the data in the second:

id                    start                     end
 1      2011-01-01 10:00:00     2011-01-08 16:03:00
 2      2011-01-28 03:45:00     2011-02-04 15:22:00
 3      2011-03-02 11:04:00     2011-03-05 05:24:00

id                    start                     end     month      stay
 1      2011-01-01 10:00:00     2011-01-08 16:03:00   2011-01         7
 2      2011-01-28 03:45:00     2011-01-31 23:59:59   2011-01         4
 2      2011-02-01 00:00:00     2011-02-04 15:22:00   2011-02         4
 3      2011-03-02 11:04:00     2011-03-05 05:24:00   2011-03         3

I read up on the Time Series / Date functionality of pandas, but I do not see a straightforward solution to this. How can one accomplish the slicing?

1 answer
Root(大扎)
#2 · 2019-08-04 14:26

It's simpler than you think: just subtract the dates. The result is a time span. See "Add column with number of days between dates in DataFrame pandas".

You even get to do this for the entire frame at once: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.subtract.html
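As a minimal sketch of the subtraction, using the three spells from the question (the column names `length` and `days` are my own, not from the question):

```python
import pandas as pd

# Sample spells copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "start": pd.to_datetime([
        "2011-01-01 10:00:00", "2011-01-28 03:45:00", "2011-03-02 11:04:00",
    ]),
    "end": pd.to_datetime([
        "2011-01-08 16:03:00", "2011-02-04 15:22:00", "2011-03-05 05:24:00",
    ]),
})

# Subtracting two datetime columns gives a timedelta column for the whole frame.
df["length"] = df["end"] - df["start"]

# .dt.days keeps only the whole-day component; the remaining hours are dropped.
df["days"] = df["length"].dt.days
```

Note that this gives the total length of each spell, not yet the per-month split the question asks for.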


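Update, now that I understand the problem better: add a new column holding a cut datetime. Start from the spell's end date; if the start date falls in a different month, set the day to 01 and the time to 00:00, which gives the first instant of the end month. If start and end are in the same month, the cut is simply the end date.

You can then use this cut datetime to compute the portion of the stay attributable to each month: cut - start is the time spent in the first month, and end - cut is the time spent in the second. A spell that spans more than two calendar months needs one such cut per month boundary.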
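A rough sketch of the cut column, assuming each spell crosses at most one month boundary. The names `cut`, `first_part`, `second_part` and `crosses` are hypothetical, and going through `to_period("M")` is just one way to get the first instant of the end month:

```python
import pandas as pd

# Sample spells copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "start": pd.to_datetime([
        "2011-01-01 10:00:00", "2011-01-28 03:45:00", "2011-03-02 11:04:00",
    ]),
    "end": pd.to_datetime([
        "2011-01-08 16:03:00", "2011-02-04 15:22:00", "2011-03-05 05:24:00",
    ]),
})

start_month = df["start"].dt.to_period("M")
end_month = df["end"].dt.to_period("M")

# True where the spell starts and ends in different calendar months.
crosses = start_month != end_month

# First instant (day 01, 00:00) of the month in which the spell ends.
end_month_start = end_month.dt.start_time

# Cut at the month boundary if the spell crosses one, otherwise at the end itself.
df["cut"] = end_month_start.where(crosses, df["end"])

# Time attributable to the first month and to the second month.
df["first_part"] = df["cut"] - df["start"]
df["second_part"] = df["end"] - df["cut"]
```

For spells that stay within one month, `second_part` is zero and `first_part` is the whole stay. Rounding the timedeltas to whole days (as in the question's `stay` column) is left out here, since the question does not say which rounding rule it uses.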
