Grouping consecutive dates together

Posted 2019-07-04 00:35

Question:

I have a list of (many) employees in Excel/CSV who take sick days, listed in the following format. Each sick-day instance gets its own line. I want to add another column, 'Result', which records the length of the sick period. For example, Mon-Tues-Wed means each of these three entries gets labelled with a 3.

I am new to Python, and I am wondering if this approach is ideal, though I can't see how SQL would be any easier, other than to create tables for each individual employee (easy) and then run analysis on that (hard).

My goal is to be able to separate 1-day-long periods from 10+ day periods. Bonus points for this spanning over weekends.

Person  Date        Result
A       02/04/2012  5
B       02/04/2012  2
A       03/04/2012  5
B       03/04/2012  2
A       04/04/2012  5
A       05/04/2012  5
A       06/04/2012  5
B       25/04/2012  1
A       25/04/2012  2
A       26/04/2012  2
B       30/04/2012  1

Answer 1:

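def group(iterable):
    myIter = iter(iterable)

    # Start the first run; an empty input yields no runs at all.
    try:
        run = [next(myIter)]
    except StopIteration:
        return

    def continuesRun(x):
        # x extends the run if it directly follows the run's last element
        return run[-1] == x - 1

    for x in myIter:
        if continuesRun(x):
            run.append(x)
        else:
            yield run
            run = [x]
    # Emit the final run
    yield run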

Demo:

>>> list( group([1,10,11,12,20,21]) )
[[1], [10, 11, 12], [20, 21]]

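To apply this to your situation, define the function continuesRun like so (this assumes the run holds datetime.date objects, sorted per person, and that timedelta has been imported from datetime):

def continuesRun(date):
    previousDate = run[-1]
    gap = date - previousDate
    # weekday() == 4 is Friday; a Friday followed by the next Monday is a 3-day gap
    return gap == timedelta(days=1) or (previousDate.weekday() == 4 and gap == timedelta(days=3))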
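Putting the pieces together for the table in the question, here is a minimal sketch rather than a definitive implementation. The names continues_run and label_runs are made up for this example, the dates are assumed to be in day/month/year format, and the sample rows are copied from the question. It groups the dates per person, sorts them, and labels every entry with the length of the run it belongs to, treating Friday followed by Monday as consecutive.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def continues_run(previous, current):
    """True if `current` extends a run that ends on `previous`."""
    gap = current - previous
    # Consecutive days, or a Friday (weekday() == 4) followed by the next Monday.
    return gap == timedelta(days=1) or (
        previous.weekday() == 4 and gap == timedelta(days=3)
    )

def label_runs(records):
    """Map each (person, date) pair to the length of the run containing it."""
    dates_by_person = defaultdict(set)
    for person, day in records:
        dates_by_person[person].add(day)

    labels = {}
    for person, days in dates_by_person.items():
        run = []
        for day in sorted(days):
            if run and not continues_run(run[-1], day):
                # The run just ended: label all of its days, then start over.
                for d in run:
                    labels[(person, d)] = len(run)
                run = []
            run.append(day)
        # Label the final run for this person.
        for d in run:
            labels[(person, d)] = len(run)
    return labels

# Sample rows from the question (assumed day/month/year).
rows = [
    ("A", "02/04/2012"), ("B", "02/04/2012"), ("A", "03/04/2012"),
    ("B", "03/04/2012"), ("A", "04/04/2012"), ("A", "05/04/2012"),
    ("A", "06/04/2012"), ("B", "25/04/2012"), ("A", "25/04/2012"),
    ("A", "26/04/2012"), ("B", "30/04/2012"),
]
records = [(p, datetime.strptime(s, "%d/%m/%Y").date()) for p, s in rows]
labels = label_runs(records)

for person, day in records:
    print(person, day.strftime("%d/%m/%Y"), labels[(person, day)])
```

Separating 1-day periods from 10+ day periods is then a matter of filtering on the label, for example keeping only the entries whose label is at least 10.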

Side note: in my personal opinion, it seems slightly wrong, both morally and pragmatically, to count sick-day spans bordering weekends as 2 or 4 days longer. But if you have good reason to do so, who am I to judge. =)

To count those, post-process your runs: add 2 if the first day was a Monday, add 2 if the last day was a Friday, then add the number of weekend days that fall inside the run, in pseudocode sum(1 for d in range((run[-1]-run[0]).days) if (run[0]+d*day).isWeekend()), where day is a one-day timedelta. Of course this does not count holidays. To handle those, test .isHoliday() or .isWeekend() instead, and make the "add 2" logic work exactly like the counting logic: iterate outward from each end of the run until you find a day that is neither a holiday nor a weekend, and penalize the person for each holiday-or-weekend day thus adjacent to the run.
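The weekend post-processing can be sketched as follows, assuming each run is a sorted list of datetime.date objects already grouped so that Friday-to-Monday counts as consecutive. The helpers is_weekend and calendar_length are hypothetical names for this example. Instead of adding 2 at each end and counting interior weekend days separately, the sketch widens the run outward over adjacent weekend days and then measures the calendar span, which gives the same total.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def is_weekend(day):
    """Saturday is weekday() == 5, Sunday is weekday() == 6."""
    return day.weekday() >= 5

def calendar_length(run, is_off_day=is_weekend):
    """Length of a run in calendar days, counting the non-working days
    inside the run and directly adjacent to either end of it."""
    first, last = run[0], run[-1]
    # Walk backwards over non-working days just before the run.
    while is_off_day(first - ONE_DAY):
        first -= ONE_DAY
    # Walk forwards over non-working days just after the run.
    while is_off_day(last + ONE_DAY):
        last += ONE_DAY
    return (last - first).days + 1

# Monday 2 April 2012 to Friday 6 April 2012: 5 working days plus both weekends.
week = [date(2012, 4, d) for d in range(2, 7)]
print(calendar_length(week))

# Friday 6 April and Monday 9 April: 2 working days plus the weekend between.
print(calendar_length([date(2012, 4, 6), date(2012, 4, 9)]))
```

To account for holidays as well, pass a different is_off_day predicate, for instance one that also checks the date against your own set of holiday dates. Note that the outward walk only terminates if the predicate eventually returns False.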