-->

Algorithm to combine / merge date ranges

2019-02-14 14:31发布

问题:

I am trying to find the best way on how to merge date ranges into one database record (array element).

This is the data I have:

  Array
(
    [0] => Array
        (
            [id] => 18298
            [start_date] => 2011-07-09
            [end_date] => 2011-10-01
        )

    [1] => Array
        (
            [id] => 18297
            [start_date] => 2011-06-01
            [end_date] => 2011-06-30
        )

    [2] => Array
        (
            [id] => 17113
            [start_date] => 2011-03-31
            [end_date] => 2011-05-31
        )

    [3] => Array
        (
            [id] => 20555
            [start_date] => 2011-01-03
            [end_date] => 2011-03-31
        )
)

And after we combine them, array (or database) should look like this:

Array
(
    [0] => Array
        (
            [merged_ids] => 18298
            [start_date] => 2011-07-09
            [end_date] => 2011-10-01
        )

    [1] => Array
        (
            [merged_ids] => 18297, 17113, 20555
            [start_date] => 2011-01-03
            [end_date] => 2011-06-30
        )
)

Is there any algorithm to go through all elements/ranges and combine them? Which way is better/easier to do - through database (MYSQL) or coding (PHP)?

Any advise is highly appreciated.

Thanks!

UPDATE: Sorry, I didn't provide enough info: we should merge any continuous and overlapping date ranges.

回答1:

Sort by start date.

Then iterate through and check for if the next item's start date is before or directly after the current one's end date. If it is, then merge the next one into the current one. Then continue.



回答2:

I've written function which combines/merges list of ranges. It is written in Python, but it should be easy to rewrite it in PHP. Here's full code: https://gist.github.com/barszczmm/8447665 and here's simplified algorithm (still in Python):

list_of_ranges.sort() # sort input list
new_list_of_ranges = [] # output list

new_range_item_start = None
new_range_item_end = None

length = len(list_of_ranges)
for i, range_item in enumerate(list_of_ranges):
    if new_range_item_start is None:
        new_range_item_start = range_item[0]
        new_range_item_end = range_item[1]
    elif new_range_item_end >= range_item[0]:
        new_range_item_end = max(range_item[1], new_range_item_end)
    else:
        new_list_of_ranges.append((new_range_item_start, new_range_item_end))
        new_range_item_start = range_item[0]
        new_range_item_end = range_item[1]
    # save if this is last item
    if i + 1 == length:
        new_list_of_ranges.append((new_range_item_start, new_range_item_end))


回答3:

The implementation is like:

function mergeDateTimeRanges($ranges)
{
    $retVal = [];
    //sort date ranges by begin time
    usort($ranges, function ($a, $b) {
        return strcmp($a['begin_at'], $b['begin_at']);
    });

    $currentRange = [];
    foreach ($ranges as $range) {
        // bypass invalid value
        if ($range['begin_at'] >= $range['end_at']) {
            continue;
        }
        //fill in the first element
        if (empty($currentRange)) {
            $currentRange = $range;
            continue;
        }

        if ($currentRange['end_at'] < $range['begin_at']) {
            $retVal[] = $currentRange;
            $currentRange = $range;
        } elseif ($currentRange['end_at'] < $range['end_at']) {
            $currentRange ['end_at'] = $range['end_at'];
        }
    }

    if ($currentRange) {
        $retVal[] = $currentRange;
    }

    return $retVal;
}


回答4:

My method will generate a merged array where by overlapping or consecutive dates are grouped together and the group's ids are stored as a string of comma-space separated values.

I am using the modern "spaceship operator" (<=>) for usort()s comparison. If your code is running on a version below php7, you can use:

usort($array,function($a,$b){ return strcmp($a['start_date'],$b['start_date']); });

See inline comments for step-by-step explanation.

Code: (Demo)

function mergeRanges($array){
    usort($array,function($a,$b){ return $a['start_date']<=>$b['start_date']; });  // order by start_date ASC

    foreach($array as $i=>$row){
         if($i && $row['start_date']<=date('Y-m-d',strtotime("{$result[$x]['end_date']} +1 day"))){  // not the first iteration and dates are within current group's range
            if($row['end_date']>$result[$x]['end_date']){  // only if current end_date is greater than existing end_date
                $result[$x]['end_date']=$row['end_date'];  // overwrite end_date with new end_date in group
            }
            $result[$x]['merged_ids'][]=$row['id'];  // append id to merged_ids subarray
        }else{  // first iteration or out of range; start new group
            if($i){  // if not first iteration
                $result[$x]['merged_ids']=implode(', ',$result[$x]['merged_ids']);  // convert previous group's id elements to csv string
            }else{  // first iteration
                $x=-1;  // declare $x as -1 so that it becomes 0 when incremented with ++$x
            }
            $result[++$x]=['merged_ids'=>[$row['id']],'start_date'=>$row['start_date'],'end_date'=>$row['end_date']]; // declare new group
        }
    }
    $result[$x]['merged_ids']=implode(', ',$result[$x]['merged_ids']);  // convert final merged_ids subarray to csv string
    return $result;
}

$array=[
    ['id'=>18298,'start_date'=>'2011-07-09','end_date'=>'2011-10-01'],
    ['id'=>18297,'start_date'=>'2011-06-01','end_date'=>'2011-06-30'],
    ['id'=>17113,'start_date'=>'2011-03-31','end_date'=>'2011-05-31'],  // tests that 17113 and 18297 belong in same group
    ['id'=>20556,'start_date'=>'2011-02-03','end_date'=>'2011-02-13'],  // tests that "fully overlapped" date range is included
    ['id'=>20555,'start_date'=>'2011-01-03','end_date'=>'2011-03-31']
];

print_r(mergeRanges($array));

Output:

Array
(
    [0] => Array
        (
            [merged_ids] => 20555, 20556, 17113, 18297
            [start_date] => 2011-01-03
            [end_date] => 2011-06-30
        )

    [1] => Array
        (
            [merged_ids] => 18298
            [start_date] => 2011-07-09
            [end_date] => 2011-10-01
        )

)