Algorithm to merge multiple sorted sequences into one sorted sequence in C++

Posted 2020-02-13 12:04


一夜七次


Question:

I am looking for an algorithm to merge multiple sorted sequences, let's say X sorted sequences with n elements each, into one sorted sequence in C++. Can you provide some examples?

Note: I do not want to use any library.

Answer 1:

There are three methods that do the merging:

Suppose you are merging m lists with n elements each.

Algorithm 1:

Merge the lists two at a time. Because the lists are already sorted, the merge routine from merge sort does the job. This is very simple to implement without any libraries, but it takes O(m^2*n) time, which is acceptable only if m is not large.
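As a rough illustration of Algorithm 1, here is a minimal sketch. The names merge_two and merge_sequentially are made up for this example, and std::vector is assumed purely as storage; the merge logic itself is hand-written.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: merge two sorted vectors into a new sorted vector.
std::vector<int> merge_two(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        // Take from b only when it is strictly smaller, so ties keep a's element first.
        if (b[j] < a[i]) out.push_back(b[j++]);
        else             out.push_back(a[i++]);
    }
    while (i < a.size()) out.push_back(a[i++]);  // copy what is left of a
    while (j < b.size()) out.push_back(b[j++]);  // copy what is left of b
    return out;
}

// Algorithm 1: fold every list into one accumulator, two at a time.
std::vector<int> merge_sequentially(const std::vector<std::vector<int>>& lists)
{
    std::vector<int> acc;
    for (const std::vector<int>& l : lists)
        acc = merge_two(acc, l);
    return acc;
}
```

The accumulator grows by n elements per step and is re-read on every merge, which is where the O(m^2*n) bound comes from.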

Algorithm 2:

This is an improvement over Algorithm 1: always merge the two shortest of the remaining lists. Use a priority queue keyed on list length to select the two shortest lists, merge them, and add the merged list back to the queue. Repeat until only one list is left, which is the answer. This technique is used in Huffman coding and produces the optimal merge pattern. It takes O(m*n*log m). Moreover, for similarly sized lists the merges at each level are independent, so pairs of lists can be merged in parallel. Note that the final merge alone still touches all m*n elements, so with m processors the ideal O(n*log m) running time is reachable only if the individual merges are parallelized as well.
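A possible sketch of Algorithm 2 follows. It is only an illustration: the function name merge_smallest_first is invented here, and for brevity it leans on std::priority_queue and std::merge, so anyone avoiding the library entirely would have to swap in a hand-written heap and merge step.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <queue>
#include <utility>
#include <vector>

// Algorithm 2 sketch: repeatedly merge the two shortest remaining lists.
std::vector<int> merge_smallest_first(std::vector<std::vector<int>> lists)
{
    if (lists.empty()) return {};
    // Min-heap of (length, index into lists), shortest list on top.
    typedef std::pair<std::size_t, std::size_t> entry;
    std::priority_queue<entry, std::vector<entry>, std::greater<entry>> by_size;
    for (std::size_t i = 0; i < lists.size(); ++i)
        by_size.push(entry(lists[i].size(), i));

    while (by_size.size() > 1) {
        std::size_t a = by_size.top().second; by_size.pop();
        std::size_t b = by_size.top().second; by_size.pop();
        std::vector<int> merged;
        merged.reserve(lists[a].size() + lists[b].size());
        std::merge(lists[a].begin(), lists[a].end(),
                   lists[b].begin(), lists[b].end(),
                   std::back_inserter(merged));
        lists[a] = std::move(merged);  // slot a now holds the merged list
        lists[b].clear();              // slot b is no longer needed
        by_size.push(entry(lists[a].size(), a));
    }
    return std::move(lists[by_size.top().second]);
}
```

When all lists have the same length this degenerates into a balanced pairwise merge; the length-keyed queue only matters when the sizes differ.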

Algorithm 3:

This is the most efficient algorithm. Maintain a priority queue holding the first element of every list and extract the minimum to get the next output element. Alongside each element, keep the index of the list it came from, so that you can insert the next element of that list into the queue. This takes O(s*log m), where s is the total number of elements in all lists.

Answer 2:

Assumptions

The following method works with any container like array, vector, list etc. I'm assuming that we are working with lists.

Let's assume that we have m sorted lists which we want to merge.

Let n denote the total number of elements in all lists.

Idea

The first element in the resulting list has to be the smallest element in the set of all heads of the lists.

The idea is quite simple. Just select the smallest head and move it from the original list to the result. You want to repeat that routine while there is at least one non empty list. The crucial thing here is to select the smallest head fast.

If m is small

A linear scan through the heads is O(m), resulting in O(m * n) total time, which is fine if m is a small constant.
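The linear-scan variant could look roughly like this. This is a sketch under the assumption that the inputs are std::vector<int>; the name merge_by_scan is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Repeatedly pick the smallest head by scanning all m lists: O(m * n) overall.
std::vector<int> merge_by_scan(const std::vector<std::vector<int>>& lists)
{
    std::vector<std::size_t> head(lists.size(), 0);  // next unread index per list
    std::vector<int> out;
    for (;;) {
        std::size_t best = lists.size();              // sentinel: no candidate yet
        for (std::size_t i = 0; i < lists.size(); ++i) {
            if (head[i] == lists[i].size()) continue; // list i is exhausted
            if (best == lists.size() || lists[i][head[i]] < lists[best][head[best]])
                best = i;
        }
        if (best == lists.size()) break;              // every list is exhausted
        out.push_back(lists[best][head[best]++]);
    }
    return out;
}
```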

If m is not so small

Then we can do better by using a priority queue, for example a heap. The invariant here is that the smallest element in the heap is always the smallest element from current heads.

Finding the minimum element in a heap is O(1), deleting the minimum is O(log m) if there are m elements in the heap, and inserting an element into the heap is also O(log m).

In summary, each of the n elements is inserted into the heap once and deleted from it once. The total complexity with a heap is O(n log m), which is significantly faster than O(n * m) if m is not a small constant.
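Since the question asks for a solution without library algorithms, here is one way the heap version might be written by hand. It is a sketch, not a tuned implementation: Head, sift_down and merge_with_heap are names invented for this example, and std::vector is assumed only as storage.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One heap entry: the current head value of a list and where it came from.
struct Head {
    int value;
    std::size_t list;  // index of the source list
    std::size_t pos;   // index of value inside that list
};

// Restore the min-heap property downwards from index i.
void sift_down(std::vector<Head>& heap, std::size_t i)
{
    for (;;) {
        std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < heap.size() && heap[l].value < heap[smallest].value) smallest = l;
        if (r < heap.size() && heap[r].value < heap[smallest].value) smallest = r;
        if (smallest == i) return;
        std::swap(heap[i], heap[smallest]);
        i = smallest;
    }
}

std::vector<int> merge_with_heap(const std::vector<std::vector<int>>& lists)
{
    std::vector<Head> heap;
    for (std::size_t i = 0; i < lists.size(); ++i)
        if (!lists[i].empty())
            heap.push_back(Head{lists[i][0], i, 0});
    // Build the heap bottom-up.
    for (std::size_t i = heap.size() / 2; i-- > 0;)
        sift_down(heap, i);

    std::vector<int> out;
    while (!heap.empty()) {
        Head& top = heap[0];
        out.push_back(top.value);
        if (top.pos + 1 < lists[top.list].size()) {
            // Replace the minimum with the next element of the same list.
            ++top.pos;
            top.value = lists[top.list][top.pos];
        } else {
            // That list is exhausted: move the last entry to the root.
            heap[0] = heap.back();
            heap.pop_back();
        }
        if (!heap.empty()) sift_down(heap, 0);
    }
    return out;
}
```

Overwriting the root and sifting down once replaces the separate delete-min and insert steps; the cost per element is still O(log m).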

Summary

Which method is faster depends on how many lists we want to merge. If m is small, pick the linear scan; otherwise implement it with a priority queue. Sometimes it's hard to judge whether m is small or not, and in that case some experiments will be helpful.

Answer 3:

I assume that "without libraries" refers to the merge itself. Otherwise, you have to write your own linked list (either a forward list or a regular doubly linked list); the rest stays the same. A simple example (for two lists):

#include <list>
#include <iostream>

using namespace std;

int main(void)
 {
  list<int> a = { 1, 3, 5, 7, 9}, b = { 2, 4 , 6, 8, 9, 10}, c; //c is the output
  //Walk both lists at once, always copying the smaller current element
  for(auto it1 = begin(a), it2 = begin(b); it1 != end(a) || it2 != end(b);)
   if(it1 != end(a) && (it2 == end(b) || *it1 < *it2)) {
     c.push_back(*it1);
     ++it1;
    }
   else { //a is exhausted or *it2 <= *it1
     c.push_back(*it2);
     ++it2;
    }
  for(auto x : c) //Print the result
   cout<<x<<' ';
  cout<<'\n';
 }

Result:

1 2 3 4 5 6 7 8 9 9 10

Note: you must compile with the flag -std=c++11 (or a later standard). For example:

g++ -std=c++11 -Wall -pedantic -Wextra -O2 d.cpp -o program.out

The complexity: Θ(n)

Memory: Θ(n)

It's not hard to see that each element is processed exactly once in O(1). We have n elements, so the total is Θ(n).

Memory complexity is obvious: the output holds all n elements. It is worth mentioning that if the two input lists are no longer needed, the merge can be done without additional allocations (constant extra memory) by relinking their nodes.
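A sketch of the allocation-free variant, assuming std::list<int> inputs that may be consumed. The function name merge_in_place is made up for this example; the standard member function std::list::merge provides the same behaviour.

```cpp
#include <list>

// Merge sorted list b into sorted list a by relinking nodes; b ends up empty.
void merge_in_place(std::list<int>& a, std::list<int>& b)
{
    std::list<int>::iterator it = a.begin();
    while (!b.empty()) {
        // Advance to the first element of a that is greater than b's head.
        while (it != a.end() && !(b.front() < *it)) ++it;
        // Move b's first node in front of it; nothing is copied or allocated.
        a.splice(it, b, b.begin());
    }
}
```

Because b is sorted, the position in a never has to move backwards, so the whole merge is still a single pass over both lists.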

The algorithm itself has been described so many times that there is no point in describing it once more.

In the main problem we have many sequences, but the idea is the same. Here is an extended example:

#include <iostream>
#include <queue>
#include <tuple>
#include <vector>

using namespace std;

int main(void)
 {
  //Every input sequence must already be sorted
  vector<vector<int> > in{{ 1, 3, 5, 7, 9}, { 2, 4 , 6, 8, 9, 10}, {2,5,7,10,11,12,18}};
  vector<int> out;
  typedef tuple<int, vector<int>::iterator, vector<int>::iterator> element;
  //Order by value only: iterators into different vectors must not be compared
  auto by_value = [](const element& l, const element& r) { return get<0>(l) > get<0>(r); };
  priority_queue<element, vector<element>, decltype(by_value)> least(by_value);
  for(auto& x : in) //Adding iterators to the beginning of (non-empty) lists
   if(!x.empty())   //and other parts of the element (value and end of vector)
    least.emplace(x.front(),begin(x),end(x));

  while(!least.empty()) {            //Solving
    auto temp = least.top(); least.pop();
    out.push_back(get<0>(temp));     //Add the smallest at the end of out
    ++get<1>(temp);
    if(get<1>(temp) != get<2>(temp)){//If this is not the end
      get<0>(temp) = *get<1>(temp);
      least.push(temp);              //Update queue
     }
   }

  for(const auto& x : out) //Print solution
   cout<<x<<' ';
  cout<<'\n';
 }

The complexity: Θ(n log k), where k is the number of sequences and n is the total number of elements

Memory: Θ(n)

Pop and insert operations are O(log k) and we perform them n times, so the total is O(n log k).

Memory is still obvious: there are never more than k elements in the priority_queue, and O(n) in the out sequence.

Answer 4:

The code for this could be similar to a pointer-and-count based merge sort. Start by creating a "source" array holding a pointer and a count for each sequence, and allocate a second "destination" array to merge the "source" entries into. Each pass of the algorithm merges pairs of sequences described by the "source" array and records the merged results in the "destination" array, roughly halving the number of entries. Then the pointers to the "source" and "destination" arrays are swapped, and the merge process is repeated until the array of pointers and counts has only a single entry.
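One way to read that description is sketched below. This is my own interpretation, not a reference implementation: Run, merge_runs and merge_pairwise are hypothetical names, and two scratch buffers of the total size are assumed so that each pass reads from one and writes to the other.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A run is described by a pointer to its first element and its length.
struct Run {
    const int* ptr;
    std::size_t count;
};

// Merge runs a and b into dst and return the run describing the result.
Run merge_runs(Run a, Run b, int* dst)
{
    int* out = dst;
    const int* pa = a.ptr; const int* ea = a.ptr + a.count;
    const int* pb = b.ptr; const int* eb = b.ptr + b.count;
    while (pa != ea && pb != eb) *out++ = (*pb < *pa) ? *pb++ : *pa++;
    while (pa != ea) *out++ = *pa++;
    while (pb != eb) *out++ = *pb++;
    return Run{dst, a.count + b.count};
}

std::vector<int> merge_pairwise(const std::vector<std::vector<int>>& lists)
{
    std::size_t total = 0;
    std::vector<Run> src, dst;               // "source" and "destination" arrays
    for (const std::vector<int>& l : lists) {
        src.push_back(Run{l.data(), l.size()});
        total += l.size();
    }
    if (src.empty()) return {};

    std::vector<int> buf_a(total), buf_b(total);
    int* write = buf_a.data();               // buffer written in this pass
    int* spare = buf_b.data();               // buffer written in the next pass
    while (src.size() > 1) {
        dst.clear();
        int* out = write;
        for (std::size_t i = 0; i + 1 < src.size(); i += 2) {
            dst.push_back(merge_runs(src[i], src[i + 1], out));
            out += dst.back().count;
        }
        if (src.size() % 2 == 1)             // odd run out: copy it over unchanged
            dst.push_back(merge_runs(src.back(), Run{nullptr, 0}, out));
        std::swap(src, dst);                 // merged runs become the new source
        std::swap(write, spare);
    }
    return std::vector<int>(src[0].ptr, src[0].ptr + src[0].count);
}
```

There are about log2(m) passes and each pass copies every element once, so the total work is O(n log m) for n elements in m sequences.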

Answer 5:

The C++ standard library contains std::merge, which merges two sorted ranges:

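#include <algorithm>
#include <iterator>
#include <vector>

int main()
{
    std::vector<int> v1 { 1,2,5,7 },
                     v2 { 3,6,9 },
                     out;
    // Merge the two sorted ranges into out: 1 2 3 5 6 7 9
    std::merge(v1.begin(), v1.end(),
               v2.begin(), v2.end(),
               std::back_inserter(out));
}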

http://en.cppreference.com/w/cpp/algorithm/merge