I am looking for an algorithm to merge multiple sorted sequences, let's say X sorted sequences with n elements, into one sorted sequence in C++. Can you provide some examples?
Note: I do not want to use any library.
Assumptions
The following method works with any container, such as an array, a vector or a list. I'm assuming that we are working with lists.
Let's assume that we have m sorted lists which we want to merge. Let n denote the total number of elements in all lists.
Idea
The first element in the resulting list has to be the smallest element in the set of all heads of the lists.
The idea is quite simple: select the smallest head and move it from its original list to the result. Repeat that routine while there is at least one non-empty list. The crucial thing here is to select the smallest head quickly.
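A minimal sketch of this idea, choosing the smallest head with a plain linear scan over all heads (O(m) per output element). Assumptions for illustration: the sequences are std::vector<int>, used only as storage with no library algorithms, and instead of physically removing elements the hypothetical `head` array remembers the next unread index of each list.

```cpp
#include <cstddef>
#include <vector>

// Merges m sorted sequences by scanning all current heads and moving the
// smallest one to the result. Heads are tracked by index (an assumption of
// this sketch) instead of removing elements from the inputs.
std::vector<int> mergeLinearScan(const std::vector<std::vector<int>>& lists) {
    std::vector<std::size_t> head(lists.size(), 0);  // next unread index per list
    std::vector<int> result;

    while (true) {
        bool found = false;
        std::size_t best = 0;
        // Linear scan over the heads of all non-empty lists.
        for (std::size_t i = 0; i < lists.size(); ++i) {
            if (head[i] == lists[i].size()) continue;  // list i is exhausted
            if (!found || lists[i][head[i]] < lists[best][head[best]]) {
                best = i;
                found = true;
            }
        }
        if (!found) break;  // every list is empty: done
        result.push_back(lists[best][head[best]]);  // move the smallest head
        ++head[best];
    }
    return result;
}
```

For example, `mergeLinearScan({{1, 4, 7}, {2, 5, 8}, {0, 3, 6, 9}})` yields 0 through 9 in order.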
If m is small
A linear scan through the heads is O(m), resulting in O(m * n) total time, which is fine if m is a small constant.
If m is not so small
Then we can do better by using a priority queue, for example a heap. The invariant here is that the smallest element in the heap is always the smallest element among the current heads.
Finding the minimum element in a heap is O(1), deleting the minimum is O(log m) if there are m elements in the heap, and inserting an element into the heap is also O(log m).
In summary, each of the n elements is inserted into the heap once and deleted from it once. The total complexity with a heap is therefore O(n log m), which is significantly faster than O(n * m) if m is not a small constant.
Summary
Which method is faster depends on how many lists we want to merge. If m is small, pick the linear scan; otherwise, implement it with a priority queue. Sometimes it's hard to judge whether m is small or not, and in that case some experiments will be helpful.
I assume that "without libraries" applies only to the merging itself. Otherwise, you have to write your own linked list (either singly linked, like a forward list, or doubly linked, like a normal list). The rest stays the same. Easy example (for two lists):
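A minimal sketch of such a two-list merge, assuming the inputs are std::vector<int> (a linked list would be traversed the same way); the function name mergeTwo is hypothetical. Each element is compared and copied once, which is where the Θ(n) bound comes from.

```cpp
#include <cstddef>
#include <vector>

// Classic two-way merge of two sorted sequences in Θ(n) time.
std::vector<int> mergeTwo(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        // Take from b only if strictly smaller, so equal keys keep a's first (stable).
        if (b[j] < a[i]) out.push_back(b[j++]);
        else             out.push_back(a[i++]);
    }
    while (i < a.size()) out.push_back(a[i++]);  // copy whatever is left over
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}
```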
Attention! You must compile with the flag -std=c++11 (or a later standard).
The complexity: Θ(n)
Memory: Θ(n)
It's not hard to see that each element is processed exactly once in O(1); we have n elements, so it's Θ(n).
The memory complexity is obvious. It is worth mentioning that if the two input lists are no longer needed, the merge can be done without additional allocations (constant extra memory) by relinking the existing nodes.
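To illustrate the constant-memory remark: with std::list, the nodes of the second list can be relinked into the first one using splice, so no node is allocated or copied. A minimal sketch, assuming int elements; the helper name mergeInPlace is hypothetical, and std::list's own member function merge does the same job in a single call.

```cpp
#include <list>

// Merges sorted list b into sorted list a by moving b's nodes, one at a time,
// in front of the first element of a that is greater. No allocations; b ends up empty.
void mergeInPlace(std::list<int>& a, std::list<int>& b) {
    auto it = a.begin();
    while (!b.empty()) {
        if (it == a.end() || b.front() < *it) {
            a.splice(it, b, b.begin());  // relink b's first node before 'it'
        } else {
            ++it;  // a's element is not larger, keep it where it is
        }
    }
}
```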
The algorithm itself has been described so many times that there is no point in writing it out once more.
In the main problem we have many sequences, but the idea is the same. Here is an extended example:
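A minimal sketch of the k-way version with std::priority_queue used as a min-heap, assuming the sequences are std::vector<int>; the function name mergeK is hypothetical. The heap holds at most one entry per sequence: the current head, together with the sequence index and the position, so the next element of that sequence can be pushed after a pop.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// k-way merge: the heap always holds at most one entry (the current head) per list.
std::vector<int> mergeK(const std::vector<std::vector<int>>& lists) {
    // Entry = (value, (list index, position in that list)).
    using Entry = std::pair<int, std::pair<std::size_t, std::size_t>>;
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    std::size_t total = 0;
    for (std::size_t i = 0; i < lists.size(); ++i) {
        total += lists[i].size();
        if (!lists[i].empty()) heap.push({lists[i][0], {i, 0}});
    }

    std::vector<int> out;
    out.reserve(total);
    while (!heap.empty()) {
        Entry top = heap.top();  // smallest current head, O(1)
        heap.pop();              // O(log k)
        out.push_back(top.first);
        std::size_t list = top.second.first;
        std::size_t next = top.second.second + 1;
        if (next < lists[list].size())  // refill from the same list, O(log k)
            heap.push({lists[list][next], {list, next}});
    }
    return out;
}
```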
The complexity: Θ(n log k)
Memory: Θ(n)
Pop and insert operations are O(log k), where k is the number of sequences; we perform each of them n times, so it's O(n log k).
The memory is still obvious: there are always at most k elements in the priority_queue, and O(n) in the output sequence.
The C++ standard library contains std::merge: http://en.cppreference.com/w/cpp/algorithm/merge
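A short usage sketch, for the case where the standard library is acceptable after all: std::merge combines two sorted ranges, and std::back_inserter lets the output vector grow as needed. The wrapper names mergeWithStd and mergeAllWithStd are hypothetical; folding the sequences in one at a time is the simple pairwise approach, which costs O(m^2 * n) for m lists of n elements.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Merges two sorted ranges with std::merge.
std::vector<int> mergeWithStd(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// For several sequences: merge each one into a running accumulator.
std::vector<int> mergeAllWithStd(const std::vector<std::vector<int>>& lists) {
    std::vector<int> acc;
    for (const auto& l : lists) acc = mergeWithStd(acc, l);
    return acc;
}
```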
There are three methods for doing the merging:
Suppose you are merging m lists with n elements each.
Algorithm 1:
Merge the lists two at a time (the running result with the next list). Since the lists are sorted, use a merge routine like the one in merge sort. This is very simple to implement without any libraries, but it takes O(m^2 * n) time (the intermediate results have sizes 2n, 3n, ..., mn), which is small enough if m is not large.
Algorithm 2:
This is an improvement over Algorithm 1, where we always merge the two smallest lists that remain. Use a priority queue to select the two smallest lists, merge them, and add the new list back to the queue. Repeat this until only one list is left, which is your answer. This technique is used in Huffman coding and produces an optimal merge pattern. It takes O(m * n * log m). Moreover, for similar-sized lists it can be made parallel, since pairs of lists can be merged independently. Assuming you have m processors, the running time drops below O(m * n * log m); note, however, that the final merge alone has to process all m * n elements, so reaching O(n * log m) also requires parallelizing each individual merge.
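A minimal sketch of Algorithm 2, assuming std::vector<int> sequences and using std::priority_queue for brevity (a hand-written heap would work the same way). The names Seq, mergeTwoSeq, LongerFirst and mergeSmallestTwo are hypothetical. The comparator puts the shortest sequence on top, so every step merges the two currently shortest sequences, as in Huffman coding.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

using Seq = std::vector<int>;

// Plain two-way merge of sorted sequences.
Seq mergeTwoSeq(const Seq& a, const Seq& b) {
    Seq out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        out.push_back(b[j] < a[i] ? b[j++] : a[i++]);
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}

// Orders the priority queue so that the SHORTEST sequence is on top.
struct LongerFirst {
    bool operator()(const Seq& x, const Seq& y) const { return x.size() > y.size(); }
};

// Optimal merge pattern: always merge the two currently shortest sequences.
Seq mergeSmallestTwo(const std::vector<Seq>& lists) {
    if (lists.empty()) return Seq();
    std::priority_queue<Seq, std::vector<Seq>, LongerFirst> pq(lists.begin(), lists.end());
    while (pq.size() > 1) {
        Seq a = pq.top(); pq.pop();  // shortest
        Seq b = pq.top(); pq.pop();  // second shortest
        pq.push(mergeTwoSeq(a, b));
    }
    return pq.top();
}
```

Ties in size are broken arbitrarily by the heap, which does not affect the merged result.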
Algorithm 3:
This is the most efficient algorithm: you maintain a priority queue of the first elements of all lists and extract the minimum to get the next output element. Also store the index of the list the minimum element belongs to, so that you can insert the next element from that list. This takes O(s * log m), where s is the total number of elements in all lists.
The code for this could be similar to a pointer-and-count based merge sort: start by creating a "source" array of pointers and counts, one entry per sequence, and allocate a second "destination" array to merge the "source" array of pointers and counts into. Each pass of this algorithm merges pairs of sequences (given by their pointers and counts) from the "source" array into the "destination" array, reducing the number of entries in the array by about half. Then the pointers to the "source" and "destination" arrays are swapped, and the merge process is repeated until the array of pointers and counts has only a single entry.
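A minimal sketch of these merge passes, under a few assumptions: all sequences are first copied back to back into one std::vector<int> buffer, a second buffer of the same size serves as the destination, and each sequence is described by a hypothetical Run record holding a start offset and a count (the offsets play the role of the pointers). Since paired runs are adjacent, a merged run lands at the same offset in the destination, and swapping the two vectors is O(1).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One run = a sorted stretch of a buffer, given by start offset and element count.
struct Run {
    std::size_t start;
    std::size_t count;
};

// Repeated pairwise passes: runs 0+1, 2+3, ... are merged from src into dst,
// roughly halving the number of runs; then src and dst swap roles.
std::vector<int> mergePasses(const std::vector<std::vector<int>>& lists) {
    std::vector<int> src;
    std::vector<Run> runs;
    for (const auto& l : lists) {  // lay all sequences out back to back
        runs.push_back(Run{src.size(), l.size()});
        src.insert(src.end(), l.begin(), l.end());
    }
    std::vector<int> dst(src.size());
    std::vector<Run> nextRuns;

    while (runs.size() > 1) {
        nextRuns.clear();
        for (std::size_t r = 0; r < runs.size(); r += 2) {
            Run a = runs[r];
            // An odd run out is paired with an empty run, i.e. simply copied.
            Run b = (r + 1 < runs.size()) ? runs[r + 1] : Run{a.start + a.count, 0};
            std::size_t i = a.start, iEnd = a.start + a.count;
            std::size_t j = b.start, jEnd = b.start + b.count;
            std::size_t k = a.start;  // adjacent runs: output lands at the same offset
            while (i < iEnd && j < jEnd) dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
            while (i < iEnd) dst[k++] = src[i++];
            while (j < jEnd) dst[k++] = src[j++];
            nextRuns.push_back(Run{a.start, a.count + b.count});
        }
        std::swap(src, dst);  // O(1): only the buffers' internals are exchanged
        std::swap(runs, nextRuns);
    }
    return src;
}
```

Each pass touches every element once and the number of runs halves each time, so this also ends up at O(s * log m).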