I'm practicing algorithms and one of my tasks is to count the number of all longest increasing sub-sequences for given 0 < n <= 10^6 numbers. Solution O(n^2) is not an option.
I have already implemented finding a LIS and its length (LIS Algorithm), but this algorithm switches numbers to the lowest possible. Therefore, it's impossible to determine if sub-sequences with a previous number (the bigger one) would be able to achieve the longest length, otherwise I could just count those switches, I guess.
Any ideas how to get this in about O(nlogn)? I know that it should be solved using dynamic-programming.
I implemented one solution and it works well, but it requires two nested loops (i in 1..n) x (j in 1..i-1).
So it's O(n^2) I think, nevertheless it's too slow.
I tried even to move those numbers from array to a binary tree (because in each i iteration I look for all smaller numbers then number[i] - going through elements i-1..1), but it was even slower.
Example tests:
1 3 2 2 4
result: 3 (1,3,4 | 1,2,4 | 1,2,4)
3 2 1
result: 3 (1 | 2 | 3)
16 5 8 6 1 10 5 2 15 3 2 4 1
result: 3 (5,8,10,15 | 5,6,10,15 | 1,2,3,4)
Sasha Salauyou's answer is great but I am not clear why
here is my code based on the same idea
Patience sorting is also O(N*logN), but way shorter and simpler than the methods based on binary search:
Finding the number of all longest increasing subsequences
Full Java code of improved LIS algorithm, which discovers not only the length of longest increasing subsequence, but number of subsequences of such length, is below. I prefer to use generics to allow not only integers, but any comparable types.
Explanation
As my explanation seems to be long, I will call initial sequence "seq" and any its subsequence "sub". So the task is to calculate count of longest increasing subs that can be obtained from the seq.
As I mentioned before, idea is to keep counts of all possible longest subs obtained on previous steps. So let's create a numbered list of rows, where number of each line equals the length of subs stored in this row. And let's store subs as pairs of numbers (v, c), where "v" is "value" of ending element, "c" is "count" of subs of given length that end by "v". For example:
We will build such list step by step, taking elements from initial sequence by their order. On every step we will try to add this element to the longest sub that it can be added to and record changes.
Building a list
Let's build the list using sequence from your example, since it has all possible options:
First, take element 16. Our list is empty so far, so we just put one pair in it:
Next is 5. It cannot be added to a sub that ends by 16, so it will create new sub with length of 1. We create a pair (5, 1) and put it into line 1:
Element 8 is coming next. It cannot create the sub [16, 8] of length 2, but can create the sub [5, 8]. So, this is where algorithm is coming. First, we iterate the list rows upside down, looking at the "values" of last pair. If our element is greater than values of all last elements in all rows, then we can add it to existing sub(s), increasing its length by one. So value 8 will create new row of the list, because it is greater than values all last elements existing in the list so far (i. e. > 5):
Element 8 can continue 5, but cannot continue 16. So we need to search through previous row, starting from its end, calculating the sum of "counts" in pairs which "value" is less than 8:
Why don't we store value 8 into subs of length 1 (first line)? Because we need subs of maximum possible length, and 8 can continue some previous subs. So every next number greater than 8 will also continue such sub and there is no need to keep 8 as sub of length less that it can be.
Next. 6. Searching upside down by last "values" in rows:
Found the room for 6, need to calculate a count:
After processing 1:
After processing 10:
After processing 5:
After processing 2:
After processing 15:
After processing 3:
After processing 2:
If when searching rows by last element we find equal element, we calculate its "count" again based on previous row, and add to existing "count".
After processing 4:
After processing 1:
So what do we have after processing all initial sequence? Looking at the last row, we see that we have 3 longest subs, each consist of 4 elements: 2 end by 15 and 1 ends by 4.
What about complexity?
On every iteration, when taking next element from initial sequence, we make 2 loops: first when iterating rows to find room for next element, and second when summarizing counts in previous row. So for every element we make maximum to n iterations (worst cases: if initial seq consists of elements in increasing order, we will get a list of n rows with 1 pair in every row; if seq is sorted in descending order, we will obtain list of 1 row with n elements). By the way, O(n2) complexity is not what we want.
First, this is obvious, that in every intermediate state rows are sorted by increasing order of their last "value". So instead of brute loop, binary searching can be performed, which complexity is O(log n).
Second, we don't need to summarize "counts" of subs by looping through row elements every time. We can summarize them in process, when new pair is added to the row, like:
So second number will show not count of longest subs that can be obtained with given value at the end, but summary count of all longest subs that end by any element that is greater or equal to "value" from the pair.
Thus, "counts" will be replaced by "sums". And instead of iterating elements in previous row, we just perform binary search (it is possible because pairs in any row are always ordered by their "values") and take "sum" for new pair as "sum" of last element in previous row minus "sum" from element left to found position in previous row plus "sum" of previous element in the current row.
So when processing 4:
4 will be paired with (3-2+2): ("sum" from the last pair of previous row) - ("sum" from pair left to found position in previous row) + ("sum" from previous pair in current row):
In this case, final count of all longest subs is "sum" from the last pair of the last row of the list, i. e. 3, not 3 + 2.
So, performing binary search to both row search and sum search, we will come with O(n*log n) complexity.
What about memory consumed, after processing all array we obtain maximum n pairs, so memory consumption in case of dynamic arrays will be O(n). Besides, when using dynamic arrays or collections, some additional time is needed to allocate and resize them, but most operations are made in O(1) time because we don't make any kind of sorting and rearrangement during process. So complexity estimation seems to be final.
Cpp implementation of above logic:
Although I completely agree with Alex this can be done very easily using Segment tree. Here is the logic to find the length of LIS using segment tree in NlogN. https://www.quora.com/What-is-the-approach-to-find-the-length-of-the-strictly-increasing-longest-subsequence Here is an approach that finds no of LIS but takes N^2 complexity. https://codeforces.com/blog/entry/48677
We use segment tree(as used here) to optimize approach given in this. Here is the logic:
first sort the array in ascending order(also keep the original order), initialise segment tree with zeroes, segment tree should query two things(use pair for this) for a given range: a. max of first. b. sum of second corresponding to max-first. iterate through sorted array. let j be the original index of current element, then we query (0 - j-1) and update the j-th element(if result of query is 0,0 then we update it with (1,1)).
Here is my code in c++: