So for the following array, where L = 3
-5 -1 2 -3 0 -3 3
The best possible sum over a contiguous subsequence of length at least 3 is 0, where the subsequence is the last three elements (0, -3, 3).
How can you calculate this sum for any array in better than O(NL) time (effectively O(N^2) if L == 0)?
I believe that you can do this in O(n) time regardless of the choice of L by using a modified version of Kadane's algorithm.
To see how this works, let's consider the case where L = 0. In that case, we want to find the maximum-sum subarray of the original sequence. This can be solved by Kadane's algorithm, a clever dynamic programming solution that works as follows. The idea is to keep track of the weight of the maximum-weight subarray ending just before and just after each position in the array. Whichever of these arrays has the largest sum is the subarray with the maximum total sum. Let the original array be A and let the array of maximum sums ending at position k be array M. Then Kadane's algorithm works like this:
- Set M(0) = 0. Any subarray ending just before the first array entry can't have anything in it, so it has sum zero.
- For each array index k, in order, set M(k + 1) = max(0, M(k) + A(k)). The idea here is that the best subarray ending just before this position is either formed by extending the best array from the previous position by a single element, or by discarding that array entirely and just picking the empty subarray before this position.
Once you've filled in this table M, you can just scan it to find the maximum value overall, which gives you the weight of the maximum-weight subarray.
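As a minimal sketch of the L = 0 case described above (the function name `kadane_table` is my own, not from the answer), the table M can be filled and scanned like this in Python. It allows the empty subarray, so the result is never negative.

```python
def kadane_table(a):
    """Maximum subarray sum, with the empty subarray (sum 0) allowed."""
    # m[k] = best sum of a subarray ending just before index k
    m = [0] * (len(a) + 1)
    for k, x in enumerate(a):
        # extend the best subarray from the previous position, or reset to empty
        m[k + 1] = max(0, m[k] + x)
    # the answer is the largest entry anywhere in the table
    return max(m)
```

For example, `kadane_table([-5, -1, 2, -3, 0, -3, 3])` picks the single element 3.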
But how do we adapt this to the case where L ≠ 0? Fortunately, this isn't too bad. Look at the recurrence for Kadane's algorithm. The idea is that at each point we can either extend the array by one step, or we can reset back to the empty array. But if we have a lower bound on the size of our subarray, we can think of this differently: the maximum-weight subarray of length at least L ending just before position k + 1 is formed either by extending the best array of length at least L that ends just before position k by one element, or by discarding that array and taking the L element subarray that ends right before position k + 1. This gives us a new version of Kadane's algorithm that looks like this:
- Set M(L) equal to the sum of the first L elements of the array.
- For each array index k ≥ L, in order, set M(k + 1) to the maximum of M(k) + A(k) (the value we get by extending the array) and the sum of the L elements just before position k + 1 (the value we get by just taking the last L elements).
If we run this, we will fill in table M values from L to the length of the array. The maximum value in that range is then the maximum sum subarray value for subarrays of length at least L.
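To make this first, unoptimized version concrete, here is a rough Python sketch. The function name and the restriction to 1 <= l <= len(a) are my own assumptions. It recomputes each window sum from scratch.

```python
def max_sum_at_least_l_slow(a, l):
    """Best sum of a contiguous subarray with at least l elements (direct version)."""
    n = len(a)
    if not 1 <= l <= n:
        raise ValueError("need 1 <= l <= len(a)")
    # m[k] = best sum of a subarray of length >= l ending just before index k
    m = {l: sum(a[:l])}
    for k in range(l, n):
        # sum of the l elements just before position k + 1, recomputed every time
        window = sum(a[k + 1 - l:k + 1])
        m[k + 1] = max(m[k] + a[k], window)
    return max(m.values())
```

Each loop iteration re-sums l elements, which is where the extra factor of L in the running time comes from.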
But this doesn't run in linear time! In particular, it runs in O(nL), since each iteration of the computation has to look at the previous L elements of the array. However, by doing some extra precomputation, we can drop this to O(n). The idea is that we can build up a table containing the sums of the L elements before each array index in O(n) time as follows. First, sum up the first L elements of the array and store that as S(L). This is the sum of the L elements just before position L. Now, if we want to get the sum of the L elements just before index L + 1, we can do so by summing up the first L elements of the array, adding in the next array element, then subtracting out the very first array element. This can be done in O(1) time by computing S(L + 1) = S(L) + A(L) - A(0). We can then use a similar trick to compute S(L + 2) = S(L + 1) + A(L + 1) - A(1). More generally, we can fill in this table of partial sums in O(n) time using the recurrence
- S(L) = A(0) + A(1) + ... + A(L - 1).
- S(L + k + 1) = S(L + k) + A(L + k) - A(k).
This runs in O(n) time. If we have this table precomputed, we can then find the maximum-weight subarray of length at least L by using this recurrence from above:
- M(L) = S(L)
- M(L + k + 1) = max(M(L + k) + A(L + k), S(L + k + 1))
We can then just scan across the M array to find the maximum value. This whole process runs in O(n) time: we need O(n) time to compute the S array, O(n) time to compute the M array, and O(n) time to find the maximum value. It also takes O(n) space, since we need to store the M and S arrays.
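A possible Python sketch of this table-based version follows. The name `build_tables` is hypothetical, and I assume 1 <= l <= len(a). Note that each M entry is compared against the window sum that ends at the same position.

```python
def build_tables(a, l):
    """Return (S, M) for positions l..n, where n = len(a)."""
    n = len(a)
    if not 1 <= l <= n:
        raise ValueError("need 1 <= l <= len(a)")
    # s[i] = sum of the l elements just before position i
    s = [None] * (n + 1)
    s[l] = sum(a[:l])
    for i in range(l, n):
        s[i + 1] = s[i] + a[i] - a[i - l]   # slide the window one step in O(1)
    # m[i] = best sum of a subarray of length >= l ending just before position i
    m = [None] * (n + 1)
    m[l] = s[l]
    for i in range(l, n):
        m[i + 1] = max(m[i] + a[i], s[i + 1])  # extend, or restart from the window
    return s[l:], m[l:]
```

The answer to the problem is then `max(m)` over the returned M list.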
But we can do better than this by reducing the memory usage to O(1)! The trick is to notice that at each point we don't need the entire M and S arrays; just the last term. We can therefore just store the last value of M and S, which takes only O(1) memory. At each point, we will also keep track of the maximum value we've seen in the M array, so we don't need to hold the M array after we've filled it in. This then gives the following O(n)-time, O(1)-space algorithm for solving the problem:
- Set S to the sum of the first L array elements.
- Set M = S.
- Set Best = M.
- For k = L up to n - 1, where n is the length of the array (indices are zero-based):
  - Set S = S + A(k) - A(k - L).
  - Set M = max(M + A(k), S).
  - Set Best = max(Best, M).
- Output Best.
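The steps above can be sketched in Python roughly as follows. This is my own transcription, not the C++ code mentioned later. The function name is hypothetical and I assume zero-based indexing with 1 <= l <= len(a).

```python
def max_sum_at_least_l(a, l):
    """Best sum of a contiguous subarray with at least l elements, O(n) time, O(1) space."""
    n = len(a)
    if not 1 <= l <= n:
        raise ValueError("need 1 <= l <= len(a)")
    s = sum(a[:l])   # sum of the current window of exactly l elements
    m = s            # best sum of a subarray of length >= l ending at the current index
    best = m
    for k in range(l, n):
        s += a[k] - a[k - l]      # slide the window forward by one element
        m = max(m + a[k], s)      # extend the previous best, or restart from the window
        best = max(best, m)
    return best
```

Because the empty subarray is not allowed here, the result can be negative, for instance when every element is negative.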
As an example, here's a trace through the algorithm on your original array with L = 3:
           -5   -1    2   -3    0   -3    3
    S                -4   -2   -1   -6    0
    M                -4   -2   -1   -4    0
    Best             -4   -2   -1   -1    0
So the output is 0.
Or, on a different array with L = 2:
            0    5   -3   -1    2   -4   -1    7    8
    S            5    2   -4    1   -2   -5    6   15
    M            5    2    1    3   -1   -2    6   15
    Best         5    5    5    5    5    5    6   15
So the output is 15.
Hope this helps! This is a really cool problem!
EDIT: I have a C++ implementation of this algorithm available if you're interested in looking at some actual code for the solution.
This is possible to do using dynamic programming in O(n).
1.) Store the partial sum up to i for each index i in the array.
2.) Store the index of the minimum partial sum up to i.
3.) Store the maximum up to i for each index i in the array: the partial sum up to i minus the minimum partial sum found in step 2, i.e. Min(Sum(k)) over k <= i - L, so that the sub sequence is at least of length L. If that minimum is not negative, subtract nothing and start at the first element instead.
All of this can be done in O(n) in one loop.
Now that you have the maximum sums up to i for each index i in the array you can determine the maximum sum of the contiguous sub sequence and the end index of that sub sequence. When you have the end index, you can just walk backwards until you have reached that maximum sum. Both of these operations are also O(n).
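As a rough illustration of the partial-sum idea, here is a Python sketch under my own assumptions. The function name is hypothetical. It keeps running values instead of full arrays, and it records the start index alongside the minimum rather than walking backwards afterwards.

```python
def max_sum_at_least_l_prefix(a, l):
    """Return (best_sum, start, end) for a subarray with at least l elements."""
    n = len(a)
    if not 1 <= l <= n:
        raise ValueError("need 1 <= l <= len(a)")
    prefix = 0       # sum of a[0..i]
    lagged = 0       # sum of a[0..i-l], the partial sum l positions behind
    min_prefix = 0   # smallest partial sum we may subtract (0 means start at index 0)
    min_start = 0    # start index that goes with min_prefix
    best = None
    for i in range(n):
        prefix += a[i]
        if i >= l:
            lagged += a[i - l]
            if lagged < min_prefix:
                min_prefix, min_start = lagged, i - l + 1
        if i >= l - 1:
            candidate = prefix - min_prefix   # best sum of length >= l ending at i
            if best is None or candidate > best[0]:
                best = (candidate, min_start, i)
    return best
```

On the array from the question with L = 3 this reports the last three elements, indexes 4 to 6.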
Sample implementation in C#:
    int[] values = {-5, -1, 2, -3, 0, -3, 3};
    int L = 3;
    int[] sumUpTo = new int[values.Length]; // partial sum of values[0..i]
    int[] minUpTo = new int[values.Length]; // index of the smallest partial sum in 0..i
    int[] maxUpTo = new int[values.Length]; // best sum of a sub sequence of length >= L ending at i
    for (int i = 0; i < values.Length; i++)
    {
        sumUpTo[i] = values[i];
        minUpTo[i] = i;
        if (i > 0)
        {
            sumUpTo[i] += sumUpTo[i - 1];
            // compare against the smallest partial sum so far, not just the previous one
            minUpTo[i] = sumUpTo[i] < sumUpTo[minUpTo[i - 1]] ? i : minUpTo[i - 1];
        }
        // subtract the smallest partial sum at least L positions back, if it is negative
        maxUpTo[i] = sumUpTo[i] - ((i >= L && sumUpTo[minUpTo[i - L]] < 0) ? sumUpTo[minUpTo[i - L]] : 0);
    }
    int maxSum = int.MinValue;
    int endIndex = -1;
    for (int i = L - 1; i < values.Length; i++)
        if (maxUpTo[i] > maxSum)
        {
            endIndex = i;
            maxSum = maxUpTo[i];
        }
    // Now walk backwards from endIndex until we have reached maxSum
    int startIndex = endIndex;
    int currentSum = values[startIndex];
    while (currentSum != maxSum || (endIndex - startIndex < L - 1))
    {
        startIndex--;
        currentSum += values[startIndex];
    }
    Console.WriteLine("value of maximum sub sequence = {0}, element indexes {1} to {2}", maxSum, startIndex, endIndex);
Here is the Java version:
Note: credit goes to @templatetypedef. That explanation is awesome.
    public static int max_sum_in_subarray_of_minimum_length(int[] array, int min_length) {
        int running_sum = 0, max_sum_up_to_here = 0, max_sum = 0;
        int begin = 0, end = min_length - 1, max_start = 0;
        /* max_sum_up_to_here = sum of the first L (min_length) elements */
        for (int i = 0; i < min_length; i++) {
            max_sum_up_to_here += array[i];
        }
        /* running_sum and max_sum both start out as that window sum */
        running_sum = max_sum_up_to_here;
        max_sum = running_sum;
        /* Iterate through all elements starting from index L, i.e. the minimum length */
        for (int i = min_length; i < array.length; i++) {
            /* slide the window: max_sum_up_to_here = max_sum_up_to_here +
               next element in array - (i-L)th element in array */
            max_sum_up_to_here += array[i] - array[i - min_length];
            /* if running_sum + next element in array > max_sum_up_to_here then
               extend: running_sum = running_sum + next element (start unchanged)
               else restart from the window of exactly L elements ending at i */
            if ((running_sum + array[i]) > max_sum_up_to_here) {
                running_sum = running_sum + array[i];
            } else {
                running_sum = max_sum_up_to_here;
                max_start = i - min_length + 1;
            }
            /* if running_sum > max_sum then max_sum = running_sum */
            if (max_sum < running_sum) {
                max_sum = running_sum;
                begin = max_start;
                end = i;
            }
        }
        /* max_sum is the sum of the best contiguous sub array of length at least L;
           begin and end are the indexes of that sub array */
        return max_sum;
    }
Useless cases and definitions, etc. My solution is the natural one.
First of all, keep this in mind: we are looking for the maximum sum of a contiguous fragment of an array of integers, where that fragment has L or more elements. Let's name the initial array A. For the same reasons as in Kadane's algorithm, we consider an auxiliary array REZ having N elements, like A, where REZ[i] is the maximum sum of a contiguous fragment of A containing at least L elements and ending exactly at the i-th position. Of course, REZ[1], REZ[2], ..., REZ[L-1] are all set to -INFINITY, since no fragment of at least L elements can end there (a ZERO value could wrongly win the final maximum when every valid sum is negative). REZ[L]=A[1]+A[2]+...+A[L].
For the rest of the values in REZ, with i growing from L+1 to N, to calculate REZ[i] we have to choose the maximum between two cases:
- a fragment of exactly L values and containing A[i]
- a fragment having more than L values and containing A[i]
The result for the first case can be calculated instantly with the partial sum array (S[i]=A[1]+A[2]+...+A[i]), S[i]-S[i-L]. The result for the second case is REZ[i-1]+A[i].
So,
- REZ[i]=-INFINITY, if 1<=i<=L-1
- REZ[i]=S[i], if i=L
- REZ[i]=max(S[i]-S[i-L], REZ[i-1]+A[i]), if i>L.
After REZ was built we have to calculate its maximum value.
Let's consider the following example:
N=7
A -5 -1 2 -3 0 -3 3
L=3
S -5 -6 -4 -7 -7 -10 -7
REZ: -INF -INF -4
REZ[4]=max(S[4]-S[4-3],REZ[3]+A[4])=max(-2, -7)=-2
REZ: -INF -INF -4 -2
REZ[5]=max(S[5]-S[5-3],REZ[4]+A[5])=max(-1,-2)=-1
REZ: -INF -INF -4 -2 -1
REZ[6]=max(S[6]-S[6-3], REZ[5]+A[6])=max(-6,-4)=-4
REZ: -INF -INF -4 -2 -1 -4
REZ[7]=max(S[7]-S[7-3],REZ[6]+A[7])=max(0,-1)= 0
REZ: -INF -INF -4 -2 -1 -4 0
The maximum value in REZ is 0 and this is the answer for the whole problem.
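For readers who want to check the table above, here is a small Python sketch of the REZ recurrence. The function name is my own, and the input list is shifted by one so that the indices match the 1-based notation used in this answer.

```python
import math

def rez_table(values, l):
    """Return REZ[1..N] as a list, following the recurrence above."""
    n = len(values)
    if not 1 <= l <= n:
        raise ValueError("need 1 <= l <= len(values)")
    a = [0] + list(values)            # a[1..n], index 0 is a placeholder
    s = [0] * (n + 1)                 # s[i] = a[1] + ... + a[i]
    for i in range(1, n + 1):
        s[i] = s[i - 1] + a[i]
    rez = [-math.inf] * (n + 1)       # positions 1..l-1 stay at -INFINITY
    rez[l] = s[l]
    for i in range(l + 1, n + 1):
        # exactly l elements ending at i, or a longer fragment extended by a[i]
        rez[i] = max(s[i] - s[i - l], rez[i - 1] + a[i])
    return rez[1:]
```

Taking `max(rez_table(values, l))` then gives the answer to the whole problem.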
I hope my English is good enough. I am searching for a solution for a similar problem, when the result must have at most L consecutive elements. When I realised that the methods described above were actually for solutions having at least L elements, I was quite disappointed.
Below is my Java implementation.
    // Note: this scans only windows of exactly l elements.
    public static int maxSumSubsequenceGivenLength(int[] array, int l) {
        if (null == array || array.length < l) {
            return -1;
        }
        // sum of the first window of l elements
        int previousSequenceSum = 0;
        for (int i = 0; i < l; i++) {
            previousSequenceSum += array[i];
        }
        int maxSum = previousSequenceSum;
        int currentSum = 0;
        int startIndexFinal = 0;
        int endIndexFinal = l - 1;
        for (int i = l; i < array.length; i++) {
            // slide the window: add array[i], drop array[i - l]
            currentSum = previousSequenceSum + array[i] - array[i - l];
            if (currentSum > maxSum) {
                maxSum = currentSum;
                endIndexFinal = i;
                startIndexFinal = i - l + 1;
            }
            previousSequenceSum = currentSum;
        }
        System.out.println("start index:" + startIndexFinal + " end index: " + endIndexFinal);
        return maxSum;
    }