I have a sorted std::vector<int>
and I would like to find the longest 'streak of consecutive numbers' in this vector and then return both the length of it and the smallest number in the streak.
To visualize it for you :
suppose we have :
1 3 4 5 6 8 9
I would like it to return: maxStreakLength = 4
and streakBase = 3
There might be occasion where there will be 2 streaks and we have to choose which one is longer.
What is the best (fastest) way to do this ? I have tried to implement this but I have problems with coping with more than one streak in the vector. Should I use temporary vectors and then compare their lengths?
No you can do this in one pass through the vector and only storing the longest start point and length found so far. You also need much fewer than 'N' comparisons. *
hint: If you already have say a 4 long match ending at the 5th position (=6) and which position do you have to check next?
[*] left as exercise to the reader to work out what's the likely O( ) complexity ;-)
It would be interesting to see if the fact that the array is sorted can be exploited somehow to improve the algorithm. The first thing that comes to mind is this: if you know that all numbers in the input array are unique, then for a range of elements [i, j]
in the array, you can immediately tell whether elements in that range are consecutive or not, without actually looking through the range. If this relation holds
array[j] - array[i] == j - i
then you can immediately say that elements in that range are consecutive. This criterion, obviously, uses the fact that the array is sorted and that the numbers don't repeat.
Now, we just need to develop an algorithm which will take advantage of that criterion. Here's one possible recursive approach:
- Input of recursive step is the range of elements
[i, j]
. Initially it is [0, n-1]
- the whole array.
- Apply the above criterion to range
[i, j]
. If the range turns out to be consecutive, there's no need to subdivide it further. Send the range to output (see below for further details).
- Otherwise (if the range is not consecutive), divide it into two equal parts
[i, m]
and [m+1, j]
.
- Recursively invoke the algorithm on the lower part (
[i, m]
) and then on the upper part ([m+1, j]
).
The above algorithm will perform binary partition of the array and recursive descent of the partition tree using the left-first approach. This means that this algorithm will find adjacent subranges with consecutive elements in left-to-right order. All you need to do is to join the adjacent subranges together. When you receive a subrange [i, j]
that was "sent to output" at step 2, you have to concatenate it with previously received subranges, if they are indeed consecutive. Or you have to start a new range, if they are not consecutive. All the while you have keep track of the "longest consecutive range" found so far.
That's it.
The benefit of this algorithm is that it detects subranges of consecutive elements "early", without looking inside these subranges. Obviously, it's worst case performance (if ther are no consecutive subranges at all) is still O(n)
. In the best case, when the entire input array is consecutive, this algorithm will detect it instantly. (I'm still working on a meaningful O estimation for this algorithm.)
The usability of this algorithm is, again, undermined by the uniqueness requirement. I don't know whether it is something that is "given" in your case.
Anyway, here's a possible C++ implementation
typedef std::vector<int> vint;
typedef std::pair<vint::size_type, vint::size_type> range;
class longest_sequence
{
public:
const range& operator ()(const vint &v)
{
current = max = range(0, 0);
process_subrange(v, 0, v.size() - 1);
check_record();
return max;
}
private:
range current, max;
void process_subrange(const vint &v, vint::size_type i, vint::size_type j);
void check_record();
};
void longest_sequence::process_subrange(const vint &v,
vint::size_type i, vint::size_type j)
{
assert(i <= j && v[i] <= v[j]);
assert(i == 0 || i == current.second + 1);
if (v[j] - v[i] == j - i)
{ // Consecutive subrange found
assert(v[current.second] <= v[i]);
if (i == 0 || v[i] == v[current.second] + 1)
// Append to the current range
current.second = j;
else
{ // Range finished
// Check against the record
check_record();
// Start a new range
current = range(i, j);
}
}
else
{ // Subdivision and recursive calls
assert(i < j);
vint::size_type m = (i + j) / 2;
process_subrange(v, i, m);
process_subrange(v, m + 1, j);
}
}
void longest_sequence::check_record()
{
assert(current.second >= current.first);
if (current.second - current.first > max.second - max.first)
// We have a new record
max = current;
}
int main()
{
int a[] = { 1, 3, 4, 5, 6, 8, 9 };
std::vector<int> v(a, a + sizeof a / sizeof *a);
range r = longest_sequence()(v);
return 0;
}
I believe that this should do it?
size_t beginStreak = 0;
size_t streakLen = 1;
size_t longest = 0;
size_t longestStart = 0;
for (size_t i=1; i < len.size(); i++) {
if (vec[i] == vec[i-1] + 1) {
streakLen++;
}
else {
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
beginStreak = i;
streakLen = 1;
}
}
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
You can't solve this problem in less than O(N)
time. Imagine your list is the first N-1
even numbers, plus a single odd number (chosen from among the first N-1
odd numbers). Then there is a single streak of length 3 somewhere in the list, but worst case you need to scan the entire list to find it. Even on average you'll need to examine at least half of the list to find it.
Similar to Rodrigo's solutions but solving your example as well:
#include <vector>
#include <cstdio>
#define len(x) sizeof(x) / sizeof(x[0])
using namespace std;
int nums[] = {1,3,4,5,6,8,9};
int streakBase = nums[0];
int maxStreakLength = 1;
void updateStreak(int currentStreakLength, int currentStreakBase) {
if (currentStreakLength > maxStreakLength) {
maxStreakLength = currentStreakLength;
streakBase = currentStreakBase;
}
}
int main(void) {
vector<int> v;
for(size_t i=0; i < len(nums); ++i)
v.push_back(nums[i]);
int lastBase = v[0], currentStreakBase = v[0], currentStreakLength = 1;
for(size_t i=1; i < v.size(); ++i) {
if (v[i] == lastBase + 1) {
currentStreakLength++;
lastBase = v[i];
} else {
updateStreak(currentStreakLength, currentStreakBase);
currentStreakBase = v[i];
lastBase = v[i];
currentStreakLength = 1;
}
}
updateStreak(currentStreakLength, currentStreakBase);
printf("maxStreakLength = %d and streakBase = %d\n", maxStreakLength, streakBase);
return 0;
}