Given an array like [15, 14, 12, 3, 10, 4, 2, 1]
. How can I determine which elements are out of order and remove them (the number 3 in this case). I don't want to sort the list, but detect outliers and remove them.
Another example:
[13, 12, 4, 9, 8, 6, 7, 3, 2]
I want to be able to remove #4 and #7 so that I end up with:
[13, 12, 9, 8, 6, 3, 2]
There's also a problem that arises when you have this scenario:
[15, 13, 12, 7, 10, 5, 4, 3]
You could either remove 7 or 10 to make this array sorted.
In general, the problem I'm trying to solve, is that given a list of numerical readings (some could be off by quite a bit). I want the array to only include values that follow the general trendline and remove any outliers. I'm just wondering if there is a simple way to do this.
I would reduce your problem to the longest increasing (decreasing) subsequence problem.
https://en.wikipedia.org/wiki/Longest_increasing_subsequence
Since your sequence is nearly sorted, you are guaranteed to receive a satisfactory result (i.e. neatly following the trendline).
There exists a number of solutions to it; one of them is portrayed in the free book "Fundamentals of Computer Programming with C#" by Svetlin Nakov and Veselin Kolev; the problem is presented on page 257, exercise 6; solution is on page 260.
Taken from the book:
A simple algorithm which has been described by higuaro can help you to generate a correct sequence:
For each element at index
i
, ifa[i] < a[i + 1]
, we can simply remove that elementa[i]
.However, this approach cannot guarantee that the number of removed element is minimum. For example, for this sequence [10, 9, 8, 100, 1, 0], remove 100 will be optimal, instead of remove 8, then 9 then 10.
To find the minimum number of element to be removed, we notice that we need to find the longest decreasing sub sequence, which is similar to the classic longest increasing sub sequence whose solution has been described here