Problem statement:
There are 3 arrays A,B,C all filled with positive integers, and all the three arrays are of the same size.
Find min(|a-b|+|b-c|+|c-a|) where a is in A, b is in B, c is in C.
I worked on the problem the whole weekend. A friend told me that it can be done in linear time. I don't see how that could be possible.
How would you do it ?
Well, I think I can do it in O(n log n). I can only do O(n) if the arrays are initially sorted.
First, observe that you can permute
a
,b
,c
however you like without changing the value of the expression. So letx
be the smallest ofa
,b
,c
; lety
be the middle of the three; and letz
be the maximum. Then note that the expression just equals2*(z-x)
. (Edit: This is easy to see... Once you have the three numbers in order,x < y < z
, the sum is just(y-x) + (z-y) + (z-x)
which equals2*(z-x)
)Thus, all we are really trying to do is find three numbers such that the outer two are as close together as possible, with the other number "sandwiched" between them.
So start by sorting all three arrays in O(n log n). Maintain an index into each array; call these
i
,j
, andk
. Initialize all three to zero. Whichever index points to the smallest value, increment that index. That is, ifA[i]
is smaller thanB[j]
andC[k]
, incrementi
; ifB[j]
is smallest, incrementj
; ifC[k]
is smallest, increment k. Repeat, keeping track of|A[i]-B[j]| + |B[j]-C[k]| + |C[k]-A[i]|
the whole time. The smallest value you observe during this march is your answer. (When the smallest of the three is at the end of its array, stop because you are done.)At each step, you add one to exactly one index; but you can only do this
n
times for each array before hitting the end. So this is at most3*n
steps, which is O(n), which is less than O(n log n), meaning the total time is O(n log n). (Or just O(n) if you can assume the arrays are sorted.)Sketch of a proof that this works: Suppose
A[I]
,B[J]
,C[K]
are thea
,b
,c
that form the actual answer; i.e., they have the minimum|a-b|+|b-c|+|c-a|
. Suppose further thata
>b
>c
; the proof for the other cases is symmetric.Lemma: During our march, we do not increment
j
pastJ
until after we incrementk
pastK
. Proof: We always increment the index of the smallest element, and whenk <= K
,B[J] > C[k]
. So whenj=J
andk <= K
,B[j]
is not the smallest element, so we do not incrementj
.Now suppose we increment
k
pastK
beforei
reachesI
. What do things look like just before we perform that increment? Well,C[k]
is the smallest of the three at that moment, because we are about to incrementk
.A[i]
is less than or equal toA[I]
, becausei < I
andA
is sorted. Finally,j <= J
becausek <= K
(by our Lemma), soB[j]
is also less thanA[I]
. Taken together, this means our sum-of-abs-diff at this moment is less than2*(c-a)
, which is a contradiction.Thus, we do not increment
k
pastK
untili
reachesI
. Therefore, at some point during our marchi=I
andk=K
. By our Lemma, at this pointj
is less than or equal toJ
. So at this point, eitherB[j]
is less than the other two andj
will get incremented; orB[j]
is between the other two and our sum is just2*(A[i]-C[k])
, which is the right answer.This proof is sloppy; in particular, it fails to explicitly account for the case where one or more of
a
,b
,c
are equal. But I think that detail can be worked out pretty easily.I would write a really simple program like this:
And test it over and over until I saw some pattern emerge. The pattern I found here is what would be expected: the numbers that are closest together in each set, regardless of whether the numbers are "high" or "low", are those that produce the smallest minimum sum. So it becomes a nearest-number problem. For whatever that's worth, probably not much.