I have read this problem Find the most common entry in an array
and the answer from jon skeet is just mind blowing .. :)
Now I am trying to solve this problem find an element which occurs more than n/3 times in an array ..
I am pretty sure that we cannot apply the same method because there can be 2 such elements which will occur more than n/3 times and that gives false alarm of the count ..so is there any way we can tweak around jon skeet's answer to work for this ..?
Or is there any solution that will run in linear time ?
Jan Dvorak's answer is probably best:
At the end, make a second pass over the array to check whether the candidates really do have the required count. This isn't allowed by the question you link to but I don't see how to avoid it for this modified version. If there is a value that occurs more than n/3 times then it will be in a slot, but you don't know which one it is.
If this modified version of the question guaranteed that there were two values with more than
n/3
elements (in general,k-1
values with more thann/k
) then we wouldn't need the second pass. But when the original question hask=2
and 1 guaranteed majority there's no way to know whether we "should" generalize it as guaranteeing 1 such element or guaranteeingk-1
. The stronger the guarantee, the easier the problem.I use the following Python solution to discuss the correctness of the algorithm:
First, we need to prove claim A:
Claim A: Consider a list
C
which contains a majority numberm
which occurs morefloor(n/3)
times. After 3 different numbers are removed fromC
, we haveC'
.m
is the majority number ofC'
.Proof: Use
R
to denotem
's occurrence count inC
. We haveR > floor(n/3)
.R > floor(n/3)
=>R - 1 > floor(n/3) - 1
=>R - 1 > floor((n-3)/3)
. UseR'
to denotem
's occurrence count inC'
. And usen'
to denote the length ofC'
. Since 3 different numbers are removed, we haveR' >= R - 1
. Andn'=n-3
is obvious. We can haveR' > floor(n'/3)
fromR - 1 > floor((n-3)/3)
. Som
is the majority number ofC'
.Now let's prove the correctness of the
loop 1
. DefineL
ascount1 * [num1] + count2 * [num2] + nums[i:]
. Usem
to denote the majority number.Invariant
The majority number
m
is inL
.Initialization
At the start of the first itearation,
L
isnums[0:]
. So the invariant is trivially true.Maintenance
if count1 == 0
branch: Before the iteration,L
iscount2 * [num2] + nums[i:]
. After the iteration,L
is1 * [nums[i]] + count2 * [num2] + nums[i+1:]
. In other words,L
is not changed. So the invariant is maintained.if val == num1
branch: Before the iteration,L
iscount1 * [nums[i]] + count2 * [num2] + nums[i:]
. After the iteration,L
is(count1+1) * [num[i]] + count2 * [num2] + nums[i+1:]
. In other words,L
is not changed. So the invariant is maintained.f count2 == 0
branch: Similar to condition 1.elif val == num2
branch: Similar to condition 2.else
branch:nums[i]
,num1
andnum2
are different to each other in this case. After the iteration,L
is(count1-1) * [num1] + (count2-1) * [num2] + nums[i+1:]
. In other words, three different numbers are moved fromcount1 * [num1] + count2 * [num2] + nums[i:]
. From claim A, we knowm
is the majority number ofL
.So the invariant is maintained.Termination
When the loop terminates,
nums[n:]
is empty.L
iscount1 * [num1] + count2 * [num2]
.So when the loop terminates, the majority number is either
num1
ornum2
.At line number five, the
if
statement should have one more check:You can use Selection algorithm to find the number in the n/3 place and 2n/3.
If there are n elements in the array , and suppose in the worst case only 1 element is repeated n/3 times , then the probability of choosing one number that is not the one which is repeated n/3 times will be (2n/3)/n that is 1/3 , so if we randomly choose N elements from the array of size ‘n’, then the probability that we end up choosing the n/3 times repeated number will be atleast 1-(2/3)^N . If we eqaute this to say 99.99 percent probability of getting success, we will get N=23 for any value of “n”.
Therefore just choose 23 numbers randomly from the list and count their occurrences , if we get count greater than n/3 , we will return that number and if we didn’t get any solution after checking for 23 numbers randomly , return -1;
The algorithm is essentially O(n) as the value 23 doesn’t depend on n(size of list) , so we have to just traverse array 23 times at worst case of algo.
Accepted Code on interviewbit(C++):
Using Boyer-Moore Majority Vote Algorithm, we get: