In the article http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binarySearch, the author discusses binary search. He makes a distinction between finding the lowest value where something is true, and the highest value where something is false. The array being searched looks something like:
false false false true true
I am curious as to why these two cases are different. Why can't you just find the lowest value which is true, then subtract one to find the highest value which is false?
Edit2: Ok, so I understand lower vs upper bound. Now, I am struggling to understand, when searching for the smallest integer greater than or equal to the query, why we can't just change the if(mid>query)
to if(mid>=query)
and have it do lower instead of upper bound.
Edit: Here is what the article stated:
"Now we finally get to the code which implements binary search as described in this and the previous section:
binary_search(lo, hi, p):
while lo < hi:
mid = lo + (hi-lo)/2
if p(mid) == true:
hi = mid
else:
lo = mid+1
if p(lo) == false:
complain // p(x) is false for all x in S!
return lo // lo is the least x for which p(x) is true
...
If we wanted to find the last x for which p(x) is false, we would devise (using a similar rationale as above) something like:
binary_search(lo, hi, p):
while lo < hi:
mid = lo + (hi-lo+1)/2 // note: division truncates
if p(mid) == true:
hi = mid-1
else:
lo = mid
if p(lo) == true:
complain // p(x) is true for all x in S!
return lo // lo is the greatest x for which p(x) is false
."
The two algorithms obviously differ in the condition of what should happen if there are is no
true
or nofalse
value as is actually quite obvious from the code snippet: if you find the lowest value where the value istrue
and subtract 1 from this position to find the highest value yieldingfalse
an incorrect result is produced as there is no such object. Since the algorithms simply target different elements dealing with locating the appropriate element directly rather than having a special case also avoids having to deal with a special case, reducing the amount of code. Since special case code tends to be executed only once for each algorithm invocation it is likely to perform slightly worse than avoiding the special case. This is something which may be worth measuring.Note that the code example isn't C++ despite the question being tagged C++. As a result it isn't idiomatic C++. The typical approach in C++ to implement something like
lower_bound()
orupper_bound()
is to use appropriate iterators. These algorithms wouldn't "complain" if there is no suitable element as they'd just produce an iterator the appropriate position, i.e., an iterator to the start forstd::lower_bound()
and a past-the-end iterator forstd::upper_bound()
.The lower and upper bound of a binary search are the lowest and highest position where the value could be inserted without breaking the ordering. (In the C++ standard library, these bounds will be represented by iterators referencing the element before which the value could be inserted, but the concept is not essentially changed.)
Take, for example, a sorted range
In a binary search for
3
, we will haveAnd in a binary search for
5
:The lower and upper bound are the same if the element does not exist in the range. In a binary search for
8
:The author of the article to which you refer phrases all this in the equivalent terms of "smaller than" and "greater than" so that in a search of 5,
The C++ iterators will, in all these cases, refer to the element directly behind the bound. That is to say:
3
, the iterator returned bystd::lower_bound
would refer to3
and the one fromstd::upper_bound
would refer to4
5
, the iterator returned bystd::lower_bound
would refer to the first5
and the one fromstd::upper_bound
would refer to6
8
, both would refer to9
This is because the convention in the C++ standard library for insertions is to pass an iterator referring to the element before which the new element should be inserted. For example, after
vec
would contain1, 2, 3, 4, 5, 5, 5, 6, 7, 9
.std::lower_bound
andstd::upper_bound
follow this convention so thatwork as desired and leave
vec
sorted.More generally, this is an expression of the way ranges are specified in the C++ standard library. The beginning iterator of a range refers to the first element of the range (if any), while the ending iterator refers to the element (if any) directly behind the end of the range. Another way to look at it is that the iterators returned by
std::lower_bound
andstd::upper_bound
span the range of elements in the searched range that are equivalent to the searched element.This range is empty if the element is not in the range, so that
lower_bound
andupper_bound
return the same iterator, and otherwiselower_bound
returns an iterator referring to the first element in the searched range that's equivalent to the search value whileupper_bound
returns an iterator referring to the element (if any) that's directly behind the last such element.If the array will always be
Then the index before the one you find will always be false unless you find true at
index 0
. Another boundary case, as mentioned in my comment above, is if you don't findtrue
. Then, the highest false will be the last part of the array.