So in Java, whenever an indexed range is given, the upper bound is almost always exclusive.
From java.lang.String
:
substring(int beginIndex, int endIndex)
Returns a new string that is a substring of this string. The substring begins at the specified beginIndex
and extends to the character at index endIndex - 1
From java.util.Arrays
:
copyOfRange(T[] original, int from, int to)
from
- the initial index of the range to be copied, inclusive
to
- the final index of the range to be copied, exclusive.
From java.util.BitSet
:
set(int fromIndex, int toIndex)
fromIndex
- index of the first bit to be set.
toIndex
- index after the last bit to be set.
As you can see, it does look like Java tries to make it a consistent convention that upper bounds are exclusive.
My questions are:
- Is this the official authoritative recommendation?
- Are there notable violations that we should be wary of?
- Is there a name for this system? (ala "0-based" vs "1-based")
CLARIFICATION: I fully understand that a collection of N
objects in a 0-based system is indexed 0..N-1
. My question is that if a range (2,4)
given, it can be either 3 items or 2, depending on the system. What do you call these systems?
AGAIN, the issue is not "first index 0
last index N-1
" vs "first index 1
last index N
" system; that's known as the 0-based vs 1-based system.
The issue is "There are 3 elements in (2,4)
" vs "There are 2 elements in (2,4)
" systems. What do you call these, and is one officially sanctioned over the other?
In general, yes. If you are working in a language with C-like syntax (C, C++, Java), then arrays are zero-indexed, and most random access data structures (vectors, array-lists, etc.) are going to be zero-indexed as well.
Starting indices at zero means that the size of the data structure is always going to be one greater than last valid index in the data structure. People often want to know the size of things, of course, and so it's more convenient to talk about the size than to talk about the the last valid index. People get accustomed to talking about ending indices in an exclusive fashion, because an array a[]
that is n
elements long has its last valid element in a[n-1]
.
There is another advantage to using an exclusive index for the ending index, which is that you can compute the size of a sublist by subtracting the inclusive beginning index from the exclusive ending index. If I call myList.sublist(3, 7)
, then I get a sublist with 7 - 3 = 4
elements in it. If the sublist()
method had used inclusive indices for both ends of the list, then I would need to add an extra 1 to compute the size of the sublist.
This is particularly handy when the starting index is a variable: Getting the sublist of myList
starting at i
that is 5 elements long is just myList.sublist(i, i + 5)
.
All of that being said, you should always read the API documentation, rather than assuming that a given beginning index or ending index will be inclusive or exclusive. Likewise, you should document your own code to indicate if any bounds are inclusive or exclusive.
Its just 0 to n-1 based.
A list/Array contains 10 items 0-9 indexed.
You cannot have a 0 indexed based list that is 0-n where the cout is n, that includes an item that does not exists...
This is the typical way things work.
- Yes.
- Excel Ranges/Sheets/Workbooks.
- Index (information technology)
Credit goes to FredOverflow in his comment saying that this is called the "half-open range". So presumably, Java Collections can be described as "0-based with half-open ranges".
I've compiled some discussions about half-open vs closed ranges elsewhere:
siliconbrain.com - 16 good reasons to use half-open ranges (edited for conciseness):
- The number of elements in the range
[n, m)
is just m-n
(and not m-n+1
).
- The empty range is
[n, n)
(and not [n, n-1]
, which can be a problem if n
is an iterator already pointing the first element of a list, or if n == 0
).
- For floats you can write
[13, 42)
(instead of [13, 41.999999999999]
).
- The
+1
and -1
are almost never used, when handling ranges. This is an advantage if they are expensive (as it is for dates).
- If you write a find in a range, the fact that there was nothing found can easily indicated by returning the end as the found position:
if( find( [begin, end) ) == end)
nothing found.
- In languages, which start the array subscripts with 0 (like C, C++, JAVA, NCL) the upper bound is equal to the size.
Half-open versus closed ranges
Advantages of half-open ranges:
- Empty ranges are valid:
[0 .. 0]
- Easy for subranges to go to the end of the original:
[x .. $]
- Easy to split ranges:
[0 .. x]
and [x .. $]
Advantages of closed ranges:
- Symmetry.
- Arguably easier to read.
['a' ... 'z']
does not require awkward + 1
after 'z'
.
[0 ... uint.max]
is possible.
That last point is very interesting. It's really awkward to write an numberIsInRange(int n, int min, int max)
predicate with a half-open range if Integer.MAX_VALUE
could be legally in a range.
This practice was introduced by Josh Bloch to Collections API as a contract.
After that it became a standard in java and when anybody dicide to create a public library he assumes that he should keep the contract because users expect to see already known behavior in new libraries.
The indexes in array like datastructures are indeed always 0-based. The String
is basically backed by a char[]
. The Collections framework is under the hood based on arrays and so on. This makes designing/maintaining/using the API easier without changing the "under-the-hood" way to access the desired element(s) in the array.
There are however some "exceptions", such as the parameterindex-based setter methods of PreparedStatement
and the columnindex-based getter methods of ResultSet
. They are 1-based. Behind the scenes they does also not really represent an array of values.
This would probably bring up a new question: "Why are array indexes zero based?". Now, our respected computer programming scientist E.W. Dijkstra explains here why it should start with zero.
The easy way to think of half-open ranges is this: the first term identifies the start of elements within the range, and the second term identifies the start of elements after the range. Keep that in mind, and it all makes a good deal more sense. Plus the arithmetic works out better in many cases, per @polygenelubricants' answer.