Why do vector indices in R start with 1, instead of 0?

Posted 2019-01-18 01:51

Question:

What is the reason that vector indices in R start with 1, instead of the usual 0?

Example:

> arr<-c(10,20)
> arr[0]
numeric(0)
> arr[1]
[1] 10
> arr[2]
[1] 20

Is it just that they want to store extra information about the vector and didn't know where to store it except as the vector's first element?

Answer 1:

FORTRAN is one language that starts arrays at 1. Mathematicians deal with vectors that always start with component 1 and go through N. Linear algebra conventions start with row and column numbered 1 and go through N as well.

C started with zero because of the pointer arithmetic that was implicit underneath. Java, JavaScript, C++, and C# followed suit from C.



Answer 2:

Vectors in math are often represented as n-tuples, whose elements are indexed from 1 to n. I suspect that R wanted to stay true to this notation.



Answer 3:

Frank, I think you were misinterpreting what you saw when you typed arr[0]. The numeric(0) just means that the result is a numeric vector with no elements. It does not mean that the type of the vector is being "stored" in element 0. You would have gotten the same result if you had typed, for example, arr[arr > 30]. No element meets that condition, so the result vector has no elements. Likewise, no element has index 0. This is intentional, and has nothing to do with the 0 space being used for something else.
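
To make that concrete, here is a small sketch using the same arr as in the question. The variable names zero_idx and no_match are only for illustration. It checks that both subsetting expressions give an empty numeric vector, and that the vector carries no hidden attributes:

```r
arr <- c(10, 20)

# Index 0 selects nothing, so the result is an empty numeric vector.
zero_idx <- arr[0]

# A condition that no element satisfies also selects nothing.
no_match <- arr[arr > 30]

print(length(zero_idx))                 # 0
print(identical(zero_idx, no_match))    # TRUE

# A plain numeric vector has no attributes attached to it.
print(attributes(arr))                  # NULL
```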



Answer 4:

0 is only "usual" because that's what C did, and a lot of later languages slavishly copied C syntax. By default in Fortran arrays are 1-based.

In Ada there is no default and you have to pick the lower and upper bounds of the index range yourself. Interestingly, it seems that most code I've come across picks '1' for the lower bound. I think that's a pretty good indication of where folks would have gone given a free choice.



Answer 5:

R is a "platform for experimentation and research". Its aim is to enable "statisticians to use the full capabilities of such an environment" without rethinking the way they usually deal with statistics. So people use formulas to make regression models, and people start counting at 1.



Answer 6:

Actually, I think that the C-like convention of starting at 0 is very logical when you look at the way memory is organized. In C++ we can write the following:

int* T = new int[10];

The first element of the array is *T. This is perfectly "logical" because T holds the address of the first memory cell, so *T is the value stored there. The second element sits in the next cell, so it is *(T+1): we move forward by one sizeof(int).

To make the code more readable, C defines T[i] as an alias for *(T+i). To access the first element you have to access *T, that is, T[0]. That's perfectly natural.
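
R does not expose pointers, but the offset arithmetic itself can be mimicked. The following is only a sketch: elem_size is an assumed value (sizeof(int) is 4 bytes on most current platforms), and both helper functions are hypothetical names made up for this illustration. It shows that a 0-based index is the offset directly, while a 1-based index needs a subtraction first:

```r
# Assumed element size in bytes (sizeof(int) on most current platforms).
elem_size <- 4L

# Hypothetical helpers: byte offset of element i from the start of the block.
offset_zero_based <- function(i) i * elem_size          # C style: *(T + i)
offset_one_based  <- function(i) (i - 1L) * elem_size   # Fortran/R style

# The first element sits at offset 0 under both conventions.
print(offset_zero_based(0L))   # 0
print(offset_one_based(1L))    # 0

# The fourth element: index 3 counting from 0, index 4 counting from 1.
print(offset_zero_based(3L))   # 12
print(offset_one_based(4L))    # 12
```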

Iterators extend the same idea:

std::vector<int> T(10);
int val = *(T.begin()+3);

Here T[i] is equivalent to *(T.begin()+i).

In Fortran and R, we usually start at 1 to match mathematical convention, but there are certainly other good choices (cf. this link, for example). Do not forget that Fortran can easily use arrays that start at 0:

PROGRAM ZEROARRAY
REAL T(0:9)
T(0) = 3.14
END


Answer 7:

You're doing it wrong. If you want to store additional attributes in an object, use attr:

> foo <- 1:20
> attr(foo, "created") <- Sys.time()               # just as an example
> str(foo)
 atomic [1:20] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "created")= POSIXct[1:1], format: "2010-06-28 14:07:15"    # our time
> summary(foo)                                     # object works as usual
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    5.75   10.50   10.50   15.20   20.00 
> 
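
As a complement to the session above, this sketch reads an attribute back and checks that the data elements are untouched. The attribute name "source" and its value are made up for the example:

```r
foo <- 1:20

# Hypothetical attribute name and value, chosen only for this illustration.
attr(foo, "source") <- "example data"

# The metadata lives alongside the vector, not inside its elements.
print(attr(foo, "source"))   # "example data"
print(length(foo))           # 20
print(foo[1])                # 1, still the first data element
```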


Tags: arrays r vector