I have a data frame as follows:
planets  type               diameter rotation rings
Mercury  Terrestrial planet    0.382    58.64 FALSE
Venus    Terrestrial planet    0.949  -243.02 FALSE
Earth    Terrestrial planet    1.000     1.00 FALSE
Mars     Terrestrial planet    0.532     1.03 FALSE
Jupiter  Gas giant            11.209     0.41  TRUE
Saturn   Gas giant             9.449     0.43  TRUE
Uranus   Gas giant             4.007    -0.72  TRUE
Neptune  Gas giant             3.883     0.67  TRUE
I wanted to select the last 3 rows:
planets_df[nrow(planets_df)-3:nrow(planets_df),]
However, I got something I didn't expect:
planets  type               diameter rotation rings
Jupiter  Gas giant            11.209     0.41  TRUE
Mars     Terrestrial planet    0.532     1.03 FALSE
Earth    Terrestrial planet    1.000     1.00 FALSE
Venus    Terrestrial planet    0.949  -243.02 FALSE
Mercury  Terrestrial planet    0.382    58.64 FALSE
Through trial and error, I learned that
> (nrow(planets_df)-3):nrow(planets_df)
[1] 5 6 7 8
and
> nrow(planets_df)-3:nrow(planets_df)
[1] 5 4 3 2 1 0
How exactly does R evaluate the : operator (with respect to brackets)?
The colon operator takes precedence over the binary arithmetic operators *, /, + and -. It is always best to experiment with examples to internalize the logic:
2*2:6-1
What answer should we expect? Some would say 4 5. The thinking is that it simplifies to 2*2 = 4 and 6-1 = 5, and therefore to 4:5.
2*2:6-1
[1] 3 5 7 9 11
This answer will surprise anyone who hasn't considered the order of operations in play. The expression 2*2:6-1 is evaluated differently: the sequence 2:6 is built first, then the multiplication is applied, and finally the subtraction. Written out, that is 2 * (2 3 4 5 6), which is 4 6 8 10 12, and subtracting 1 gives 3 5 7 9 11.
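The step-by-step evaluation above can be checked directly in R. This sketch stores each intermediate vector (the names s, m and r are just for illustration) and confirms that the implicit grouping is 2 * (2:6) - 1, not (2*2):(6-1).

```r
# Step 1: the sequence is built first, because ":" binds tighter than * and -
s <- 2:6          # 2 3 4 5 6
# Step 2: the multiplication is applied element-wise to the sequence
m <- 2 * s        # 4 6 8 10 12
# Step 3: the subtraction comes last
r <- m - 1        # 3 5 7 9 11

# The unparenthesised expression matches the step-by-step result...
stopifnot(identical(2*2:6-1, r))
# ...and the explicitly grouped form, but not the "intuitive" reading
stopifnot(identical(2*2:6-1, 2 * (2:6) - 1))
stopifnot(!identical(2*2:6-1, (2*2):(6-1)))
print(r)
```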
By grouping with parentheses we can control the order of operations, just as in basic arithmetic, and get the answer we first expected:
(2*2):(6-1)
[1] 4 5
You can apply this reasoning to your example to investigate the seemingly odd behavior of the : operator.
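A few more expressions worth experimenting with, as a sketch of where : sits among R's other operators (the precedence table is documented in ?Syntax): it binds more tightly than binary +, -, * and /, but less tightly than ^ and unary minus. The variable names are only for illustration.

```r
# ":" beats binary "+": this is 1 + (2:4), not (1 + 2):4
a <- 1 + 2:4      # 3 4 5
# ":" beats binary "-" on the right too: this is (2:4) - 1
b <- 2:4 - 1      # 1 2 3
# Unary minus beats ":": this is (-1):3, not -(1:3)
c1 <- -1:3        # -1 0 1 2 3
c2 <- -(1:3)      # -1 -2 -3
# "^" also beats ":": this is (2^2):5, not 2^(2:5)
d <- 2^2:5        # 4 5

# as.numeric() makes integer and double results directly comparable
stopifnot(identical(as.numeric(a), c(3, 4, 5)),
          identical(as.numeric(b), c(1, 2, 3)),
          identical(as.numeric(c1), c(-1, 0, 1, 2, 3)),
          identical(as.numeric(c2), c(-1, -2, -3)),
          identical(as.numeric(d), c(4, 5)))
```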
Now that you know the secret codes, what should we expect from (2*2):6-1?
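One way to check your answer to that question: in (2*2):6-1 the parentheses force 2*2 to be evaluated first, giving the sequence 4:6, and the trailing - 1 is then applied to every element of that sequence.

```r
x <- (2*2):6-1
# (2*2) is evaluated first, so the sequence is 4:6, i.e. 4 5 6
# the subtraction is applied last, element-wise, giving 3 4 5
stopifnot(identical(x, (4:6) - 1))
print(x)
```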
The colon : separates the starting point from the end point of a sequence. It is treated with higher priority than the binary + or - operator.
Therefore,
nrow(planets_df)-3:nrow(planets_df)
is equal to
nrow(planets_df) - (3:nrow(planets_df))
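You can also ask R's parser directly how it groups the expression, without evaluating it. This sketch uses quote() on a hypothetical symbol n standing in for nrow(planets_df): the outermost function of the parsed call is the subtraction, and its right-hand operand is the whole 3:n sequence.

```r
e <- quote(n - 3:n)
# A call is stored like a list: the function first, then its arguments
op  <- e[[1]]   # the outermost operator
lhs <- e[[2]]   # left operand
rhs <- e[[3]]   # right operand

stopifnot(identical(op, as.name("-")))
stopifnot(identical(lhs, as.name("n")))
stopifnot(identical(rhs, quote(3:n)))

# Evaluate with n = 8, the number of rows in the question's data frame
n <- 8
print(eval(e))  # 5 4 3 2 1 0
```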
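If you want the last three rows using this syntax, you need to put the entire expression that defines the start of the sequence in parentheses. Note that (nrow(planets_df)-3):nrow(planets_df) is 5:8, which selects four rows, so for the last three, subtract 2:
planets_df[(nrow(planets_df)-2):nrow(planets_df),]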
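A runnable sketch that rebuilds the question's data frame from the values printed there and compares the two index ranges. It shows that (n - 3):n covers four rows, while (n - 2):n covers exactly the last three.

```r
# Rebuild the question's data frame from the values printed in the question
planets_df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  type     = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1.00, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings    = rep(c(FALSE, TRUE), each = 4)
)

n <- nrow(planets_df)   # 8
print((n - 3):n)        # 5 6 7 8: four rows
print((n - 2):n)        # 6 7 8: the last three rows

last_three <- planets_df[(n - 2):n, ]
print(last_three)
```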
With 8 rows, nrow(planets_df)-3:nrow(planets_df) is evaluated as 8 - (3:8), or
(8-3) (8-4) (8-5) (8-6) (8-7) (8-8) = 5 4 3 2 1 0
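This also explains the output in the question: R silently ignores a 0 in an index vector, so indexing with 5 4 3 2 1 0 returns rows 5 down to 1, in that order, and nothing for the 0. The sketch below uses a cut-down copy of the question's data frame with only the planets column.

```r
# Cut-down copy of the question's data frame (only the planets column)
planets_df <- data.frame(planets = c("Mercury", "Venus", "Earth", "Mars",
                                     "Jupiter", "Saturn", "Uranus", "Neptune"))

idx <- nrow(planets_df) - 3:nrow(planets_df)   # 5 4 3 2 1 0
rows <- planets_df[idx, , drop = FALSE]        # the 0 index selects no row

print(nrow(rows))                  # 5
print(as.character(rows$planets))  # "Jupiter" "Mars" "Earth" "Venus" "Mercury"
```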
For future reference, if you want the last few rows, use tail(planets_df, 3).
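A short sketch comparing tail() with the explicit index, using a cut-down copy of the question's data frame. tail() avoids the start-of-sequence arithmetic entirely, and asking for more rows than exist simply returns them all.

```r
# Cut-down copy of the question's data frame
planets_df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883)
)

n <- nrow(planets_df)
by_tail  <- tail(planets_df, 3)        # last three rows
by_index <- planets_df[(n - 2):n, ]    # the same rows, by explicit index

stopifnot(identical(as.character(by_tail$planets),
                    as.character(by_index$planets)))
# Requesting more rows than the data frame has returns all of them
stopifnot(nrow(tail(planets_df, 20)) == n)
print(by_tail)
```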