I have a data frame as follows:
planets  type                diameter  rotation  rings
Mercury  Terrestrial planet     0.382     58.64  FALSE
Venus    Terrestrial planet     0.949   -243.02  FALSE
Earth    Terrestrial planet     1.000      1.00  FALSE
Mars     Terrestrial planet     0.532      1.03  FALSE
Jupiter  Gas giant             11.209      0.41   TRUE
Saturn   Gas giant              9.449      0.43   TRUE
Uranus   Gas giant              4.007     -0.72   TRUE
Neptune  Gas giant              3.883      0.67   TRUE
I wanted to select the last 3 rows:
planets_df[nrow(planets_df)-3:nrow(planets_df),]
However, I've got something I didn't expect:
planets  type                diameter  rotation  rings
Jupiter  Gas giant             11.209      0.41   TRUE
Mars     Terrestrial planet     0.532      1.03  FALSE
Earth    Terrestrial planet     1.000      1.00  FALSE
Venus    Terrestrial planet     0.949   -243.02  FALSE
Mercury  Terrestrial planet     0.382     58.64  FALSE
Through trial and error, I've learned that
> (nrow(planets_df)-3):nrow(planets_df)
[1] 5 6 7 8
and
> nrow(planets_df)-3:nrow(planets_df)
[1] 5 4 3 2 1 0
How exactly does R evaluate a : expression (with respect to parentheses)?
The colon operator takes precedence over the binary arithmetic operators *, /, + and - (though not over ^ or unary minus). The best way to internalize the logic is to experiment with an example:
2*2:6-1
What answer should we expect? Some would say 4 5, reasoning that the expression simplifies to 2*2=4 and 6-1=5, and therefore to 4:5.
The actual result will surprise anyone who hasn't considered the order of operations in play, because the expression 2*2:6-1 is simplified differently. The sequence 2:6 is built first, then the multiplication is applied, and finally the subtraction. We could write it out as 2 * (2 3 4 5 6), which is 4 6 8 10 12, and then subtract 1 from each element to get 3 5 7 9 11.
By grouping with parentheses, just as in basic arithmetic, we can control the order of operations and get the answer we first expected:
(2*2):(6-1)
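A quick way to check the two readings is to run them side by side in an R session. The variable names below are only illustrative.

```r
# `:` binds tighter than `*` and `-`, so this is (2 * (2:6)) - 1
unparenthesized <- 2*2:6-1
print(unparenthesized)   # 3 5 7 9 11

# Parentheses force the arithmetic to happen before the sequence is built
parenthesized <- (2*2):(6-1)
print(parenthesized)     # 4 5
```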
You can apply this reasoning to your example to investigate the seemingly odd behavior of the : operator.
Now that you know the secret codes, what should we expect from (2*2):6-1?
nrow(planets_df)-3:nrow(planets_df) is being evaluated as 8 - (3:8), or
(8-3) (8-4) (8-5) (8-6) (8-7) (8-8) = 5 4 3 2 1 0
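The same evaluation can be reproduced without the data frame. In this sketch, n is a stand-in for nrow(planets_df), which is 8 for the table in the question, and x is a hypothetical vector used only to show the indexing. It also suggests why only five rows came back in the question: an index of 0 is silently dropped when subsetting.

```r
n <- 8                      # stand-in for nrow(planets_df)

wrong <- n - 3:n            # parsed as n - (3:n)
print(wrong)                # 5 4 3 2 1 0

right <- (n - 3):n          # subtraction first, then the sequence
print(right)                # 5 6 7 8

# Indexing with 0 selects nothing, so 5 4 3 2 1 0 yields five elements
x <- letters[1:n]
print(x[wrong])             # "e" "d" "c" "b" "a"
```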
For future reference, if you want the last few rows, use
tail(planets_df, 3)
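As a sketch of how this looks in practice, the block below rebuilds the table from the question and takes its last three rows. The data.frame() call is a reconstruction from the printed values, since the question does not show how planets_df was created, and last_three is just an illustrative name.

```r
# Reconstruction of the question's data frame (assumed from the printed table)
planets_df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  type     = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1.00, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings    = rep(c(FALSE, TRUE), each = 4),
  stringsAsFactors = FALSE
)

last_three <- tail(planets_df, 3)   # rows 6 to 8: Saturn, Uranus, Neptune
print(last_three)
```

Because tail() counts rows from the end, it avoids the index arithmetic, and with it the precedence question, entirely.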
The colon : separates the starting point of a sequence from its end point. It has higher precedence than the + or - operator. Therefore, nrow(planets_df)-3:nrow(planets_df) is equal to nrow(planets_df)-(3:nrow(planets_df)).
If you want the last three entries using this syntax, you need to put the entire expression that defines the start of the sequence in parentheses, so that the subtraction happens before the sequence is built.
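A minimal sketch of that fix, using a cut-down stand-in for planets_df (only two of the original columns, to keep the example short). Note that (n - 3):n spans four rows, so selecting exactly three rows needs (n - 2):n.

```r
# Abbreviated stand-in for the question's data frame (two columns only)
planets_df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883),
  stringsAsFactors = FALSE
)
n <- nrow(planets_df)

four_rows  <- planets_df[(n - 3):n, ]   # rows 5 to 8
three_rows <- planets_df[(n - 2):n, ]   # rows 6 to 8
print(three_rows)
```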