I have a data frame like this:
n = c(2, 2, 3, 3, 4, 4)
n <- as.factor(n)
s = c("a", "b", "c", "d", "e", "f")
df = data.frame(n, s)
df
n s
1 2 a
2 2 b
3 3 c
4 3 d
5 4 e
6 4 f
and I want to access the first element of each level of my factor (and have in this example a vector containing a, c, e).
It is possible to reach the first element of one level, with
df$s[df$n == 2][1]
but it does not work for all levels:
df$s[df$n == levels(n)]
[1] a f
How would you do that?
And to go further, I’d like to modify my data frame to see which is the first element for each level at every occurrence. In my example, a new column should be:
n s rep firstelement
1 2 a a a
2 2 b c a
3 3 c e c
4 3 d a c
5 4 e c e
6 4 f e e
Surprised not to see this classic in the answer stream yet.
You could also use data.table
which would get you:
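The code for this answer did not survive, so here is a minimal sketch of what such a data.table call could look like. It assumes the data.table package is installed; the names dt, firsts and first are illustrative choices and not taken from the original answer.

```r
library(data.table)

# Rebuild the example data from the question as a data.table
n <- as.factor(c(2, 2, 3, 3, 4, 4))
s <- c("a", "b", "c", "d", "e", "f")
dt <- data.table(n, s)

# One row per level of n, holding the first value of s in that group
firsts <- dt[, .(first = s[1]), by = n]

# The vector the question asks for
firsts$first
```

Writing dt[, s[1], by = n] without naming the result should behave the same way, except that data.table then gives the column a default name.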
The by=n bit groups everything by each value of n, so s[1] is getting the first element of each of those groups.
To get this as an extra column you could do:
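A possible version of the extra-column step, again as a sketch that assumes data.table is installed. The column name firstelement is borrowed from the question; dt is an illustrative name.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  n = as.factor(c(2, 2, 3, 3, 4, 4)),
  s = c("a", "b", "c", "d", "e", "f")
)

# := adds the column by reference; within each group of n,
# s[1] is recycled to every row of that group
dt[, firstelement := s[1], by = n]

dt$firstelement
```

Because := modifies dt in place, there is no need to assign the result back to dt.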
So this just takes the value of s from the first row of each group and assigns it to a new column.
Here is an approach using match:
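The original snippet is missing here, so this is a reconstruction of how match can be used for the first part of the question, not necessarily the exact call from the answer. match returns the position of the first occurrence of each element of x in table, which is what makes it fit this problem. The names first_idx and first_s are illustrative.

```r
# Example data from the question
n <- as.factor(c(2, 2, 3, 3, 4, 4))
s <- c("a", "b", "c", "d", "e", "f")
df <- data.frame(n, s)

# For each level of n, the row where that level first occurs
first_idx <- match(levels(df$n), df$n)

# as.character() guards against s being stored as a factor (the default before R 4.0)
first_s <- as.character(df$s)[first_idx]
first_s
```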
EDIT: Maybe this looks a bit confusing ...
To get a column which lists the first elements you could use match twice (but with the x and table arguments swapped).
Let's look at this in detail:
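The step-by-step code is missing from the thread, so the following is a sketch of how the two match calls described above could fit together. The intermediate names first_idx and level_idx are illustrative, and the original answer may have nested the calls in a single line.

```r
# Example data from the question
n <- as.factor(c(2, 2, 3, 3, 4, 4))
s <- c("a", "b", "c", "d", "e", "f")
df <- data.frame(n, s)

# Step 1: x = levels, table = df$n
# gives the first row of each level
first_idx <- match(levels(df$n), df$n)

# Step 2: arguments swapped, x = df$n, table = levels
# gives the position of each row's level within levels(df$n)
level_idx <- match(df$n, levels(df$n))

# Step 3: look up, for every row, the first row of its level
df$firstelement <- as.character(df$s)[first_idx[level_idx]]
df$firstelement
```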
The function ave is useful in these cases.
I believe your problem is that you are comparing two vectors: df$n is a vector and levels(n) is a vector. The vector == vector comparison only happens to run cleanly for you because levels(n) is recycled along df$n, and the length of df$n is a multiple of the length of levels(n).
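To make both points concrete, here is a small sketch. The ave call is one plausible form of the suggestion above (the original code is missing), and the second half spells out the recycling that produces the a f result from the question. The names recycled and hits are illustrative.

```r
# Example data from the question
n <- as.factor(c(2, 2, 3, 3, 4, 4))
s <- c("a", "b", "c", "d", "e", "f")
df <- data.frame(n, s)

# ave() applies FUN within each group of df$n and returns a vector
# as long as the input, so it can be assigned straight to a new column
df$firstelement <- ave(as.character(df$s), df$n, FUN = function(x) x[1])

# What df$n == levels(n) really compares: the three levels are
# recycled to length six before the element-wise comparison
recycled <- rep(levels(df$n), length.out = nrow(df))
hits <- df$n == recycled

# Only rows 1 and 6 happen to line up with the recycled levels
as.character(df$s)[hits]
```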
Edit. The first part of my answer addresses the original question, i.e. before "And to go further" (which was added by OP in an edit).
Another possibility, using duplicated. From ?duplicated: "duplicated() determines which elements of a vector or data frame are duplicates of elements with smaller subscripts."
Here we use !, the logical negation (NOT), to select not duplicated elements of 'n', i.e. first elements of each level of 'n'.
Update: Didn't see your "And to go further" edit until now. My first suggestion would definitely be to use ave, as already proposed by @thelatemail and @sparrow. But just to dig around in the R toolbox and show you an alternative, here's a dplyr way:
Group the data by n, use the mutate function to create a new variable 'first', with the value 'first element of s' (s[1]).
Or go all in with dplyr convenience functions and use first instead of [1].
A dplyr solution for your original question would be to use summarise.
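The code blocks from this answer are missing, so the following sketch reconstructs the duplicated and dplyr suggestions described above. It assumes the dplyr package is installed; the result names (first_base, with_first, with_first2, per_level) are illustrative, and the original answer may have differed in detail.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  n = as.factor(c(2, 2, 3, 3, 4, 4)),
  s = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Base R: keep the rows whose n has not appeared earlier
first_base <- df$s[!duplicated(df$n)]

# dplyr: add the first s of each group as a new column 'first'
with_first <- df %>%
  group_by(n) %>%
  mutate(first = s[1]) %>%
  ungroup()

# Same idea with the first() convenience function
with_first2 <- df %>%
  group_by(n) %>%
  mutate(first = dplyr::first(s)) %>%
  ungroup()

# Original question: one row per level via summarise()
per_level <- df %>%
  group_by(n) %>%
  summarise(first = dplyr::first(s))
```

The mutate versions keep all six rows, while summarise collapses the data to one row per level of n.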