In R, how can I access the first element of each l

I have a data frame like this:

n = c(2, 2, 3, 3, 4, 4) 
n <- as.factor(n)
s = c("a", "b", "c", "d", "e", "f") 
df = data.frame(n, s)  

df
  n s
1 2 a
2 2 b
3 3 c
4 3 d
5 4 e
6 4 f

and I want to access the first element of each level of my factor (in this example, that would give a vector containing a, c, e).

It is possible to reach the first element of one level, with

df$s[df$n == 2][1]

but it does not work for all levels:

df$s[df$n == levels(n)]
[1] a f

How would you do that?

And to go further, I’d like to modify my data frame to see which is the first element for each level at every occurrence. In my example, a new column should be:

  n s rep firstelement
1 2 a   a            a
2 2 b   c            a
3 3 c   e            c
4 3 d   a            c
5 4 e   c            e
6 4 f   e            e

Tags: r r-factor

7 answers

家丑人穷心不美

#2 · 2019-06-14 22:41

Surprised not to see this classic in the answer stream yet.

> do.call(rbind, lapply(split(df, df$n), function(x) x[1,]))
##   n s
## 2 2 a
## 3 3 c
## 4 4 e
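
If a plain vector is wanted rather than a data frame, the same split-and-take-first idea can probably be applied to the column directly. A minimal sketch, assuming the df from the question (rebuilt here so the snippet runs on its own); the name first_by_level is just illustrative, and as.character() is only there so the result is the same whether s is stored as a factor or as character:

```r
# Rebuild the question's data frame so the snippet is self-contained
df <- data.frame(n = factor(c(2, 2, 3, 3, 4, 4)),
                 s = c("a", "b", "c", "d", "e", "f"))

# Split s by the levels of n, then take the first element of each piece
first_by_level <- sapply(split(as.character(df$s), df$n), function(x) x[1])
first_by_level
```

The result is named by the levels of n, which can be handy for lookups; unname() drops the names if they are not needed.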

The star"

#3 · 2019-06-14 22:42

You could also use data.table

library(data.table)
dt = as.data.table(df)
dt[, list(firstelement = s[1]), by=n]

which would get you:

   n firstelement
1: 2            a
2: 3            c
3: 4            e

The by=n argument groups the rows by each value of n, so s[1] picks the first element of s within each of those groups.

To get this as an extra column you could do:

dt[, newcol := s[1], by=n]
dt
#   n s newcol
#1: 2 a      a
#2: 2 b      a
#3: 3 c      c
#4: 3 d      c
#5: 4 e      e
#6: 4 f      e

So this just takes the value of s from the first row of each group and assigns it to a new column.

Juvenile、少年°

#4 · 2019-06-14 22:54

Here is an approach using match:

 df$s[match(levels(n), df$n)]

EDIT: Maybe this looks a bit confusing ...

To get a column which lists the first elements you could use match twice (but with x and table arguments swapped):

 df$firstelement <- df$s[match(levels(n), df$n)[match(df$n, levels(n))]]
 df$firstelement
 # [1] a a c c e e
 # Levels: a b c d e f

Let's look at this in detail:

 ## this returns the first matching elements
 match(levels(n), df$n)
 # [1] 1 3 5

 ## when we swap the x and table argument in match we get the level index
 ## for each df$n (the duplicated indices are important)
 match(df$n, levels(n))
 # [1] 1 1 2 2 3 3

 ## results in
 c(1, 3, 5)[c(1, 1, 2, 2, 3, 3)]
 # [1] 1 1 3 3 5 5
 df$s[c(1, 1, 3, 3, 5, 5)]
 # [1] a a c c e e
 # Levels: a b c d e f
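
Since match returns the position of the first match only, the two nested calls can arguably be collapsed into one by matching df$n against itself. This is a sketch rather than part of the original answer; it assumes the df from the question (rebuilt here), and first_idx and first_s are illustrative names:

```r
# Rebuild the question's data frame so the snippet is self-contained
df <- data.frame(n = factor(c(2, 2, 3, 3, 4, 4)),
                 s = c("a", "b", "c", "d", "e", "f"))

# For every row, the index of the first row that has the same value of n
first_idx <- match(df$n, df$n)

# Look up s at those indices: the first element of each level, row by row
first_s <- as.character(df$s)[first_idx]
first_s
```

Here first_idx works out to the same 1 1 3 3 5 5 index vector derived step by step above.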

smile是对你的礼貌

#5 · 2019-06-14 23:00

The function ave is useful in these cases:

df$firstelement = ave(df$s, df$n, FUN = function(x) x[1])
df
  n s firstelement
1 2 a            a
2 2 b            a
3 3 c            c
4 3 d            c
5 4 e            e
6 4 f            e

唯我独甜

#6 · 2019-06-14 23:00

df$s[sapply(levels(n), function(particular.level) { which(df$n == particular.level)[1]})]

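I believe your problem is that you are comparing two vectors: df$n is a vector and levels(n) is a vector. vector == vector compares element by element and recycles the shorter vector, so it only runs without a warning here because the length of df$n is a multiple of the length of levels(n). It does not test each row against every level.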
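
To make the recycling visible, here is a small sketch that spells out what the comparison in the question ends up doing. It assumes the df from the question (rebuilt here); recycled and hits are illustrative names:

```r
# Rebuild the question's data frame so the snippet is self-contained
df <- data.frame(n = factor(c(2, 2, 3, 3, 4, 4)),
                 s = c("a", "b", "c", "d", "e", "f"))

# levels(df$n) has length 3, so == recycles it to length 6: 2 3 4 2 3 4
recycled <- rep(levels(df$n), length.out = nrow(df))

# Element-wise comparison against 2 2 3 3 4 4: only rows 1 and 6 line up
hits <- as.character(df$n) == recycled
as.character(df$s)[hits]
```

Only the first and last rows happen to match their recycled partner, which is why the attempt in the question returned a and f.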

劳资没心，怎么记你

#7 · 2019-06-14 23:01

Edit. The first part of my answer addresses the original question, i.e. before "And to go further" (which was added by OP in an edit).

Another possibility, using duplicated. From ?duplicated: "duplicated() determines which elements of a vector or data frame are duplicates of elements with smaller subscripts."

Here we use !, the logical negation (NOT), to select not duplicated elements of 'n', i.e. first elements of each level of 'n'.

df[!duplicated(df$n), ]
#   n s
# 1 2 a
# 3 3 c
# 5 4 e
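
If only the vector of first elements is needed, the same logical index can presumably be applied to the s column alone. A short sketch, assuming the df from the question (rebuilt here); first_s is an illustrative name, and as.character() just makes the result independent of whether s is a factor:

```r
# Rebuild the question's data frame so the snippet is self-contained
df <- data.frame(n = factor(c(2, 2, 3, 3, 4, 4)),
                 s = c("a", "b", "c", "d", "e", "f"))

# Keep the elements of s whose n has not been seen in an earlier row
first_s <- as.character(df$s[!duplicated(df$n)])
first_s
```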

Update: I didn't see your "And to go further" edit until now. My first suggestion would definitely be to use ave, as already proposed by @thelatemail and @sparrow. But just to dig around in the R toolbox and show you an alternative, here's a dplyr way:

Group the data by n, then use the mutate function to create a new variable 'first' holding the first element of s (s[1]):

library(dplyr)

df %>%
  group_by(n) %>%
  mutate(
    first = s[1])
#   n s first
# 1 2 a     a
# 2 2 b     a
# 3 3 c     c
# 4 3 d     c
# 5 4 e     e
# 6 4 f     e

Or go all in with dplyr convenience functions and use first instead of [1]:

df %>%
  group_by(n) %>%
  mutate(
    first = first(s))

A dplyr solution for your original question would be to use summarise:

df %>%
  group_by(n) %>%
  summarise(
    first = first(s))

#   n first
# 1 2     a
# 2 3     c
# 3 4     e
