How can I combine rows within the same data frame

Sample of 2 (made-up) example rows in df:

userid   facultyid  courseid schoolid
167       265        NA       1678  
167       71111      301      NA

Suppose that I have a couple hundred duplicate userid like in the above example. However, the vast majority of userid have different values.

How can I combine rows with duplicate userid in such a way as to stick to the column values in the 1st (of the 2) row unless the first value is NA (in which case the NA will be repopulated with whatever value came from the second row)?

In essence, drawing from the above example, my ideal output would contain:

userid   facultyid  courseid schoolid
167       265        301       1678

标签： r duplicates append combiners

4条回答

走好不送

2楼-- · 2019-02-15 15:36

# initialize a vector that will contain row numbers which should be erased
rows.to.erase <- c()

# loop over the rows, starting from top
for(i in 1:(nrow(dat)-1)) {
  if(dat$userid[i] == dat$userid[i+1]) {
    # loop over columns to recuperate data when a NA is present
    for(j in 2:4) {
      if(is.na(dat[i,j]))
        dat[i,j] <- dat[i+1,j]
    }
    rows.to.erase <- append(rows.to.erase, i+1)
  }
}

dat.clean <- dat[-rows.to.erase,]
dat.clean
#   userid facultyid courseid schoolid
# 1    167       265      301     1678

0人赞添加讨论(0) 举报

趁早两清

3楼-- · 2019-02-15 15:42

Here's a different approach using ddply :

# requires the plyr package
library(plyr)

# Your example dataframe with added lines
schoolex <- data.frame(userid = c(167, 167, 200, 203, 203), facultyid = c(265, 71111, 200, 300, NA), 
                        courseid = c(NA, 301, 302, 303, 303), schoolid = c(1678, NA, 1678, NA, 1678))

schoolex_duprm <- ddply(schoolex, .(userid), summarize, facultyid2 = facultyid[!is.na(facultyid)][1], 
                               courseid2 = courseid[!is.na(courseid)][1], 
                               schoolid2 = schoolid[!is.na(schoolid)][1])

0人赞添加讨论(0) 举报

等我变得足够好

4楼-- · 2019-02-15 15:42

Here's a simple one-liner from plyr. I wrote it a bit more generally than you asked:

 a <- data.frame(x=c(1,2,3,1,2,3,1,2,3),y=c(2,3,1,1,2,3,2,3,1),
       z=c(NA,1,NA,2,NA,3,4,NA,5),zz=c(1,NA,2,NA,3,NA,4,NA,5))

 ddply(a,~x+y,summarize,z=first(z[!is.na(z)]),zz=first(zz[!is.na(zz)]))

Specifically answering the original question, if your data frame is named a, :

 ddply(a,~userid,summarize,facultyid=first(facultyid[!is.na(facultyid)]),
         courseid=first(courseid[!is.na(courseid)],
         schoolid=first(schoolid[!is.na(schoolid)])

0人赞添加讨论(0) 举报

叛逆

5楼-- · 2019-02-15 15:44

aggregate(x = df1, by = list(df1$userid), FUN = function(x) na.omit(x)[1])[,-1]

or use dplyr library:

library(dplyr)

df1 %>%
  group_by(userid) %>%
  summarise_each(funs(first(na.omit(.))))

0人赞添加讨论(0) 举报

How can I combine rows within the same data frame

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间