Merge data.frames with duplicates

I have many data.frames, for example:

df1 = data.frame(names=c('a','b','c','c','d'),data1=c(1,2,3,4,5))
df2 = data.frame(names=c('a','e','e','c','c','d'),data2=c(1,2,3,4,5,6))
df3 = data.frame(names=c('c','e'),data3=c(1,2))

and I need to merge these data.frames without dropping the duplicated names:

> result
  names data1 data2 data3
1     a     1     1    NA
2     b     2    NA    NA
3     c     3     4     1
4     c     4     5    NA
5     d     5     6    NA
6     e    NA     2     2
7     e    NA     3    NA

I can't find a function like merge with an option to handle duplicated names. Thank you for your help. To clarify my problem: the data come from a biological experiment in which each sample has a different number of replicates. I need to merge all the experiments and produce this table. I can't generate a unique identifier for the replicates.

Tags: r merge duplicates

3 Answers

This account has been banned

#2 · 2020-02-14 05:49

See other questions:

Examples:

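library(reshape)
L <- list(df1, df2, df3)  # list of the data frames to merge
out <- merge_recurse(L)

library(plyr)
# full joins can be chained, or looped over a list of data frames
out <- join(df1, df2, type = "full")
out <- join(out, df3, type = "full")

library(plyr)
out <- ldply(L)  # L is the list defined above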


叼着烟拽天下

#3 · 2020-02-14 05:55

First define a function, run.seq, which provides sequence numbers for duplicates, since it appears from the desired output that the ith duplicate of each name in one data frame should be matched with the ith duplicate of that name in the others. Then create a list of the data frames and add a run.seq column to each component. Finally use Reduce to merge them all.

run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

L <- list(df1, df2, df3)
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$names)))

out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]

The last line gives:

> out
  names data1 data2 data3
1     a     1     1    NA
2     b     2    NA    NA
3     c     3     4     1
4     c     4     5    NA
5     d     5     6    NA
6     e    NA     2     2
7     e    NA     3    NA

EDIT: Revised run.seq so that input need not be sorted.
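
As a small illustration of that point, the sketch below applies run.seq to an unsorted vector. The vector x is made up for this example, and run.seq is repeated from the answer so the snippet runs on its own.

```r
# run.seq as defined in the answer above, repeated so this runs standalone
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

# Hypothetical input: duplicates are not adjacent
x <- c("c", "a", "c", "e", "a", "c")

# Each value is numbered by its order of appearance within its own name
seq_ids <- run.seq(x)
print(seq_ids)  # 1 1 2 1 2 3
```

Because the numbering follows order of appearance, the ith 'c' in one data frame is matched with the ith 'c' in another, whatever rows lie between them.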


不美不萌又怎样

#4 · 2020-02-14 06:06

I think there is just not enough information in your example data frames to do this. Which 'c' in data frame 1 should be paired with which 'c' in data frame 2? We cannot tell, so R can't either. I suspect you will have to add another variable to each of your data frames that uniquely identifies these duplicate cases.
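
To make the ambiguity concrete, here is a minimal sketch using df1 and df2 from the question with a plain merge on names only. The variable names m and n_c are introduced just for this illustration.

```r
df1 <- data.frame(names = c("a", "b", "c", "c", "d"), data1 = c(1, 2, 3, 4, 5))
df2 <- data.frame(names = c("a", "e", "e", "c", "c", "d"), data2 = c(1, 2, 3, 4, 5, 6))

# With no extra key, merge pairs every 'c' row of df1 with every 'c' row of df2
m <- merge(df1, df2, by = "names", all = TRUE)

n_c <- sum(m$names == "c")
print(n_c)      # 4 rows for 'c' (2 x 2), not the 2 the question asks for
print(nrow(m))  # 9 rows in total
```

An additional identifying column, supplied with the data or derived from row order, is what lets merge pick one pairing instead of all of them.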
