R - find all unique values among subsets of a data

I have a data frame with two columns. The first column defines subsets of the data. I want to find all values in the second column that only appear in one subset in the first column.

For example, from:

df=data.frame(
  data_subsets=rep(LETTERS[1:2],each=5),
  data_values=c(1,2,3,4,5,2,3,4,6,7))

data_subsets data_values
      A           1
      A           2
      A           3
      A           4
      A           5
      B           2
      B           3
      B           4
      B           6
      B           7

I would want to extract the following data frame.

data_subsets   data_values
    A              1
    A              5
    B              6
    B              7

I have been playing around with duplicated but I just can't seem to make it work. Any help is appreciated. There are a number of topics tackling similar problems, I hope I didn't overlook the answer in my searches!

EDIT

I modified the approach from @Matthew Lundberg of counting the number of elements and extracting from the data frame. For some reason his approach was not working with the data frame I had, so I came up with this, which is less elegant but gets the job done:

counts=rowSums(do.call("rbind",tapply(df$data_subsets,df$data_values,FUN=table)))
extract=names(counts)[counts==1]
df[match(extract,df$data_values),]

标签： r duplicates unique

3条回答

Animai°情兽

2楼-- · 2019-02-26 09:28

First, find the count of each element in df$data_values:

 x <- sapply(df$data_values, function(x) sum(as.numeric(df$data_values == x)))

> x
 [1] 1 2 2 2 1 2 2 2 1 1

Now extract the rows:

> df[x==1,]
   data_subsets data_values
1             A           1
5             A           5
9             B           6
10            B           7

Note that you missed "A 5" above. There is no "B 5".

0人赞添加讨论(0) 举报

smile是对你的礼貌

3楼-- · 2019-02-26 09:33

You had the right idea with duplicated. The trick is to combine fromLast = TRUE and fromLast = FALSE options to get a full list of non-duplicated rows.

!duplicated(df$data_values,fromLast = FALSE)&!duplicated(df$data_values,fromLast = TRUE)
 [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE

Indexing your data.frame with this vector gives:

df[!duplicated(df$data_values,fromLast = FALSE)&!duplicated(df$data_values,fromLast = TRUE),]
   data_subsets data_values
1             A           1
5             A           5
9             B           6
10            B           7

0人赞添加讨论(0) 举报

我命由我不由天

4楼-- · 2019-02-26 09:36

A variant of P Lapointe's answer would be

df[! df$data_values %in% df[duplicated( unique(df)$data_values ), ]$data_values,]

The unique() deals with the possibility (not in your test data) that some rows in the data may be identical and you want to keep them once if the same data_values does not appear for distinct data_sets (or distinct other columns).

0人赞添加讨论(0) 举报

R - find all unique values among subsets of a data

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间