How do you subset a data frame in R based on a min

Let's say you have a data frame with two factor columns that looks like this:

Factor1    Factor2    Value
A          1          0.75
A          1          0.34
A          2          1.21   
A          2          0.75 
A          2          0.53
B          1          0.42
B          2          0.21  
B          2          0.18
B          2          1.42

etc.

How do I subset this data frame ("df", if you will) based on the condition that the combination of Factor1 and Factor2 (Fact1*Fact2) has more than, say, 2 observations? Can you use the length argument in subset to do this?

Tags: r subset

3 answers

祖国的老花朵

Reply #2 · 2019-07-01 19:33

You can use interaction and table to see the number of observations for each combination (mydata is your data frame) and then use %in% to subset the data.

 mydata$inter<-with(mydata,interaction(Factor1,Factor2))
 table(mydata$inter)
A.1 B.1 A.2 B.2 
  2   1   3   3 

mydata[!mydata$inter %in% c("A.1","B.1"), ]
  Factor1 Factor2 Value inter
3       A       2  1.21   A.2
4       A       2  0.75   A.2
5       A       2  0.53   A.2
7       B       2  0.21   B.2
8       B       2  0.18   B.2
9       B       2  1.42   B.2

Updated as per @Ananda's comment: after creating the interaction variable, you can use the following one-liner, which avoids typing the group names by hand.

mydata[mydata$inter %in% names(which(table(mydata$inter) > 2)), ]
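Put together, the interaction/table approach can be sketched as below. This is a minimal sketch that assumes the data frame holds only the nine rows shown in the question; the names min_n, counts, keep and result are hypothetical and used only for illustration.

```r
# Example data from the question (only the rows shown there)
mydata <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42)
)

min_n <- 2  # hypothetical threshold: keep groups with more than min_n rows

# One label per row for its Factor1/Factor2 combination, e.g. "A.2"
inter <- with(mydata, interaction(Factor1, Factor2))

# Count rows per combination and keep the labels above the threshold
counts <- table(inter)
keep <- names(counts)[as.vector(counts) > min_n]

# Subset without adding a helper column to mydata
result <- mydata[inter %in% keep, ]
```

Keeping inter as a separate vector rather than a column means the result has the same three columns as the input.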


来，给爷笑一个

Reply #3 · 2019-07-01 19:53

Assuming your data.frame is called mydf, you can use ave to create a logical vector to help subset:

mydf[with(mydf, as.logical(ave(Factor1, Factor1, Factor2, 
                           FUN = function(x) length(x) > 2))), ]
#   Factor1 Factor2 Value
# 3       A       2  1.21
# 4       A       2  0.75
# 5       A       2  0.53
# 7       B       2  0.21
# 8       B       2  0.18
# 9       B       2  1.42

Here's ave counting up your combinations. Notice that ave returns an object the same length as the number of rows in your data.frame (this makes it convenient for subsetting).

> with(mydf, ave(Factor1, Factor1, Factor2, FUN = length))
[1] "2" "2" "3" "3" "3" "1" "3" "3" "3"

The next step is to compare that length to your threshold. For that we need an anonymous function for our FUN argument.

> with(mydf, ave(Factor1, Factor1, Factor2, FUN = function(x) length(x) > 2))
[1] "FALSE" "FALSE" "TRUE"  "TRUE"  "TRUE"  "FALSE" "TRUE"  "TRUE"  "TRUE"

Almost there... but since the first item was a character vector, our output is also a character vector. We want it as.logical so we can directly use it for subsetting.

Note that ave doesn't work this way when its first argument is of class factor. If Factor1 is a factor, convert it to character first, like this:

mydf[with(mydf, as.logical(ave(as.character(Factor1), Factor1, Factor2, 
                               FUN = function(x) length(x) > 2))),]
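One possible variation sidesteps both the character output and the factor issue by giving ave a numeric first argument. The sketch below is not from the answer above; it passes seq_along(Factor1) so the group sizes come back as numbers, and it assumes only the nine rows shown in the question. The names n and result are hypothetical.

```r
# Example data from the question, with Factor1 deliberately stored as a factor
mydf <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = TRUE
)

# The first argument is a plain integer vector, so ave() returns the
# size of each row's Factor1/Factor2 group as a number
n <- with(mydf, ave(seq_along(Factor1), Factor1, Factor2, FUN = length))

# Compare against the threshold outside ave(); no as.logical() needed
result <- mydf[n > 2, ]
```

Because the comparison happens outside ave, the same n can be reused with a different threshold without recomputing the counts.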


forever°为你锁心

Reply #4 · 2019-07-01 19:55

library(data.table)

dt = data.table(your_df)

dt[, if(.N > 2) .SD, list(Factor1, Factor2)]
#   Factor1 Factor2 Value
#1:       A       2  1.21
#2:       A       2  0.75
#3:       A       2  0.53
#4:       B       2  0.21
#5:       B       2  0.18
#6:       B       2  1.42
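For a self-contained run of the data.table approach, here is a minimal sketch that assumes the data.table package is installed and that your_df holds only the nine rows shown in the question. The .I variation at the end is an addition for illustration, not part of the answer above; idx, result and result2 are hypothetical names.

```r
library(data.table)

# Example data from the question (only the rows shown there)
your_df <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42)
)
dt <- as.data.table(your_df)

# .N is the number of rows in the current group and .SD is the group's data.
# Groups that fail the test return NULL and so contribute no rows.
result <- dt[, if (.N > 2) .SD, by = list(Factor1, Factor2)]

# Variation: collect the row numbers (.I) of qualifying groups, then subset once
idx <- dt[, .I[.N > 2], by = list(Factor1, Factor2)]$V1
result2 <- dt[idx]
```

Both forms keep the same six rows. The second only builds an index per group instead of materialising .SD, which may matter on larger tables.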

