可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Let's say you have a data frame with two levels of factors that looks like this:

Factor1    Factor2    Value
A          1          0.75
A          1          0.34
A          2          1.21   
A          2          0.75 
A          2          0.53
B          1          0.42
B          2          0.21  
B          2          0.18
B          2          1.42

etc.

How do I subset this data frame ("df", if you will) based on the condition that the combination of Factor1 and Factor2 (Fact1*Fact2) has more than, say, 2 observations? Can you use the length argument in subset to do this?

回答1:

Assuming your data.frame is called mydf, you can use ave to create a logical vector to help subset:

mydf[with(mydf, as.logical(ave(Factor1, Factor1, Factor2, 
                           FUN = function(x) length(x) > 2))), ]
#   Factor1 Factor2 Value
# 3       A       2  1.21
# 4       A       2  0.75
# 5       A       2  0.53
# 7       B       2  0.21
# 8       B       2  0.18
# 9       B       2  1.42

Here's ave counting up your combinations. Notice that ave returns an object the same length as the number of rows in your data.frame (this makes it convenient for subsetting).

> with(mydf, ave(Factor1, Factor1, Factor2, FUN = length))
[1] "2" "2" "3" "3" "3" "1" "3" "3" "3"

The next step is to compare that length to your threshold. For that we need an anonymous function for our FUN argument.

> with(mydf, ave(Factor1, Factor1, Factor2, FUN = function(x) length(x) > 2))
[1] "FALSE" "FALSE" "TRUE"  "TRUE"  "TRUE"  "FALSE" "TRUE"  "TRUE"  "TRUE"

Almost there... but since the first item was a character vector, our output is also a character vector. We want it as.logical so we can directly use it for subsetting.

ave doesn't work on objects of class factor, in which case you'll need to do something like:

mydf[with(mydf, as.logical(ave(as.character(Factor1), Factor1, Factor2, 
                               FUN = function(x) length(x) > 2))),]

回答2:

library(data.table)

dt = data.table(your_df)

dt[, if(.N > 2) .SD, list(Factor1, Factor2)]
#   Factor1 Factor2 Value
#1:       A       2  1.21
#2:       A       2  0.75
#3:       A       2  0.53
#4:       B       2  0.21
#5:       B       2  0.18
#6:       B       2  1.42

回答3:

You can use interaction and table to see the number of observation for each interaction (mydata is your data) and then use %in% to subset the data.

 mydata$inter<-with(mydata,interaction(Factor1,Factor2))
 table(mydata$inter)
A.1 B.1 A.2 B.2 
  2   1   3   3 

mydata[!mydata$inter %in% c("A.1","B.1"), ]
  Factor1 Factor2 Value inter
3       A       2  1.21   A.2
4       A       2  0.75   A.2
5       A       2  0.53   A.2
7       B       2  0.21   B.2
8       B       2  0.18   B.2
9       B       2  1.42   B.2

Updated as per @Ananda's comment:You can use following one line code after creating the interaction variable.

mydata[mydata$inter %in% names(which(table(mydata$inter) > 2)), ]

How do you subset a data frame in R based on a min

问题:

回答1:

回答2:

回答3:

收藏的人(0)

How do you subset a data frame in R based on a min

问题:

回答1:

回答2:

回答3:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮