Let's say you have a data frame with two factors that looks like this:
Factor1 Factor2 Value
A 1 0.75
A 1 0.34
A 2 1.21
A 2 0.75
A 2 0.53
B 1 0.42
B 2 0.21
B 2 0.18
B 2 1.42
etc.
How do I subset this data frame ("df", if you will) based on the condition that the combination of Factor1 and Factor2 (Fact1*Fact2) has more than, say, 2 observations? Can you use the length argument in subset to do this?
You can use interaction and table to see the number of observations for each interaction (mydata is your data) and then use %in% to subset the data.

Updated as per @Ananda's comment: You can use the following one-line code after creating the interaction variable.
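The approach described above might look roughly like the sketch below. It is an illustration rather than the answerer's exact code: the data is retyped from the question, the column name inter and the helper variables counts, keep, result and result_one_line are assumptions made for the example, and the threshold of 2 comes from the question.

```r
# Example data retyped from the question; mydata is the name used in the answer.
mydata <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = FALSE
)

# Combine the two factors into a single interaction variable
# ("inter" is a hypothetical column name).
mydata$inter <- interaction(mydata$Factor1, mydata$Factor2)

# Count the observations for each Factor1/Factor2 combination.
counts <- table(mydata$inter)

# Names of the combinations that occur more than twice.
keep <- names(counts)[counts > 2]

# Keep only the rows belonging to those combinations.
result <- mydata[mydata$inter %in% keep, ]

# The same thing in one line, once the interaction variable exists.
result_one_line <- mydata[mydata$inter %in% names(which(table(mydata$inter) > 2)), ]
```

Both result and result_one_line should contain only the rows whose combination of Factor1 and Factor2 appears more than twice.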
Assuming your data.frame is called mydf, you can use ave to create a logical vector to help subset:

Here's ave counting up your combinations. Notice that ave returns an object the same length as the number of rows in your data.frame (this makes it convenient for subsetting).

The next step is to compare that length to your threshold. For that we need an anonymous function for our FUN argument.

Almost there... but since the first item was a character vector, our output is also a character vector. We want it as.logical so we can directly use it for subsetting.

ave doesn't work on objects of class factor, in which case you'll need to do something like:
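The ave steps described in this answer can be sketched as follows. This is a reconstruction under assumptions, not necessarily the answerer's original code: the data is retyped from the question, the intermediate names counts, flag, big_groups, mydf_f, flag_f and big_groups_f are made up for the example, and wrapping the factor in as.character is only one possible workaround for the factor case.

```r
# Example data retyped from the question; mydf is the name used in the answer.
mydf <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = FALSE
)

# Step 1: count the rows in each Factor1/Factor2 combination.
# The result has one element per row of mydf.
counts <- ave(mydf$Factor1, mydf$Factor1, mydf$Factor2, FUN = length)

# Step 2: compare each group's size to the threshold with an anonymous function.
# The first argument is a character vector, so the result is character as well.
flag <- ave(mydf$Factor1, mydf$Factor1, mydf$Factor2,
            FUN = function(x) length(x) > 2)

# Step 3: convert to logical and use it to subset the rows.
big_groups <- mydf[as.logical(flag), ]

# Factor case (assumed workaround): pass a character copy of the factor
# as the first argument so the results are not forced into factor levels.
mydf_f <- transform(mydf, Factor1 = factor(Factor1))
flag_f <- ave(as.character(mydf_f$Factor1), mydf_f$Factor1, mydf_f$Factor2,
              FUN = function(x) length(x) > 2)
big_groups_f <- mydf_f[as.logical(flag_f), ]
```

Only the first argument needs to be a non-factor vector here; the grouping arguments can stay as factors.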