subset() a factor by its number of observations

Posted 2019-01-19 07:33


I have a problem with the subset() function. How can I subset a factor of my data frame by its number of observations?

   NAME       CLASS   COLOR    VALUE
   antonio    B       YELLOW       5
   antonio    B       BLUE         8
   antonio    B       BLUE         7
   antonio    B       BLUE        12
   luca       C       YELLOW      99
   luca       B       YELLOW      87
   luca       B       YELLOW      98
   giovanni   A       BLUE        48

I would like to keep only the rows where the combination of the three factors "NAME", "CLASS" and "COLOR" appears at least three times, so that I can compute the mean of VALUE for that combination. In this case I would obtain:

   NAME       CLASS   COLOR    VALUE
   antonio    B       BLUE     mean

because antonio / B / BLUE is the only combination with at least three observations.

Thank you so much.



You can use the table function as follows:

subset(df, table(FACTOR)[FACTOR] >= 3)
# 1 ANTONIO     5
# 2 ANTONIO     8
# 3 ANTONIO     7

To help you understand, see what this returns:

table(df$FACTOR)[df$FACTOR] >= 3
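
If it helps, here is a small self-contained sketch of that lookup. The data frame is made up for illustration (only the three ANTONIO rows come from the output above; the LUCA and GIOVANNI rows are assumptions), and I wrap the column in as.character() so the table is indexed by name whether the column is stored as a factor or as a character vector:

```r
# Hypothetical data: one grouping column and one value column
df <- data.frame(
  FACTOR = c("ANTONIO", "ANTONIO", "ANTONIO", "LUCA", "LUCA", "GIOVANNI"),
  VALUE  = c(5, 8, 7, 99, 87, 48)
)

# Number of rows per level of FACTOR
counts <- table(df$FACTOR)

# Look up each row's own group count by name (one count per row of df)
per_row <- counts[as.character(df$FACTOR)]

# Logical vector saying which rows belong to a group with >= 3 rows
keep <- unname(as.vector(per_row >= 3))

print(counts)
print(keep)
```

The logical vector has one element per row of df, which is exactly what subset() needs as its second argument.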

You could also use the ave function to compute the number of observations:

subset(df, ave(VALUE, FACTOR, FUN = length) >= 3)

This last method may be a little more flexible if you have multiple factors, as in your comment and updated question. You can do:

subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)
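
As a rough end-to-end sketch on the data from the question: the data frame below is typed in by hand from the table you posted, and the final aggregate() step is my assumption about how you want the mean computed (the names kept and result are just illustrative):

```r
# Data frame rebuilt by hand from the table in the question
df <- data.frame(
  NAME  = c("antonio", "antonio", "antonio", "antonio",
            "luca", "luca", "luca", "giovanni"),
  CLASS = c("B", "B", "B", "B", "C", "B", "B", "A"),
  COLOR = c("YELLOW", "BLUE", "BLUE", "BLUE",
            "YELLOW", "YELLOW", "YELLOW", "BLUE"),
  VALUE = c(5, 8, 7, 12, 99, 87, 98, 48)
)

# Keep rows whose NAME/CLASS/COLOR combination occurs at least three times
kept <- subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)

# Mean of VALUE for each remaining combination
result <- aggregate(VALUE ~ NAME + CLASS + COLOR, data = kept, FUN = mean)

print(kept)
print(result)
```

Only the antonio / B / BLUE rows survive the filter (VALUE 8, 7 and 12), so the mean for that combination is 9.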