I would like to aggregate a table (tab) by two columns (sequence and program) to get the top row of samplesize (FUN=head).
sq <- c(1,1,1,1,1,1)
prog<- c('A','A','B','B','C','C')
ss <- c(47,47,28,28,47,47)
tab<- data.frame(sq,prog,ss)
Aggregate is giving me an odd result: if the sample size is the same for a DIFFERENT combination of sequence and program, it omits that combination.
agg <- aggregate(cbind(sq,prog) ~ ss, data = tab, FUN=head,1,na.rm=TRUE)
I'm confused why this is occurring and why it is changing the program to a numerical sequence when it is text (A,B,C).
It's because by default, data.frame creates a factor from character columns. You need:
EDIT: I personally find the dplyr package very intuitive. For your result, I'd use:
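A minimal sketch of both fixes, assuming the goal is one row per (sq, prog) pair carrying its first ss value. The stringsAsFactors = FALSE argument, the ss ~ sq + prog formula, and the names agg and agg_dplyr are my reading of what is needed here, not necessarily the exact code intended above; the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)

sq   <- c(1, 1, 1, 1, 1, 1)
prog <- c("A", "A", "B", "B", "C", "C")
ss   <- c(47, 47, 28, 28, 47, 47)

# Keep prog as character instead of factor
# (assumption: this is the fix meant above; it is already the default in R >= 4.0).
tab <- data.frame(sq, prog, ss, stringsAsFactors = FALSE)

# Base R: the value to summarise goes on the left of ~,
# the grouping columns go on the right.
agg <- aggregate(ss ~ sq + prog, data = tab, FUN = function(x) head(x, 1))

# dplyr: group by both keys and keep the first ss in each group.
agg_dplyr <- tab %>%
  group_by(sq, prog) %>%
  summarise(ss = first(ss), .groups = "drop")
```

The formula in the question, cbind(sq, prog) ~ ss, groups by ss rather than by sq and prog, so A and C (both with ss = 47) fall into the same group and only one of them survives. Calling cbind on a factor column also returns its integer codes, which is where the numbers in place of A, B, C come from.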