I have two vectors:
x <- c(1,1,1,1,1, 2,2,2,3,3, 3,3,3,4,4, 5,5,5,5,5 )
y <- c(2,2,1,3,2, 1,4,2,2,NA, 3,3,3,4,NA, 1,4,4,2,NA)
This question (Conditional calculating the numbers of values in column with R, part2) discussed how to find the number of values in w (not counting NA) for each x (from 1–5) and for each y (from 1–4).
Let's split x into groups: if x<=2, group I; if 2<x<=3, group II; and if 3<x<=5, group III. I need to find the number of different values in x by group and by every value of y. I also need to find the mean of those values in x by the same groups. The output should be in this format:
y x Result 1 (the number of distinct values in x); Result 2 (the mean)
1 I ...
1 II ...
1 III ...
...
4 I ...
4 II ...
4 III ...
My command of R code isn't great, so here's A Rather Ugly Function:
ARUF=function(x,y){df1=data.frame(x,y,group=NA);miny=min(y,na.rm=T)
maxy=max(y,na.rm=T);for(i in 1:length(df1$x))df1$group[i]=if(df1$x[i]<=2)'I'else
if(df1$x[i]>2&df1$x[i]<=3)'II'else if(df1$x[i]>3&df1$x[i]<=5)'III'else'NA'
Result1=c();Result2=c();for(i in miny:maxy){for(j in c('I','II','III')){
Result1=append(Result1,length(levels(factor(subset(df1,y==i&group==j)$x))))
Result2=append(Result2,mean(subset(df1,y==i&group==j)$x))}}
print(data.frame(y=rep(miny:maxy,each=3),
x=rep(c('I','II','III'),maxy-miny+1),Result1,Result2),row.names=F)}
With your x and y, ARUF(x,y) prints this data.frame:
y x Result1 Result2
1 I 2 1.500000
1 II 0 NaN
1 III 1 5.000000
2 I 2 1.250000
2 II 1 3.000000
2 III 1 5.000000
3 I 1 1.000000
3 II 1 3.000000
3 III 0 NaN
4 I 1 2.000000
4 II 0 NaN
4 III 2 4.666667
I went a little out of my way to make ARUF robust with any integer values of y. I can't seem to break it by generating y randomly with rbinom, and I believe it should handle any real number values of x, so it should work for any other vectors of the same kind that you might have.
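For comparison, the same per-cell summaries can be sketched in base R with cut() and tapply(). This is only an illustration, not part of ARUF: the names group, n_distinct, means and out are made up here, and empty cells come back as NA rather than NaN.

```r
x <- c(1,1,1,1,1, 2,2,2,3,3, 3,3,3,4,4, 5,5,5,5,5)
y <- c(2,2,1,3,2, 1,4,2,2,NA, 3,3,3,4,NA, 1,4,4,2,NA)

# Bin x into (-Inf,2], (2,3], (3,5]; cut() uses right-closed intervals by default
group <- cut(x, breaks = c(-Inf, 2, 3, 5), labels = c("I", "II", "III"))

# tapply() ignores elements whose grouping value (here y) is NA
n_distinct <- tapply(x, list(y, group), function(v) length(unique(v)))
means      <- tapply(x, list(y, group), mean)

# Reshape the two y-by-group matrices into the long layout asked for
out <- data.frame(
  y       = rep(as.integer(rownames(means)), each = ncol(means)),
  x       = rep(colnames(means), times = nrow(means)),
  Result1 = as.vector(t(n_distinct)),
  Result2 = as.vector(t(means))
)
out$Result1[is.na(out$Result1)] <- 0   # empty cells contain no distinct values

print(out, row.names = FALSE)
```

Because the breaks are passed to cut() in one place, changing the group boundaries does not require touching the rest of the code.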
#Bring in data.table library
require(data.table)
data <- data.table(x,y)
#Summarize data
data[, list(x = mean(x, na.rm=TRUE)), by =
list(y, x.grp = cut(x, c(-Inf,2,3,5,Inf)))][order(y,x.grp)]
If you'd like the results to be NA when NAs are present, then just remove na.rm=TRUE from mean(.):
data[, list(x = mean(x)), by =
list(y, x.grp = cut(x, c(-Inf,2,3,5,Inf)))][order(y,x.grp)]
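The calls above return only the mean per cell. A minimal sketch that also returns the number of distinct x values, assuming data.table's uniqueN() is available (data.table 1.9.6 or later), could look like this. The names dt, res, n_distinct and mean_x, and the I/II/III labels passed to cut(), are chosen here for illustration.

```r
library(data.table)

x <- c(1,1,1,1,1, 2,2,2,3,3, 3,3,3,4,4, 5,5,5,5,5)
y <- c(2,2,1,3,2, 1,4,2,2,NA, 3,3,3,4,NA, 1,4,4,2,NA)
dt <- data.table(x, y)

# Drop rows with missing y, then summarise x within each (y, x.grp) cell
res <- dt[!is.na(y),
          .(n_distinct = uniqueN(x),   # number of distinct x values
            mean_x     = mean(x)),     # mean of x in the cell
          by = .(y, x.grp = cut(x, c(-Inf, 2, 3, 5),
                                labels = c("I", "II", "III")))][order(y, x.grp)]

print(res)
```

In this data the NAs are in y rather than x, so na.rm has no effect on mean(x); the !is.na(y) filter is what keeps the missing y values from forming a group of their own. Cells with no observations are simply absent from the result instead of appearing with a count of 0.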