Conditional calculating the numbers of values in c

I have two vectors:

x <- c(1,1,1,1,1, 2,2,2,3,3,  3,3,3,4,4,  5,5,5,5,5 )
y <- c(2,2,1,3,2, 1,4,2,2,NA, 3,3,3,4,NA, 1,4,4,2,NA)

This question (Conditional calculating the numbers of values in column with R, part2) discussed how to find the number of values in y (not counting NA) for each x (from 1–5) and for each y (from 1–4).

Let's split x into groups: if x<=2, group I; if 2<x<=3, group II; and if 3<x<=5, group III. I need to find the number of distinct values of x within each group, for every value of y. I also need the mean of those x values for the same group and y combinations. The output should be in this format:

y x     Result 1 (the number of distinct values in x); Result 2 (the mean)
1 I     ...
1 II    ...
1 III   ...     
...
4 I     ...
4 II    ...
4 III   ...

Tags: r aggregation

2 Answers

Viruses.

#2 · 2019-02-26 08:58

# Load the data.table package
library(data.table)
data <- data.table(x, y)

# Mean of x for each combination of y and x group
data[, list(x = mean(x, na.rm=TRUE)), by =
       list(y, x.grp = cut(x, c(-Inf,2,3,5,Inf)))][order(y,x.grp)]

If you'd like the mean to be NA whenever a group's x values include an NA, just remove na.rm=TRUE from mean(.). The sample x contains no NAs, so both versions give the same output here:

data[, list(x = mean(x)), by = 
       list(y, x.grp = cut(x, c(-Inf,2,3,5,Inf)))][order(y,x.grp)]
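The snippets above return only the group means. Here is a minimal sketch that also returns the number of distinct x values per cell, assuming data.table's uniqueN() is available. The column names n_distinct and mean_x, the I/II/III labels, and the choice to drop rows where y is NA are my own assumptions, not part of the answer above:

```r
library(data.table)

x <- c(1,1,1,1,1, 2,2,2,3,3,  3,3,3,4,4,  5,5,5,5,5 )
y <- c(2,2,1,3,2, 1,4,2,2,NA, 3,3,3,4,NA, 1,4,4,2,NA)
data <- data.table(x, y)

# Groups: I (x <= 2), II (2 < x <= 3), III (3 < x <= 5).
# Rows with y = NA are dropped before grouping.
res <- data[!is.na(y),
            list(n_distinct = uniqueN(x),   # Result 1: distinct x values
                 mean_x     = mean(x)),     # Result 2: mean of x
            by = list(y, x.grp = cut(x, c(-Inf, 2, 3, 5),
                                     labels = c("I", "II", "III")))][order(y, x.grp)]
print(res)
```

Only combinations that actually occur in the data show up in res; an empty cell such as y = 1, group II produces no row.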


淡お忘

#3 · 2019-02-26 09:11

My command of R code isn't great, so here's A Rather Ugly Function:

ARUF <- function(x, y) {
  df1 <- data.frame(x, y, group = NA)
  miny <- min(y, na.rm = TRUE); maxy <- max(y, na.rm = TRUE)
  # Assign each x to group I, II or III (values above 5 stay NA)
  for (i in seq_along(df1$x)) df1$group[i] <- if (df1$x[i] <= 2) 'I' else
    if (df1$x[i] <= 3) 'II' else if (df1$x[i] <= 5) 'III' else NA
  Result1 <- c(); Result2 <- c()
  for (i in miny:maxy) for (j in c('I', 'II', 'III')) {
    xs <- subset(df1, y == i & group == j)$x
    Result1 <- append(Result1, length(unique(xs)))  # distinct x values in the cell
    Result2 <- append(Result2, mean(xs))            # NaN for an empty cell
  }
  ny <- maxy - miny + 1  # number of y values covered
  print(data.frame(y = rep(miny:maxy, each = 3), x = rep(c('I', 'II', 'III'), ny),
                   Result1, Result2), row.names = FALSE)
}

With your x and y, ARUF(x,y) prints this data.frame:

y   x Result1  Result2
1   I       2 1.500000
1  II       0      NaN
1 III       1 5.000000
2   I       2 1.250000
2  II       1 3.000000
2 III       1 5.000000
3   I       1 1.000000
3  II       1 3.000000
3 III       0      NaN
4   I       1 2.000000
4  II       0      NaN
4 III       2 4.666667

I went a little out of my way to make ARUF robust with any integer values of y. I can't seem to break it by generating y randomly with rbinom, and I believe it should handle any real number values of x, so it should work for any other vectors of the same kind that you might have.
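For comparison, the same table can probably be built without explicit loops. This is only a sketch in base R using cut() and tapply(); the names grp, n_distinct, mean_x and out are mine, and it assumes every x falls in (-Inf, 5]:

```r
x <- c(1,1,1,1,1, 2,2,2,3,3,  3,3,3,4,4,  5,5,5,5,5 )
y <- c(2,2,1,3,2, 1,4,2,2,NA, 3,3,3,4,NA, 1,4,4,2,NA)

# Groups: I (x <= 2), II (2 < x <= 3), III (3 < x <= 5)
grp <- cut(x, c(-Inf, 2, 3, 5), labels = c("I", "II", "III"))

# Matrices indexed by group (rows) and y (columns); rows with y = NA are ignored
n_distinct <- tapply(x, list(grp, y), function(v) length(unique(v)))
mean_x     <- tapply(x, list(grp, y), mean)
n_distinct[is.na(n_distinct)] <- 0   # empty cells hold no distinct values

# Flatten column by column into the long format asked for
out <- data.frame(y = rep(as.numeric(colnames(mean_x)), each = nrow(mean_x)),
                  x = rep(rownames(mean_x), times = ncol(mean_x)),
                  Result1 = as.vector(n_distinct),
                  Result2 = as.vector(mean_x))
print(out, row.names = FALSE)
```

One difference from ARUF: tapply() fills empty cells with NA by default, so the mean for an empty cell comes back as NA rather than NaN.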
