R - Replace a double loop by a function from the a

Posted 2019-07-23 03:32

Question:

I have these loops:

xall = data.frame()
for (k in 1:nrow(VectClasses))
{
  for (i in 1:nrow(VectIndVar))
  {
    xall[i,k] = sum(VectClasses[k,] == VectIndVar[i,])
  }
}

The data:

VectClasses = data frame containing the characteristics of each class

VectIndVar = data frame containing each record of the database

The two for loops work and give an output I can work with. However, they take too long, which is why I want to use the apply family instead.

The output I am looking for looks like this:

    V1 V2 V3 V4
 1  3  3  2  2
 2  2  2  1  1
 3  3  4  3  3
 4  3  4  3  3
 5  4  4  3  3
 6  3  2  3  3

I tried using:

xball = data.frame()
xball = sapply(xball, function (i,k){
 sum(VectClasses[k,] == VectIndVar[i,])})

xcall = data.frame()
xcall = lapply(xcall, function (i, k){sum(VectClasses[k,] == VectIndVar[i,])} )

but neither seems to fill the data frame.

Reproducible data (shortened):

VectIndVar <- data.frame(a=sample(letters[1:5], 100, rep=T), b=floor(runif(100)*25), 
 c = sample(c(1:5), 100, rep=T), 
 d=sample(c(1:2), 100, rep=T))

and:

K1 = 4
VectClasses = VectIndVar[sample(1:nrow(VectIndVar), K1, replace=FALSE), ]

Can you help me?

Answer 1:

I would use outer instead of *apply:

res <- outer( 
  1:nrow(VectIndVar), 
  1:nrow(VectClasses),
  Vectorize(function(i,k) sum(VectIndVar[i,-1]==VectClasses[k,-1]))
)

(Thanks to this Q&A for clarifying that Vectorize is needed.)
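
To illustrate why Vectorize matters here: outer does not call the function once per (i, k) pair. It calls it a single time with two long index vectors and expects a result of the same length. A minimal sketch with a toy function (f, vf, err and m are hypothetical names used only for this illustration):

```r
# Toy scalar function: it collapses its inputs to a single number,
# just like sum(VectIndVar[i,-1] == VectClasses[k,-1]) does.
f <- function(i, k) sum(c(i, k))

# Without Vectorize, outer() passes all six (i, k) combinations at once,
# f returns a single value, and outer() cannot reshape it into a 3 x 2 matrix.
err <- tryCatch(outer(1:3, 1:2, f), error = function(e) conditionMessage(e))
print(err)

# Vectorize() wraps f in mapply(), so f is called once per (i, k) pair.
vf <- Vectorize(f)
m <- outer(1:3, 1:2, vf)
print(m)   # m[i, k] equals i + k
```

Note that Vectorize only fixes the calling convention. The function is still evaluated once per pair, so this is not expected to be faster than the loops by itself.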

This gives

> head(res) # with set.seed(1) before creating the data
     [,1] [,2] [,3] [,4]
[1,]    1    1    2    1
[2,]    0    0    1    0
[3,]    0    0    0    0
[4,]    0    0    1    0
[5,]    1    0    0    1
[6,]    1    1    1    1

As for speed, I would suggest using matrices instead of data.frames:

cmat <- as.matrix(VectClasses[-1]); rownames(cmat)<-VectClasses$a
imat <- as.matrix(VectIndVar[-1]);  rownames(imat)<-VectIndVar$a
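
The answer stops at building the matrices. One possible way to use them, offered only as a sketch and not necessarily what the answerer had in mind, is to compare each class row against all records in one vectorized step. The seed and the name res2 are assumptions made for this illustration, and the row names are omitted for brevity:

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
VectIndVar <- data.frame(a = sample(letters[1:5], 100, replace = TRUE),
                         b = floor(runif(100) * 25),
                         c = sample(1:5, 100, replace = TRUE),
                         d = sample(1:2, 100, replace = TRUE))
VectClasses <- VectIndVar[sample(nrow(VectIndVar), 4), ]

# Numeric matrices without the first (character) column, as in the answer.
cmat <- as.matrix(VectClasses[-1])
imat <- as.matrix(VectIndVar[-1])

# t(imat) has one column per record, so the class vector cmat[k, ]
# is recycled down each column; colSums() then counts the matches per record.
res2 <- sapply(seq_len(nrow(cmat)),
               function(k) colSums(t(imat) == cmat[k, ]))

dim(res2)   # one row per record, one column per class
head(res2)
```

This loops only over the classes, and each iteration works on a whole matrix, which avoids indexing a data.frame row by row.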


Tags: r loops apply