I have a vector of factors given by a sequence of numbers. These factors are also found in separate data sets, called test_set and train_set. The following code finds where each factor in the data sets matches an entry in the vector of factors and puts a 1 in the corresponding position of a matrix. Multiplying this matrix, compound_test, by test_set$Compound should give you compare_comp.
compare_comp <- rbind(dcm, cmp1)[, 1]
compound_test <- matrix(0, nrow(test_set), length(compare_comp))   # test indicator matrix
compound_train <- matrix(0, nrow(train_set), length(compare_comp)) # train indicator matrix
for (i in seq_along(compare_comp)) {
  # mark the rows whose Compound equals the i-th factor
  compound_test[which(compare_comp[i] == test_set$Compound), i] <- 1
  compound_train[which(compare_comp[i] == train_set$Compound), i] <- 1
}
It does this for a train and test set, and compare_comp is the vector of factors.
Is there a function in R that lets me create the same thing without a for loop? I have tried model.matrix(~Compound, data=test_set) without much luck.
You may not be able to avoid iteration completely, since you are comparing each element of the compare_comp vector to the full Compound vector in both test_set and train_set. You can, however, make the assignment more compact with the apply family of functions.

Specifically, sapply returns a logical matrix (TRUE, FALSE) that we assign into the corresponding positions of the initialized matrices, where TRUE converts to 1 and FALSE to 0. Alternatively, the lesser-known vapply (similar to sapply, but you must define the output type) returns an equivalent matrix directly as numeric type. Testing with random data (see demo below) confirms that both versions are identical to your looped version.
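As a minimal sketch of the two approaches described above, assuming made-up stand-in data (the question does not show dcm, cmp1, test_set or train_set, so the values below are hypothetical) and character rather than factor values. If your columns are factors with differing level sets, wrapping both sides in as.character() before comparing should avoid a level-mismatch error.

```r
# Hypothetical stand-in data: none of these values come from the question.
set.seed(1)
compare_comp <- c("101", "102", "103", "104")
test_set  <- data.frame(Compound = sample(compare_comp, 10, replace = TRUE),
                        stringsAsFactors = FALSE)
train_set <- data.frame(Compound = sample(compare_comp, 20, replace = TRUE),
                        stringsAsFactors = FALSE)

# Reference: the looped version from the question (test set only)
loop_test <- matrix(0, nrow(test_set), length(compare_comp))
for (i in seq_along(compare_comp)) {
  loop_test[which(compare_comp[i] == test_set$Compound), i] <- 1
}

# sapply: builds a logical matrix, one column per element of compare_comp;
# assigning into the initialized numeric matrix coerces TRUE/FALSE to 1/0
sapply_test <- matrix(0, nrow(test_set), length(compare_comp))
sapply_test[] <- sapply(compare_comp, function(x) x == test_set$Compound)

# vapply: the output type and length are declared up front,
# so the result is already a numeric matrix
vapply_test <- vapply(compare_comp,
                      function(x) as.numeric(x == test_set$Compound),
                      numeric(nrow(test_set)),
                      USE.NAMES = FALSE)

# The same call works for the training set
vapply_train <- vapply(compare_comp,
                       function(x) as.numeric(x == train_set$Compound),
                       numeric(nrow(train_set)),
                       USE.NAMES = FALSE)

# Both versions should match the looped result
stopifnot(identical(loop_test, sapply_test),
          identical(loop_test, vapply_test))
```

USE.NAMES = FALSE is set in the vapply calls so the result carries no column names and compares cleanly with the unnamed looped matrix. Note that if a data set has only one row, sapply and vapply return a plain vector instead of a matrix, so that case would need separate handling.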
Online Demo