Is it possible to assign weights to different features before formulating a DFM in R?
Consider this example in R:
str="apple is better than banana"
mydfm=dfm(str, ignoredFeatures = stopwords("english"), verbose = FALSE)
DFM mydfm looks like:
docs apple better banana
text1 1 1 1
But I want to assign weights (apple: 5, banana: 3) beforehand, so that the DFM mydfm looks like:
docs apple better banana
text1 5 1 3
I don't think so; however, you can easily do it afterwards:
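A minimal sketch of weighting after the fact, assuming a plain base R matrix as a stand-in for the dfm (the stand-in `m` and the weight vector `w` are illustrative names, not part of quanteda's API). It multiplies the selected columns by a diagonal weight matrix and assigns the result back:

```r
# Stand-in for the dfm from the question: one document, three features.
# (Assumption: a base matrix is used here instead of a real dfm object.)
m <- matrix(c(1, 1, 1), nrow = 1,
            dimnames = list(docs = "text1",
                            features = c("apple", "better", "banana")))

# Hypothetical named weight vector; features not listed are left untouched.
w <- c(apple = 5, banana = 3)

# Multiply the weighted columns by a diagonal matrix of weights,
# then write the result back into those columns.
# Note: diag(w) builds a diagonal matrix only when length(w) >= 2.
m[, names(w)] <- m[, names(w), drop = FALSE] %*% diag(w)

m
```

On a real dfm the same pattern would go through the sparse-matrix methods for `%*%` and `[<-`, so the class of the result may differ from the input.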
This points to the need to add an option to the weight method for dfm-class, to make this easier and, more importantly, not to strip the class of dfm from the sparse matrix. The dfm also has a @weights slot in the object that is designed to keep a record of how it was weighted, so this information could/should also be preserved.

@lukeA's solution drops the dfm class twice (not his or your fault but mine!!), once in the %*% and again in the <-. The first can be avoided by using column-wise recycling and a standard * instead of the matrix multiplication %*%, since I don't think a method has been written for dfm-class for %*% (which is why it defaults to the sparseMatrix method). The second cannot currently be avoided if you reassign sub-matrix elements, but can be avoided if you simply replace one dfm-class object with another.

To make the new dfm-class object in a way that preserves the class, this would work (and here I have made the problem slightly more complex by adding a second document and another feature):
One more note: I'd encourage the use of dfm-class-specific methods for extracting things like the column names, e.g. features(mydfm) rather than colnames(mydfm), even though these will probably remain equivalent.