It seems it is not possible to have a matrix of factors in R. Is that true? If so, why? If not, how should I do it?
f <- factor(sample(letters[1:5], 20, rep=TRUE), letters[1:5])
m <- matrix(f,4,5)
is.factor(m) # fail.
m <- factor(m,letters[1:5])
is.factor(m) # oh, yes?
is.matrix(m) # nope. fail.
dim(f) <- c(4,5) # aha?
is.factor(f) # yes..
is.matrix(f) # yes!
# but then I get a strange behavior
cbind(f,f) # is not a factor anymore
head(f,2) # doesn't give the first 2 rows but the first 2 elements of f
# should I worry about it?
In this case, it may walk like a duck and even quack like a duck, but f, after dim(f) <- c(4,5), really isn't a matrix, even though is.matrix() claims that it strictly is one. To be a matrix as far as is.matrix() is concerned, f only needs to be a vector and have a dim attribute. By adding the attribute to f you pass the test. As you have seen, however, once you start using f as a matrix, it quickly loses the features that make it a factor (you end up working with the levels or the dimensions get lost).

There are really only matrices and arrays for the atomic vector types (logical, integer, double, complex, character and raw),
plus, as @hadley reminds me, you can also have list matrices and arrays (by setting the dim attribute on a list object; see, for example, the Matrices & Arrays section of Hadley's book, Advanced R).

Anything outside those types would be coerced to some lower type via as.vector(). This happens in matrix(f, nrow = 3) not because f fails is.atomic() (it returns TRUE for f, because f is internally stored as an integer and typeof(f) returns "integer"), but because f has a class attribute. This sets the OBJECT bit on the internal representation of f, and anything that has a class is supposed to be coerced to one of the atomic types via as.vector().

Adding dimensions via
dim<-() is a quick way to create an array without duplicating the object, but this bypasses some of the checks and balances that R would do if you coerced f to a matrix via the other methods.

This gets found out when you try to use basic functions that work on matrices or use method dispatch. Note that after assigning dimensions to f, f is still of class "factor", which explains the head() behaviour; you are not getting the head.matrix behaviour because f is not a matrix, at least as far as the S3 mechanism is concerned. Instead, the head.default method calls [, for which there is a factor method, hence the observed behaviour.

The cbind() behaviour can be explained from the documented behaviour (see ?cbind). Again, the fact that f is of class "factor" is defeating you, because the default cbind method will get called and it will strip the levels information and return the internal integer codes, as you observed.

In many respects, you have to ignore or at least not fully trust what the
is.foo functions tell you, because they are just using simple tests to say whether something is or is not a foo object. is.matrix() and is.atomic() are clearly wrong when it comes to f (with dimensions) from a particular point of view. They are also right in terms of their implementation, or at least their behaviour can be understood from the implementation; I think is.atomic(f) is not correct, but if by "is of an atomic type" R Core mean "type" to be the thing returned by typeof(f), then is.atomic() is right. A more strict test is is.vector(), which f fails because it has attributes beyond a names attribute.

As to how you should get a factor matrix, well, you can't, at least if you want it to retain the factor information (the labels for the levels). One solution would be to use a character matrix, which would retain the labels:
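A minimal sketch of this, assuming the f from the question (fl and fm are illustrative names, and set.seed() is only there to make the sketch reproducible):

```r
# Assumption: f is built as in the question.
set.seed(42)
f <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])

fl <- levels(f)                                     # keep the levels for later
fm <- matrix(as.character(f), nrow = 4, ncol = 5)   # character matrix of labels

is.matrix(fm)     # TRUE
is.character(fm)  # TRUE
```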
and we store the levels of f for future use in case we lose some elements of the matrix along the way.

Or work with the internal integer representation:
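A minimal sketch of the integer alternative (im is an illustrative name; f is assumed to be built as in the question):

```r
# Assumption: f is built as in the question.
set.seed(42)
f <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])

im <- matrix(as.integer(f), nrow = 4, ncol = 5)  # integer codes, no class attribute

is.matrix(im)   # TRUE
is.integer(im)  # TRUE
is.factor(im)   # FALSE
```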
and you can always get back to the levels/labels again via:
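A sketch of one way to do this, assuming the levels are kept in a vector fl and the codes in an integer matrix im (both illustrative names, defined here so the block stands alone):

```r
# Assumption: f is built as in the question.
set.seed(42)
f  <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])
fl <- levels(f)
im <- matrix(as.integer(f), nrow = 4, ncol = 5)

# Index the stored levels by the integer codes; this drops dim, so restore it.
labs <- fl[im]
dim(labs) <- dim(im)

identical(labs, matrix(as.character(f), nrow = 4, ncol = 5))  # TRUE
```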
Using a data frame does not seem ideal for this, as each component of the data frame would be treated as a separate factor, whereas you seem to want to treat the array as a single factor with one set of levels.
If you really want a factor matrix, you would most likely need to create your own S3 class, plus all the methods to go with it. For example, you might store the factor matrix as a character matrix but with class "factorMatrix", where you store the levels alongside the factor matrix as an extra attribute, say. Then you would need to write [.factorMatrix, which would grab the levels, then use the default [ method on the matrix, and then add the levels attribute back on again. You could write cbind and head methods as well. The list of required methods would grow quickly, but a simple implementation may suit, and if you make your objects have class c("factorMatrix", "matrix") (i.e. inherit from the "matrix" class), you'll pick up all the properties/methods of the "matrix" class (which will drop the levels and other attributes), so you can at least work with the objects and see where you need to add new methods to fill out the behaviour of the class.

Unfortunately factor support is not completely universal in R, so many R functions default to treating factors as their internal storage type, which is integer.

This is what happens with cbind. It doesn't know how to handle factors, but it does know what to do with integers, so it treats your factor like an integer. matrix also discards the factor class, although it ends up with the character labels because it goes through as.vector(). head is actually the opposite. It does know how to handle a factor, but it never bothers to check that your factor is also a matrix, so it just treats it like a normal dimensionless factor vector.

Your best bet to operate as if you had factors with your matrix is to coerce it to character. Once you are done with your operations, you can restore it back to factor form. You could also do this with the integer form, but then you risk weird stuff (you could for example do matrix multiplication on an integer matrix, but that makes no sense for factors).
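A small sketch of the round trip described above, assuming the f from the question (m, m2 and f2 are illustrative names):

```r
# Assumption: f is built as in the question.
set.seed(42)
f <- factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5])

typeof(f)               # "integer": the internal storage type
is.factor(cbind(f, f))  # FALSE: cbind falls back to the integer codes

# Do the matrix work on the character form...
m  <- matrix(as.character(f), nrow = 4, ncol = 5)
m2 <- cbind(m, m)                      # 4 x 10 character matrix

# ...then restore a factor with the original levels when done.
f2 <- factor(m2, levels = levels(f))   # factor() drops the dim attribute
dim(f2) <- dim(m2)                     # put the dimensions back if needed

is.factor(f2)  # TRUE
```

Note that f2 is again only a factor with a dim attribute, so the caveats about head() and cbind() apply to it as well.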
Note that if you add class "matrix" to your factor some (but not all) things start working:
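As a hedged sketch of what this might look like (the small 3 x 3 factor and the order of the class vector are assumptions, not necessarily what was originally run):

```r
f <- factor(letters[1:9])
dim(f) <- c(3, 3)
class(f) <- c("matrix", "factor")  # assumption: "matrix" listed before "factor"

is.factor(f)  # TRUE, since f still inherits from "factor"
head(f, 2)    # dispatch now reaches the matrix method before head.default
```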
Produces:
This doesn't fix rbind, etc.
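Where patching the class attribute is not enough, the custom S3 class described earlier is the more thorough route. The following is a rough sketch under stated assumptions: the class name factorMatrix comes from the discussion above, while the constructor, its arguments, and the details of the [ and cbind methods are hypothetical choices, not an established API.

```r
# Hypothetical constructor: a character matrix that carries its levels
# as an extra attribute and inherits from "matrix".
factorMatrix <- function(x, nrow, ncol,
                         levels = sort(unique(as.character(x)))) {
  m <- matrix(as.character(x), nrow = nrow, ncol = ncol)
  stopifnot(all(is.na(m) | m %in% levels))
  structure(m, levels = levels, class = c("factorMatrix", "matrix"))
}

# Subsetting: grab the levels, use the default `[`, then add them back.
`[.factorMatrix` <- function(x, ...) {
  lev <- attr(x, "levels")
  cls <- oldClass(x)
  y <- NextMethod("[")
  if (is.matrix(y)) {          # still two-dimensional: keep the class
    attr(y, "levels") <- lev
    class(y) <- cls
    y
  } else {                     # dimensions were dropped: plain factor
    factor(y, levels = lev)
  }
}

# Column-binding: combine the underlying matrices and pool the levels.
cbind.factorMatrix <- function(..., deparse.level = 1) {
  args <- list(...)
  lev <- unique(unlist(lapply(args, attr, "levels")))
  m <- do.call(cbind, lapply(args, unclass))
  structure(m, levels = lev, class = c("factorMatrix", "matrix"))
}

set.seed(42)
fm   <- factorMatrix(sample(letters[1:5], 20, replace = TRUE), 4, 5,
                     levels = letters[1:5])
sub  <- fm[1:2, ]      # still a factorMatrix, levels intact
row1 <- fm[1, ]        # a single row comes back as an ordinary factor
both <- cbind(fm, fm)  # 4 x 10 factorMatrix
```

Storing the labels as character keeps the default matrix machinery usable; methods such as rbind or print would need the same treatment before the class felt complete.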