I'm using melt and encounter the following warning message:
attributes are not identical across measure variables; they will be dropped
After looking around, I found people saying this happens because the variables are of different classes; however, that is not the case with my dataset.
Here is the dataset:
test <- structure(list(park = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("miss", "piro", "sacn", "slbe"), class = "factor"),
a1.one = structure(c(3L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 3L,
3L), .Label = c("agriculture", "beaver", "development", "flooding",
"forest_pathogen", "harvest_00_20", "harvest_30_60", "harvest_70_90",
"none"), class = "factor"), a2.one = structure(c(6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("development",
"forest_pathogen", "harvest_00_20", "harvest_30_60", "harvest_70_90",
"none"), class = "factor"), a3.one = structure(c(3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("forest_pathogen",
"harvest_00_20", "none"), class = "factor"), a1.two = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("agriculture",
"beaver", "development", "flooding", "forest_pathogen", "harvest_00_20",
"harvest_30_60", "harvest_70_90", "none"), class = "factor"),
a2.two = structure(c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L), .Label = c("development", "forest_pathogen", "harvest_00_20",
"harvest_30_60", "harvest_70_90", "none"), class = "factor"),
a3.two = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("forest_pathogen", "harvest_00_20", "none"
), class = "factor")), .Names = c("park", "a1.one", "a2.one",
"a3.one", "a1.two", "a2.two", "a3.two"), row.names = c(NA, 10L
), class = "data.frame")
And here is the structure:
str(test)
'data.frame': 10 obs. of 7 variables:
$ park : Factor w/ 4 levels "miss","piro",..: 1 1 1 1 1 1 1 1 1 1
$ a1.one: Factor w/ 9 levels "agriculture",..: 3 1 3 3 3 3 1 3 3 3
$ a2.one: Factor w/ 6 levels "development",..: 6 6 6 6 6 6 6 6 6 6
$ a3.one: Factor w/ 3 levels "forest_pathogen",..: 3 3 3 3 3 3 3 3 3 3
$ a1.two: Factor w/ 9 levels "agriculture",..: 3 3 3 3 3 3 3 3 3 3
$ a2.two: Factor w/ 6 levels "development",..: 6 6 6 6 6 6 6 6 6 6
$ a3.two: Factor w/ 3 levels "forest_pathogen",..: 3 3 3 3 3 3 3 3 3 3
Is it because the number of levels is different for each variable? If so, can I just ignore the warning message in this case?
To generate the warning message:
library(reshape2)
test.m <- melt (test,id.vars=c('park'))
Warning message:
attributes are not identical across measure variables; they will be dropped
Thanks.
BrodieG's answer is excellent; however, there are some cases where it is impractical to refactor columns (for example, GHCN climate data with 128 fixed-width columns that I wanted to melt into a much smaller number of columns).
In that case, the simplest solution is to treat the data as characters rather than factors: for example, you can re-import the data using
read.fwf(filename,stringsAsFactors=FALSE)
(the same idea would work for read.csv). For a smaller number of columns you could convert factors to strings using d$mystring<-as.character(d$myfactor).
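When there are many factor columns, converting them one at a time gets tedious. Here is a minimal sketch of doing it in one pass before melting. The data frame d and its columns id, a and b are made up for illustration; they are not the GHCN data.

```r
library(reshape2)

# Hypothetical data: two factor columns whose level sets differ
d <- data.frame(
  id = 1:3,
  a  = factor(c("none", "beaver", "none")),
  b  = factor(c("none", "flooding", "harvest"))
)

# Find every factor column and convert it to character
is_fac <- vapply(d, is.factor, logical(1))
d[is_fac] <- lapply(d[is_fac], as.character)

# Character columns carry no levels attribute, so there is nothing to drop
d.m <- melt(d, id.vars = "id")
str(d.m)
```

The value column comes out as character, which is what melt would have produced anyway, only without the warning.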
An explanation:
When you melt, you are combining multiple columns into one. In this case, you are combining factor columns, each of which has a levels attribute. These levels are not the same across columns because your factors are actually different. melt just coerces each factor to character and drops their attributes when creating the value column in the result.
In this case the warning doesn't matter, but you need to be very careful when combining columns that are not of the same "type", where "type" does not mean just vector type, but generically the nature of things it refers to. For example, I would not want to melt a column containing speeds in MPH with one containing weights in LBs.
One way to confirm that it is okay to combine your factor columns is to ask yourself whether any possible value in one column would be a reasonable value to have in every other column. If that is the case, then likely the correct thing to do would be to ensure that every factor column has all the possible levels that it could accept (in the same order). If you do this, you will not get a warning when you melt the table.
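If the full set of levels is not known up front, one way to apply this is to take the union of the levels across the measure columns and rebuild each factor on that shared set. This is only a sketch: the data frame df and its columns a1 and a2 are a hypothetical stand-in for the data in the question, and sorting the levels alphabetically is an assumption about what order you want.

```r
library(reshape2)

# Hypothetical stand-in: same kind of value in each column, different level sets
df <- data.frame(
  park = factor(rep("miss", 3)),
  a1   = factor(c("development", "agriculture", "none")),
  a2   = factor(c("none", "none", "harvest_00_20"))
)

measure <- setdiff(names(df), "park")

# Union of all levels across the measure columns, in one fixed order
all_levels <- sort(unique(unlist(lapply(df[measure], levels))))

# Rebuild every measure column on the shared level set
df[measure] <- lapply(df[measure], factor, levels = all_levels)

# The attributes are now identical across columns, so melt has nothing to drop
df.m <- melt(df, id.vars = "park")
```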
An illustration:
The levels for x and y are not the same.
Here we melt and look at the column x and y were molten into (value).
We get a character vector and a warning.
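A minimal sketch of this first half of the illustration, assuming a made-up data frame DF with an id column and two factor columns x and y:

```r
library(reshape2)

# Hypothetical data: x and y are both factors, but with different levels
DF <- data.frame(id = 1:3,
                 x  = factor(c("a", "b", "c")),
                 y  = factor(c("z", "y", "x")))
levels(DF$x)  # "a" "b" "c"
levels(DF$y)  # "x" "y" "z"

# melt warns that attributes are not identical across measure variables
DF.m <- melt(DF, id.vars = "id")
class(DF.m$value)  # "character"
DF.m$value         # "a" "b" "c" "z" "y" "x"
```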
If, however, we reset the factors to have the same levels and only then melt, we get the correct factor and no warnings.
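A sketch of this second half, again with a made-up data frame DF. Using letters as the shared level set is an assumption chosen to cover every value in both columns.

```r
library(reshape2)

# Hypothetical data: two factor columns with different levels
DF <- data.frame(id = 1:3,
                 x  = factor(c("a", "b", "c")),
                 y  = factor(c("z", "y", "x")))

# Give both columns the same full level set, in the same order
DF[c("x", "y")] <- lapply(DF[c("x", "y")], factor, levels = letters)

# factorsAsStrings = FALSE asks melt to keep the value column as a factor
DF.m <- melt(DF, id.vars = "id", factorsAsStrings = FALSE)
class(DF.m$value)         # "factor"
as.character(DF.m$value)  # "a" "b" "c" "z" "y" "x"
```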
The default behavior of melt is to drop factor levels even when they are identical, which is why we use factorsAsStrings=F above. If you had not used that setting you would have gotten a character vector, but no warning. I would argue the default behavior should be to keep the result as a factor, but that is not the case here.