Possible Duplicate:
identifying or coding unique factors using R
I'm having some trouble with R.
I have a data set similar to the following, but much longer.
A B Pulse
1 2 23
2 2 24
2 2 12
2 3 25
1 1 65
1 3 45
Basically, the first 2 columns are coded. A
has 1, 2 which represent 2 different weights.
B
has 1, 2, 3 which represent 3 different times.
As they are coded numerical values, R will treat them as numerical variables.
I need to use the factor function to convert these variables into factors.
Help?
Here's an example:
#Create a data frame
> d<- data.frame(a=1:3, b=2:4)
> d
a b
1 1 2
2 2 3
3 3 4
#currently, there are no levels in the `a` column, since it's numeric as you point out.
> levels(d$a)
NULL
#Convert that column to a factor
> d$a <- factor(d$a)
> d
a b
1 1 2
2 2 3
3 3 4
#Now it has levels.
> levels(d$a)
[1] "1" "2" "3"
You can also handle this when reading in your data. See the colClasses
and stringsAsFactors
parameters in e.g. readCSV()
.
Note that, computationally, factoring such columns won't help you much, and may actually slow down your program (albeit negligibly). Using a factor will require that all values are mapped to IDs behind the scenes, so any print of your data.frame requires a lookup on those levels -- an extra step which takes time.
Factors are great when storing strings which you don't want to store repeatedly, but would rather reference by their ID. Consider storing a more friendly name in such columns to fully benefit from factors.
Given the following sample
myData <- data.frame(A=rep(1:2, 3), B=rep(1:3, 2), Pulse=20:25)
then
myData$A <-as.factor(myData$A)
myData$B <-as.factor(myData$B)
or you could select your columns altogether and wrap it up nicely:
# select columns
cols <- c("A", "B")
myData[,cols] <- data.frame(apply(myData[cols], 2, as.factor))
levels(myData$A) <- c("long", "short")
levels(myData$B) <- c("1kg", "2kg", "3kg")
To obtain
> myData
A B Pulse
1 long 1kg 20
2 short 2kg 21
3 long 3kg 22
4 short 1kg 23
5 long 2kg 24
6 short 3kg 25