I have a set of data in which I need to code values of certain variables (numeric) into 3 classes.
My data set is similar to this but has 60 more variables:
anim <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
wt <- c(181,179,180.5,201,201.5,245,246.4,189.3,301,354,369,205,199,394,231.3)
data <- data.frame(anim,wt)
> data
   anim    wt
1     1 181.0
2     2 179.0
3     3 180.5
4     4 201.0
5     5 201.5
6     6 245.0
7     7 246.4
8     8 189.3
9     9 301.0
10   10 354.0
11   11 369.0
12   12 205.0
13   13 199.0
14   14 394.0
15   15 231.3
I need to code the values of the variable "wt" into 3 classes: (wt >= 179 & wt < 200) = 1; (wt >= 200 & wt < 300) = 2; (wt > 300) = 3
which should give me this:
> data2
   anim    wt SWT
1     1 181.0   1
2     2 179.0   1
3     3 180.5   1
4     4 201.0   2
5     5 201.5   2
6     6 245.0   2
7     7 246.4   2
8     8 189.3   1
9     9 301.0   3
10   10 354.0   3
11   11 369.0   3
12   12 205.0   2
13   13 199.0   1
14   14 394.0   3
15   15 231.3   2
The cut method as outlined by @Greg is probably what you want here. One thing to note is that cut returns a factor by default, which you can suppress by supplying labels = FALSE to return the integer values.
Alternatively, if your cutting does not lend itself to natural breaks, you can use ifelse(). You can "nest" the ifelse statements, similar to Excel. I use "with" to cut down on the typing needed.
Just to show an alternate method (similar to recode in SPSS) from package car.
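A minimal sketch of the nested ifelse() and car approaches, assuming the data frame from the question. The nested ifelse() follows the class boundaries exactly as the question states them, so a weight of exactly 300 falls into no class and becomes NA. The car part assumes the function meant is car::recode and only runs if the car package is installed; the recode string is an illustration, not the original answer's code.

```r
# Example data from the question
anim <- 1:15
wt <- c(181, 179, 180.5, 201, 201.5, 245, 246.4, 189.3,
        301, 354, 369, 205, 199, 394, 231.3)
data <- data.frame(anim, wt)

# Nested ifelse(); with() saves typing data$wt in every condition.
# Anything matching none of the three conditions becomes NA.
data$SWT <- with(data,
  ifelse(wt >= 179 & wt < 200, 1,
  ifelse(wt >= 200 & wt < 300, 2,
  ifelse(wt > 300, 3, NA))))

# Optional: the same classes via car::recode (assumes car is installed).
# Ranges are closed on both ends; where ranges overlap, the leftmost
# specification applies, so here 200 goes to class 2 and 300 to class 3
# (unlike the ifelse() version, where 300 is NA).
if (requireNamespace("car", quietly = TRUE)) {
  data$SWT_car <- car::recode(data$wt, "300:hi=3; 200:300=2; 179:200=1")
}

data
```

With more than one variable to recode, the ifelse() call can be wrapped in a small function and applied to each column in turn.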
I think Greg's answers cover "standard operating procedure", but I find many uses for the findInterval function as well. It naturally returns a number that identifies the interval in the second argument.
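For illustration, a sketch of findInterval on the weights from the question. The break vector c(179, 200, 300) is chosen here to match the question's classes and is not from the original answer. Intervals are closed on the left, so 200 lands in class 2 and 300 in class 3, and anything below 179 would get 0.

```r
# Weights from the question
wt <- c(181, 179, 180.5, 201, 201.5, 245, 246.4, 189.3,
        301, 354, 369, 205, 199, 394, 231.3)

# Lower bounds of the three classes (assumed from the question)
breaks <- c(179, 200, 300)

# For each weight, the index of the last break that is <= the weight
SWT <- findInterval(wt, breaks)
SWT
# [1] 1 1 1 2 2 2 2 1 3 3 3 2 1 3 2
```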
Just for completeness and info, the classInt package (on CRAN) is another handy way to classify numbers into classes.
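A rough sketch of how that might look with fixed breaks, assuming classInt is installed (the code is skipped otherwise). The break values, including the upper limit of 400 chosen to cover the largest weight, are assumptions made for this example.

```r
# Weights from the question
wt <- c(181, 179, 180.5, 201, 201.5, 245, 246.4, 189.3,
        301, 354, 369, 205, 199, 394, 231.3)

if (requireNamespace("classInt", quietly = TRUE)) {
  # style = "fixed" uses the supplied breaks; n classes need n + 1 breaks
  ci <- classInt::classIntervals(wt, n = 3, style = "fixed",
                                 fixedBreaks = c(179, 200, 300, 400))
  # findCols() gives the class index of each observation
  SWT <- classInt::findCols(ci)
  print(SWT)
}
```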
You can try cut.
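A sketch of what such a call might look like for the question's data; it is not the original answer's code. The breaks c(179, 200, 300, Inf) and the choice to put a weight of exactly 300 into class 3 are assumptions. right = FALSE makes the intervals closed on the left, and labels = FALSE returns integer codes instead of a factor.

```r
# Example data from the question
wt <- c(181, 179, 180.5, 201, 201.5, 245, 246.4, 189.3,
        301, 354, 369, 205, 199, 394, 231.3)
data <- data.frame(anim = 1:15, wt = wt)

# Class boundaries: [179, 200), [200, 300), [300, Inf)
breaks <- c(179, 200, 300, Inf)

# Default: a factor whose levels are the interval labels
data$SWT_factor <- cut(data$wt, breaks = breaks, right = FALSE)

# labels = FALSE: plain integer codes 1, 2, 3
data$SWT <- cut(data$wt, breaks = breaks, right = FALSE, labels = FALSE)

data
```

Weights below the first break (179 here) come back as NA, so it is worth checking anyNA(data$SWT) after recoding.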
EDIT: fixed group - right = FALSE, got rid of split example.