I have one variable whose values I am trying to pare down to a more manageable number. I exported a list of the variable's unique values to a CSV file and assigned them more general names in an adjacent column. E.g.,
    EVTYPE  new_category
    x1      x
    x2      x
    x3      x
    x4      x
    y1      y
    y2      y
    y3      y
I then loaded this back into R and am trying to create a new variable where, if old_val = x1, then new_var2 = x, and so on. There are about 1,000 unique values in the old_val variable, so nesting ifelse statements or something similar isn't really practical. Here is some code I am working on, but cannot get to work yet, where dataset = the overall dataset and new_data = the dataset with the unique values: (Sorry for the poor formatting, not sure how to do that correctly for the above list)
    ND_row_count <- NROW(new_data)
    for (i in 1:ND_row_count){
      if (dataset$EVTYPE==new_data$EVTYPE2[i]) {
        dataset$new_category <- new_data$new_category[i]
      }
    }
You can use the vectorised function `match` for this. Indexing `new_data$new_category` by `match(dataset$EVTYPE, new_data$EVTYPE2)` should return a vector of new categories corresponding to your long vector of original values, which you can then assign to `dataset$new_category`. Here, `match` finds, for each element of `dataset$EVTYPE`, the position of the matching element of `new_data$EVTYPE2`. We then use that vector of indices to subset `new_data$new_category`.
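As a minimal sketch of how this could look in practice: the two small data frames below are made up for illustration (only the column names `EVTYPE`, `EVTYPE2` and `new_category` come from the question), and `idx` is just a helper name chosen here. The value `"z9"` is included to show what happens when an original value has no row in the lookup table.

```r
# Hypothetical lookup table standing in for new_data
# (values taken from the example list in the question)
new_data <- data.frame(
  EVTYPE2      = c("x1", "x2", "x3", "x4", "y1", "y2", "y3"),
  new_category = c("x", "x", "x", "x", "y", "y", "y"),
  stringsAsFactors = FALSE
)

# Hypothetical main dataset; "z9" has no entry in the lookup table
dataset <- data.frame(
  EVTYPE = c("y2", "x1", "x4", "z9", "y1"),
  stringsAsFactors = FALSE
)

# For each element of dataset$EVTYPE, the row position of its match
# in new_data$EVTYPE2 (NA where there is no match)
idx <- match(dataset$EVTYPE, new_data$EVTYPE2)

# Use those positions to pull out the corresponding new categories
dataset$new_category <- new_data$new_category[idx]

print(dataset)
```

Values that do not appear in the lookup table come back as `NA`, so something like `sum(is.na(dataset$new_category))` is a quick way to check whether any original values were left unmapped.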