I have a dataframe in which one of the fields typically has 7-8 distinct values. I want to collapse them into 3 or 4 new categories in a new variable within the dataframe. What is the best approach?
I would use a CASE statement if I were in a SQL-like tool, but I'm not sure how to approach this in R.
Any help you can provide will be much appreciated!
I don't like any of these; they are not clear to the reader or the potential user. I just use an anonymous function. The syntax is not as slick as a CASE statement, but the evaluation is similar and not that painful. This also assumes you're evaluating it in the scope where your variables are defined.
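A minimal sketch of what this anonymous-function pattern could look like, assuming a made-up data frame `df` with a column `x` and invented category labels (none of these names come from the original answer):

```r
# Hypothetical data: one column with seven distinct values
df <- data.frame(x = c("a", "b", "c", "d", "e", "f", "g"),
                 stringsAsFactors = FALSE)

# within() evaluates the block where the columns of df are visible as variables
df <- within(df, {
  group <- (function(v) {
    out <- rep("other", length(v))            # default, plays the role of ELSE
    out[v %in% c("a", "b")]      <- "low"     # WHEN x IN ('a', 'b') THEN 'low'
    out[v %in% c("c", "d", "e")] <- "mid"     # WHEN x IN ('c', 'd', 'e') THEN 'mid'
    out[v %in% c("f")]           <- "high"    # WHEN x = 'f' THEN 'high'
    out
  })(x)                                       # define the function and call it at once
})
```

The first pair of parentheses wraps the function definition and the trailing `(x)` calls it immediately, so nothing extra is left in the workspace.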
All of those `()` are necessary to enclose and evaluate the anonymous function.
You can use the `base` function `merge` for case-style remapping tasks:
If you have a `factor`, you can change its levels by the standard method:
You could write a simple function as a wrapper:
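As a rough sketch of the two ideas above, assuming a made-up data frame, a hypothetical lookup table `lookup`, and a hypothetical wrapper named `remap` (all illustrative, not taken from the original answers):

```r
# Hypothetical data
df <- data.frame(id   = 1:6,
                 item = c("apple", "pear", "carrot", "leek", "apple", "beef"),
                 stringsAsFactors = FALSE)

# 1) merge() against a lookup table that maps old values to new categories
lookup <- data.frame(item  = c("apple", "pear", "carrot", "leek"),
                     group = c("fruit", "fruit", "vegetable", "vegetable"),
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps rows without a match (group is NA), like a LEFT JOIN
merged <- merge(df, lookup, by = "item", all.x = TRUE)
merged <- merged[order(merged$id), ]   # merge() sorts by the key; restore row order

# 2) For a factor, assign a named list to levels(): new level = old levels
f <- factor(df$item)
levels(f) <- list(fruit     = c("apple", "pear"),
                  vegetable = c("carrot", "leek"),
                  other     = "beef")

# 3) A small wrapper around the same idea (hypothetical helper)
remap <- function(x, mapping) {
  f <- factor(x)
  levels(f) <- mapping   # old levels missing from mapping become NA
  f
}
df$group <- remap(df$item, list(fruit     = c("apple", "pear"),
                                vegetable = c("carrot", "leek"),
                                other     = "beef"))
```

The `merge` route scales well when the mapping already lives in a table, while the `levels<-` route is compact when the column is a factor or can cheaply be converted to one.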
Here's a way using the `switch` statement:
The one downside of this is that you have to keep writing the category name (`animal`, etc.) for each item. It is syntactically more convenient to be able to define our categories as below (see the very similar question "How do add a column in a data frame in R"),
and we want to somehow "invert" this mapping. I wrote my own `invMap` function:
and then invert the above map as follows:
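A minimal sketch of the whole idea, assuming a made-up category map and data frame. The body of `invMap` here is only one possible way to write such a helper, not necessarily how the original answer did it:

```r
# Hypothetical categories: each name is a category, each vector lists its items
map <- list(animal  = c("dog", "cat"),
            plant   = c("oak", "fern"),
            mineral = c("quartz"))

# Invert the map: result is a named character vector, item -> category
invMap <- function(map) {
  setNames(rep(names(map), lengths(map)),      # category repeated once per item
           unlist(map, use.names = FALSE))     # items become the names
}

inv <- invMap(map)

# Look up each item's category; items not in the map come out as NA
df <- data.frame(item = c("cat", "oak", "quartz", "dog", "kelp"),
                 stringsAsFactors = FALSE)
df$type <- unname(inv[df$item])
```

Writing the mapping category-first keeps each category name in one place, and the inversion turns it into a direct lookup by item.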
And then it's easy to use this to add the `type` column to the data frame:
If you want SQL-like syntax, you can use the `sqldf` package. The function to use is also named `sqldf`, and the syntax is as follows:
Mixing `plyr::mutate` and `dplyr::case_when` works for me and is readable.
Bonus points if the column can come out of `mutate` as a factor instead of char! The last line of the `case_when` statement, which catches all unmatched rows, is very important.
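A minimal sketch of the `case_when` approach, assuming made-up data and category labels. It uses `dplyr::mutate` in place of `plyr::mutate` so that only one package is needed, and wraps the result in `factor()` to get a factor column:

```r
library(dplyr)

# Hypothetical data
df <- data.frame(item = c("dog", "cat", "oak", "trout", "rock"),
                 stringsAsFactors = FALSE)

df <- df %>%
  mutate(type = factor(case_when(
    item %in% c("dog", "cat") ~ "mammal",
    item %in% c("trout")      ~ "fish",
    item %in% c("oak")        ~ "plant",
    TRUE                      ~ "other"    # catch-all for unmatched rows
  )))
```

Conditions are checked in order and the first match wins, so the `TRUE` line has to come last. Without it, unmatched rows would get `NA`.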