I have a large data.frame of character data that I want to convert based on what is commonly called a dictionary in other languages.
Currently I am going about it like so:
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"), snp2 = c("AA", "AT", "AG", "AA"), snp3 = c(NA, "GG", "GG", "GC"), stringsAsFactors=FALSE)
foo <- replace(foo, foo == "AA", "0101")
foo <- replace(foo, foo == "AC", "0102")
foo <- replace(foo, foo == "AG", "0103")
This works fine, but it is obviously not pretty, and it seems silly to repeat the replace statement each time I want to replace one item in the data.frame.
Is there a better way to do this, given that I have a dictionary of approximately 25 key/value pairs?
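In R, a dictionary is commonly represented as a named character vector, with the keys as names and the replacements as values. The sketch below shows a vectorised lookup with such a vector. It assumes the dictionary is stored in a hypothetical vector called dict and lists only the three pairs from the question. The remaining pairs of the roughly 25 would be added the same way.

```r
# Hypothetical dictionary: names are the keys, values are the replacements.
# Only the three pairs shown in the question are listed here.
dict <- c(AA = "0101", AC = "0102", AG = "0103")

foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)

# Replace every cell whose value is a key in dict; all other cells,
# including NA and values with no dictionary entry, are left unchanged.
recode_col <- function(x) {
  hit <- x %in% names(dict)
  x[hit] <- dict[x[hit]]
  x
}

# foo[] <- ... keeps foo a data.frame while recoding each column.
foo[] <- lapply(foo, recode_col)
foo
```

Because the lookup table is a single vector, adding a key/value pair means adding one element rather than another replace() call.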
Here is a quick solution
Note that this answer started as an attempt to solve the much simpler problem posted in "How to replace all values in data frame with a vector of values?". Unfortunately, that question was closed as a duplicate of this one, so I'll suggest a solution based on replacing factor levels that covers both cases.
If only a single vector (or one data frame column) needs its values replaced, and there is no objection to using a factor, we can coerce the vector to a factor and change its levels as required:
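Here is a minimal sketch of the factor-level idea for a single column, using the three key/value pairs from the question. The names level_vec and replacement_vec are illustrative, and level_vec is assumed to contain every value that occurs in the column.

```r
# One column to recode (values taken from the question's snp1 column).
x <- c("AA", "AG", "AA", "AA")

# Old values (used as levels) and their replacements, in matching order.
level_vec <- c("AA", "AC", "AG")
replacement_vec <- c("0101", "0102", "0103")

# Coerce to a factor with the old values as levels, then rename the levels.
f <- factor(x, levels = level_vec)
levels(f) <- replacement_vec

# Convert back to character if a factor is not wanted.
recoded <- as.character(f)
recoded
```

Renaming the levels touches only the short level vector, not every element, which is why this approach scales well to long columns.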
Using the forcats package, this can be done in a one-liner:

If all values in multiple columns of a data frame need to be replaced, the approach can be extended.
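Here is a minimal sketch of one such extension. It applies the same level relabelling to every column with lapply(). The data frame bar is hypothetical and contains only the three keys from the question, so level_vec is complete for it.

```r
# Hypothetical data frame whose cells only contain keys from the dictionary.
bar <- data.frame(snp1 = c("AA", "AG", "AA"),
                  snp2 = c("AC", "AA", "AG"),
                  stringsAsFactors = FALSE)

level_vec <- c("AA", "AC", "AG")
replacement_vec <- c("0101", "0102", "0103")

# Relabel the factor levels column by column; bar[] <- keeps the data.frame.
bar[] <- lapply(bar, function(col) {
  f <- factor(col, levels = level_vec)
  levels(f) <- replacement_vec
  as.character(f)
})
bar
```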
Note that level_vec and replacement_vec must have equal lengths. More importantly, level_vec should be complete, i.e., it should include all possible values in the affected columns of the original data frame (use unique(sort(unlist(foo))) to verify). Otherwise, any value not listed in level_vec will be coerced to <NA>. This is also a requirement for Martin Morgan's answer.

So, if there are only a few different values to be replaced, you will probably be better off with one of the other answers, e.g., Ramnath's.
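The completeness requirement can be checked directly. This minimal sketch runs on the question's foo, with level_vec holding only the three keys shown in the question. Any value reported by setdiff() would become <NA> after the factor conversion.

```r
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)
level_vec <- c("AA", "AC", "AG")

# Values that occur in foo (sort() drops NA) but have no entry in level_vec.
missing_keys <- setdiff(unique(sort(unlist(foo))), level_vec)
missing_keys

# Such values silently become NA when used with these levels.
f <- factor(c("AA", "GG"), levels = level_vec)
is.na(f)
```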