I have a large data.frame of character data that I want to convert based on what is commonly called a dictionary in other languages.
Currently I am going about it like so:
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"), snp2 = c("AA", "AT", "AG", "AA"), snp3 = c(NA, "GG", "GG", "GC"), stringsAsFactors=FALSE)
foo <- replace(foo, foo == "AA", "0101")
foo <- replace(foo, foo == "AC", "0102")
foo <- replace(foo, foo == "AG", "0103")
This works fine, but it is obviously not pretty, and it seems silly to repeat the replace statement each time I want to replace one item in the data.frame.
Is there a better way to do this, given that I have a dictionary of approximately 25 key/value pairs?
Using dplyr::recode:
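A minimal sketch of applying dplyr::recode() to every column, assuming the dictionary is kept as a named character vector (`dict` is a hypothetical name; only the question's three pairs are used). recode() accepts the key/value pairs spliced in with `!!!`:

```r
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)

# Dictionary as a named vector: names are the keys, values the replacements
dict <- c(AA = "0101", AC = "0102", AG = "0103")

# Splice the pairs into recode() with !!!; values with no entry in dict
# ("AT", "GG", "GC") are kept as they are, and NA stays NA
foo[] <- lapply(foo, function(col) dplyr::recode(col, !!!dict))
foo
```

Because the replacements have the same type as the original column, unmatched values are left alone rather than turned into NA.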
If you're open to using packages, plyr is a very popular one and has this handy mapvalues() function that will do just what you're looking for. Note that it works for data types of all kinds, not just strings.
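For example, a sketch that applies plyr::mapvalues() column by column, with the question's three pairs standing in for the full dictionary (`dict` is an illustrative name):

```r
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)

dict <- c(AA = "0101", AC = "0102", AG = "0103")

# mapvalues(x, from, to) replaces each element of x found in `from` with the
# matching element of `to`; warn_missing = FALSE silences the message about
# keys (here "AC") that never occur in a column
foo[] <- lapply(foo, plyr::mapvalues,
                from = names(dict), to = unname(dict),
                warn_missing = FALSE)
foo
```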
This assumes that map covers all the values in foo. It would feel less like a 'hack', and be more efficient in both space and time, if foo were a matrix (of character()) instead. Both the matrix and data frame variants run afoul of R's 2^31-1 limit on vector size when there are millions of SNPs and thousands of samples.
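A minimal sketch of the named-vector lookup described here, with `map` built from the question's three pairs. Since this `map` does not cover "AT", "GG" or "GC", those cells (like the NA) come back as NA, which is why the approach relies on the map being complete:

```r
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)

# Named character vector: names are the genotypes, values the codes
map <- setNames(c("0101", "0102", "0103"), c("AA", "AC", "AG"))

# Data frame: flatten all columns into one vector, look every value up by
# name, and write the results back into the same cells
foo_df <- foo
foo_df[] <- map[unlist(foo_df)]

# Character matrix: the same lookup without the unlist() step
foo_mat <- as.matrix(foo)
foo_mat[] <- map[foo_mat]
```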
Since it's been a few years since the last answer, and a new question on this topic came up tonight and was closed by a moderator, I'll add it here. The poster has a large data frame containing 0, 1, and 2 and wants to change them to AA, AB, and BB, respectively.
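One way to do this, sketched with made-up data (`geno` and `codes` are hypothetical names): match() turns 0, 1 and 2 into positions 1, 2 and 3 of a lookup vector, which works whether the columns hold numbers or the strings "0", "1", "2".

```r
# Hypothetical data: genotypes coded 0, 1, 2 (NA for a missing call)
geno <- data.frame(s1 = c(0, 1, 2, 1),
                   s2 = c(2, 2, 0, NA))

codes <- c("AA", "AB", "BB")  # replacements for 0, 1 and 2, in that order

# match() maps 0, 1, 2 to positions 1, 2, 3 of `codes`;
# anything else (including NA) becomes NA
geno[] <- lapply(geno, function(x) codes[match(x, 0:2)])
geno
```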
Use plyr: create a function over the data frame that uses revalue to replace multiple terms.

Here's something simple that will do the job:
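A sketch of an lapply loop of the kind described below, with the keys and values held in two parallel vectors (`keys` and `vals` are illustrative names). lapply runs only for its side effect here, so its result is wrapped in invisible():

```r
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)

keys <- c("AA", "AC", "AG")
vals <- c("0101", "0102", "0103")

# Iterate over the indices; each call overwrites the matching cells of foo
# in the enclosing environment via <<-. The list lapply returns is discarded.
invisible(lapply(seq_along(keys), function(i) {
  foo[foo == keys[i]] <<- vals[i]
}))
foo
```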
lapply will output a list in this case that we don't actually care about. You could assign the result to something if you like and then just discard it. I'm iterating over the indices here, but you could just as easily place the keys/values in a list themselves and iterate over them directly. Note the use of global assignment with <<-. I tinkered with a way to do this with mapply, but my first attempt didn't work, so I switched. I suspect a solution with mapply is possible, though.

I used @Ramnath's answer above, but made it read what to replace, and what to replace it with, from a file, and used gsub rather than replace.
hgword.txt contains the following tab-separated
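A sketch of the file-driven gsub approach, assuming hgword.txt holds two tab-separated columns with no header (text to find first, replacement second). A temporary file with the question's three pairs stands in for it so the example runs on its own; `dict_file` and `dict` are illustrative names:

```r
# Stand-in for hgword.txt: two tab-separated columns, no header (assumed layout)
dict_file <- tempfile(fileext = ".txt")
writeLines(c("AA\t0101", "AC\t0102", "AG\t0103"), dict_file)

dict <- read.delim(dict_file, header = FALSE, col.names = c("from", "to"),
                   colClasses = "character")

foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)

# Apply each pair in turn; fixed = TRUE treats `from` as a literal string
# rather than a regular expression
for (i in seq_len(nrow(dict))) {
  foo[] <- lapply(foo, function(col) gsub(dict$from[i], dict$to[i], col, fixed = TRUE))
}
foo
```

Because gsub replaces substrings, a pattern such as "AA" would also hit part of a longer value like "AAG", and later rows operate on the output of earlier ones; the exact-match lookups avoid both effects.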