Change values in multiple columns of a dataframe u

2019-01-15 16:15发布

问题:

I am trying to change the value of a number of columns at once using a lookup table. They all use the same lookup table. I know how to do this for just one column -- I'd just use a merge, but am having trouble with multiple columns.

Below is an example dataframe and an example lookup table. My actual data is much larger (~10K columns with 8 rows).

example <- data.frame(a = seq(1,5), b = seq(5,1), c=c(1,4,3,2,5))

lookup <- data.frame(number = seq(1,5), letter = LETTERS[seq(1,5)])

Ideally, I would end up with a dataframe which looks like this:

example_of_ideal_output <- data.frame(a = LETTERS[seq(1,5)], b = LETTERS[seq(5,1)], c=LETTERS[c(1,4,3,2,5)])

Of course, in my actual data the dataframe is numbers, but the lookup table is a lot more complicated, so I can't just use a function like LETTERS to solve things.

Thank you in advance!

回答1:

Here's a solution that works on each column successively using lapply():

as.data.frame(lapply(example,function(col) lookup$letter[match(col,lookup$number)]));
##   a b c
## 1 A E A
## 2 B D D
## 3 C C C
## 4 D B B
## 5 E A E

Alternatively, if you don't mind switching over to a matrix, you can achieve a "more vectorized" solution, as a matrix will allow you to call match() and index lookup$letter just once for the entire input:

matrix(lookup$letter[match(as.matrix(example),lookup$number)],nrow(example));
##      [,1] [,2] [,3]
## [1,] "A"  "E"  "A"
## [2,] "B"  "D"  "D"
## [3,] "C"  "C"  "C"
## [4,] "D"  "B"  "B"
## [5,] "E"  "A"  "E"

(And of course you can coerce back to data.frame via as.data.frame() afterward, although you'll have to restore the column names as well if you want them, which can be done with setNames(...,names(example)). But if you really want to stick with a data.frame, my first solution is probably preferable.)



回答2:

Using dplyr

f <- function(x)setNames(lookup$letter, lookup$number)[x] 
library(dplyr)
example %>% 
  mutate_each(funs(f))
#  a b c
#1 A E A
#2 B D D
#3 C C C
#4 D B B
#5 E A E

Or with data.table

library(data.table)
setDT(example)[, lapply(.SD, f), ]
#   a b c
#1: A E A
#2: B D D
#3: C C C
#4: D B B
#5: E A E