可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I am having some trouble replacing values in a dataframe. I would like to replace values based on a separate table. Below is an example of what I am trying to do.
I have a table where every row is a customer and every column is an animal they purchased. Lets call this dataframe table
.
> table
# P1 P2 P3
# 1 cat lizard parrot
# 2 lizard parrot cat
# 3 parrot cat lizard
I also have a table that I will reference called lookUp
.
> lookUp
# pet class
# 1 cat mammal
# 2 lizard reptile
# 3 parrot bird
What I want to do is create a new table called new
with a function replaces all values in table
with the class
column in lookUp
. I tried this myself using an lapply
function, but I got the following warnings.
new <- as.data.frame(lapply(table, function(x) {
gsub('.*', lookUp[match(x, lookUp$pet) ,2], x)}), stringsAsFactors = FALSE)
Warning messages:
1: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
2: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
3: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
Any ideas on how to make this work?
回答1:
You posted an approach in your question which was not bad. Here's a smiliar approach:
new <- df # create a copy of df
# using lapply, loop over columns and match values to the look up table. store in "new".
new[] <- lapply(df, function(x) look$class[match(x, look$pet)])
An alternative approach which will be faster is:
new <- df
new[] <- look$class[match(unlist(df), look$pet)]
Note that I use empty brackets ([]
) in both cases to keep the structure of new
as it was (a data.frame).
(I'm using df
instead of table
and look
instead of lookup
in my answer)
回答2:
Another options is a combination of tidyr
and dplyr
library(dplyr)
library(tidyr)
table %>%
gather(key = "pet") %>%
left_join(lookup, by = "pet") %>%
spread(key = pet, value = class)
回答3:
Anytime you have two separate data.frame
s and are trying to bring info from one to the other, the answer is to merge.
Everyone has their own favorite merge method in R. Mine is data.table
.
Also, since you want to do this to many columns, it'll be faster to melt
and dcast
-- rather than loop over columns, apply it once to a reshaped table, then reshape again.
library(data.table)
#the row names will be our ID variable for melting
setDT(table, keep.rownames = TRUE)
setDT(lookUp)
#now melt, merge, recast
dcast(melt(table, id.vars = "rn" #melting (reshape wide to long)
)[lookup, new_value := i.class, on = c(value = "pet")], #merging
rn ~ variable, value.var = "new_value") #reform back to original shape
# rn P1 P2 P3
# 1: 1 mammal reptile bird
# 2: 2 reptile bird mammal
# 3: 3 bird mammal reptile
In case you find the dcast
/melt
bit a bit intimidating, here's an approach that just loops over columns; dcast
/melt
is simply sidestepping the loop for this problem.
setDT(table) #don't need row names this time
setDT(lookUp)
sapply(names(table), #(or to whichever are the relevant columns)
function(cc) table[lookUp, (cc) := #merge, replace
#need to pass a _named_ vector to 'on', so use setNames
i.class, on = setNames("pet", cc)])
回答4:
Make a named vector, and loop through every column and match, see:
# make lookup vector with names
lookUp1 <- setNames(as.character(lookUp$class), lookUp$pet)
lookUp1
# cat lizard parrot
# "mammal" "reptile" "bird"
# match on names get values from lookup vector
res <- data.frame(lapply(df1, function(i) lookUp1[i]))
# reset rownames
rownames(res) <- NULL
# res
# P1 P2 P3
# 1 mammal reptile bird
# 2 reptile bird mammal
# 3 bird mammal reptile
data
df1 <- read.table(text = "
P1 P2 P3
1 cat lizard parrot
2 lizard parrot cat
3 parrot cat lizard", header = TRUE)
lookUp <- read.table(text = "
pet class
1 cat mammal
2 lizard reptile
3 parrot bird", header = TRUE)
回答5:
The answer above showing how to do this in dplyr doesn't answer the question, the table is filled with NAs. This worked, I would appreciate any comments showing a better way:
# Add a customer column so that we can put things back in the right order
table$customer = seq(nrow(table))
classTable <- table %>%
# put in long format, naming column filled with P1, P2, P3 "petCount"
gather(key="petCount", value="pet", -customer) %>%
# add a new column based on the pet's class in data frame "lookup"
left_join(lookup, by="pet") %>%
# since you wanted to replace the values in "table" with their
# "class", remove the pet column
select(-pet) %>%
# put data back into wide format
spread(key="petCount", value="class")
Note that it would likely be useful to keep the long table that contains the customer, the pet, the pet's species(?) and their class. This example simply adds an intermediary save to a variable:
table$customer = seq(nrow(table))
petClasses <- table %>%
gather(key="petCount", value="pet", -customer) %>%
left_join(lookup, by="pet")
custPetClasses <- petClasses %>%
select(-pet) %>%
spread(key="petCount", value="class")