I am having some trouble replacing values in a dataframe. I would like to replace values based on a separate table. Below is an example of what I am trying to do.
I have a table where every row is a customer and every column is an animal they purchased. Let's call this dataframe `table`.
> table
#       P1     P2     P3
# 1    cat lizard parrot
# 2 lizard parrot    cat
# 3 parrot    cat lizard
I also have a table that I will reference called `lookUp`.
> lookUp
#      pet   class
# 1    cat  mammal
# 2 lizard reptile
# 3 parrot    bird
What I want to do is create a new table called `new` with a function that replaces all values in `table` with the `class` column in `lookUp`. I tried this myself using an `lapply` function, but I got the following warnings.
new <- as.data.frame(lapply(table, function(x) {
  gsub('.*', lookUp[match(x, lookUp$pet), 2], x)}), stringsAsFactors = FALSE)
Warning messages:
1: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
2: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
3: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
Any ideas on how to make this work?
The answer above showing how to do this in dplyr doesn't answer the question: the table is filled with NAs. This worked, and I would appreciate any comments showing a better way:
Note that it would likely be useful to keep the long table that contains the customer, the pet, the pet's species(?) and their class. This example simply adds an intermediary save to a variable:
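A minimal base R sketch of that idea, assuming the `table` and `lookUp` data frames printed in the question. The variable `long` and the column names `customer` and `slot` are illustrative names, not from the original post:

```r
# Data as printed in the question (character columns assumed)
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

# Long table: one row per customer/purchase, kept in its own variable
long <- data.frame(customer = rep(seq_len(nrow(table)), times = ncol(table)),
                   slot = rep(names(table), each = nrow(table)),
                   pet = unlist(table, use.names = FALSE),
                   stringsAsFactors = FALSE)
long$class <- lookUp$class[match(long$pet, lookUp$pet)]

# Back to the wide layout. This relies on `long` still being in
# column-major order (all of P1, then P2, then P3), as built above.
new <- as.data.frame(matrix(long$class, nrow = nrow(table),
                            dimnames = list(NULL, names(table))),
                     stringsAsFactors = FALSE)
```

Keeping `long` around means the customer, pet and class can all be inspected together before reshaping.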
Another option is a combination of `tidyr` and `dplyr`.
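One way this combination might look, assuming a tidyr version that provides `pivot_longer()` and `pivot_wider()` (1.0.0 or later). The helper columns `customer` and `slot` are illustrative names, not from the original post:

```r
library(dplyr)
library(tidyr)

# Data as printed in the question (character columns assumed)
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

new <- table %>%
  mutate(customer = row_number()) %>%            # remember which row each value came from
  pivot_longer(cols = -customer,
               names_to = "slot",
               values_to = "pet") %>%            # one row per customer/purchase
  left_join(lookUp, by = "pet") %>%              # bring in the class for each pet
  select(-pet) %>%
  pivot_wider(names_from = slot,
              values_from = class) %>%           # back to one row per customer
  select(-customer) %>%
  as.data.frame()
```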
Anytime you have two separate `data.frame`s and are trying to bring info from one to the other, the answer is to merge.
Everyone has their own favorite merge method in R. Mine is `data.table`.
Also, since you want to do this to many columns, it'll be faster to `melt` and `dcast` -- rather than loop over columns, apply it once to a reshaped table, then reshape again.
In case you find the `dcast`/`melt` bit a bit intimidating, here's an approach that just loops over columns; `dcast`/`melt` is simply sidestepping the loop for this problem.
You posted an approach in your question which was not bad. Here's a similar approach:
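A sketch of such an approach, with `df` and `look` rebuilt from the tables printed in the question. `gsub()` accepts only a single replacement string, which is what the warnings point out; indexing the `class` column with `match()` avoids `gsub()` altogether:

```r
# Data as printed in the question (character columns assumed)
df <- data.frame(P1 = c("cat", "lizard", "parrot"),
                 P2 = c("lizard", "parrot", "cat"),
                 P3 = c("parrot", "cat", "lizard"),
                 stringsAsFactors = FALSE)
look <- data.frame(pet = c("cat", "lizard", "parrot"),
                   class = c("mammal", "reptile", "bird"),
                   stringsAsFactors = FALSE)

new <- df
# For each column, find the position of every pet in look$pet
# and pull the class at that position
new[] <- lapply(df, function(x) look$class[match(x, look$pet)])
```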
An alternative approach which will be faster is:
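A sketch of one way to drop the per-column loop, again assuming `df` and `look` mirror the question's tables: flatten every column into one vector with `unlist()`, match once, and assign the result back through `new[]`:

```r
# Data as printed in the question (character columns assumed)
df <- data.frame(P1 = c("cat", "lizard", "parrot"),
                 P2 = c("lizard", "parrot", "cat"),
                 P3 = c("parrot", "cat", "lizard"),
                 stringsAsFactors = FALSE)
look <- data.frame(pet = c("cat", "lizard", "parrot"),
                   class = c("mammal", "reptile", "bird"),
                   stringsAsFactors = FALSE)

new <- df
# unlist() stacks the columns end to end; the matched classes are
# written back column by column into the same shape
new[] <- look$class[match(unlist(df), look$pet)]
```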
Note that I use empty brackets (`[]`) in both cases to keep the structure of `new` as it was (a data.frame).
(I'm using `df` instead of `table` and `look` instead of `lookUp` in my answer)
Make a named vector, and loop through every column and match, see:
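A sketch of the named-vector idea, assuming the question's `table` and `lookUp`; the name `map` is illustrative. `as.character()` guards against factor columns, which would otherwise index the vector by integer code rather than by name:

```r
# Data as printed in the question (character columns assumed)
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

# Named vector: names are the pets, values are their classes
map <- setNames(lookUp$class, lookUp$pet)

new <- table
# Index the named vector by pet name, one column at a time
new[] <- lapply(table, function(x) unname(map[as.character(x)]))
```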
data
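A sketch that rebuilds the two data frames as printed in the question. `stringsAsFactors = FALSE` is an assumption, so that the columns are character vectors regardless of R version:

```r
# One row per customer, one column per purchase
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)

# Lookup table mapping each pet to its class
lookUp <- data.frame(pet = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)
```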