Still relatively new to R. Trying to have dynamic variables in a loop but running into all sorts of problems. Initial code looks something like this (but bigger):
data.train$Pclass_F <- as.factor(data.train$Pclass)
data.test$Pclass_F <- as.factor(data.test$Pclass)
which I'm trying to build into a loop, imagining something like this:
datalist <- c("data.train", "data.test")
for (i in datalist){
i$Pclass_F <- as.factor(i$Pclass)
}
which doesn't work. A little research implies that in order to convert the strings in datalist into variables I need to use the get function. So my next attempt was:
datalist <- c("data.train", "data.test")
for (i in datalist){
get(i$Pclass_F) <- as.factor(get(i$Pclass))
}
which still doesn't work:
Error in i$Pclass : $ operator is invalid for atomic vectors
Tried:
datalist <- c("data.train", "data.test")
for (i in datalist){
get(i)$Pclass_F <- as.factor(get(i)$Pclass)
}
which still doesn't work:
Error in get(i)$Pclass_F <- as.factor(get(i)$Pclass) :
could not find function "get<-"
Even tried:
datalist <- c("data.train", "data.test")
for (i in datalist){
get(i[Pclass_F]) <- as.factor(get(i[Pclass]))
}
which still doesn't work:
Error in get(i[Pclass]) : object 'Pclass' not found
Then tried:
datalist <- c("data.train", "data.test")
for (i in datalist){
get(i)[Pclass_F] <- as.factor(get(i)[Pclass])
}
which still doesn't work:
Error in '[.data.frame'(get(i), Pclass) : object 'Pclass' not found
Now realized I never included data, so nobody can run this themselves, but just to show it's not a data problem:
> class(data.train$Pclass)
[1] "integer"
> class(data.test$Pclass)
[1] "integer"
> datalist
[1] "data.train" "data.test"
The problem you have relates to the way data frames and most other objects are treated in R. In many programming languages, objects are (or at least can be) passed to functions by reference. In C++ if I pass a pointer to an object to a function which manipulates that object, the original is modified. This is not the way things work for the most part in R.
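As a rough illustration of that point, here is a minimal sketch. The helper add_flag and the small data frame x are made up for this example and are not from the question; y$c <- 12 mirrors the assignment discussed in this answer.

```r
# Hypothetical helper: tries to add a column to the data frame it receives.
add_flag <- function(df) {
  df$flag <- TRUE   # this changes the function's own copy only
  df                # the modified copy has to be returned to be of any use
}

x <- data.frame(a = 1:3)   # made-up data
result <- add_flag(x)

names(x)        # "a": the original is untouched
names(result)   # "a" "flag": only the returned copy has the new column

# Plain assignment behaves the same way once the copy is modified.
y <- x
y$c <- 12
"c" %in% names(x)   # FALSE
```

The same applies to get(): it hands back the object's value, so any change has to be assigned back to a name before it sticks.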
When an object is created and assigned to x, and then copied to y with y <- x, initially y and x will point to the same object in RAM. But as soon as y is modified at all, a copy is created. So assigning y$c <- 12 has no effect on x. get() doesn't return the named object in a way that can be modified without first assigning it to another variable (which would mean the original variable is left unaltered).
The correct way of doing this in R is to store your data frames in a named list. You can then loop through the list and use the replacement syntax to change the columns. You could also use lapply to process each member of the list, returning a new list with the modified columns.
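A minimal sketch of both approaches follows. The two small data frames are made-up stand-ins for the real data.train and data.test, and the list element names train and test are arbitrary choices for this example.

```r
# Made-up stand-ins for the real data.train / data.test
data.train <- data.frame(Pclass = c(1L, 3L, 2L))
data.test  <- data.frame(Pclass = c(2L, 2L, 1L))

# Store the data frames in a named list
datalist <- list(train = data.train, test = data.test)

# Option 1: loop over the names and use replacement syntax on the list itself
for (name in names(datalist)) {
  datalist[[name]]$Pclass_F <- as.factor(datalist[[name]]$Pclass)
}

# Option 2: lapply builds a new list holding the modified data frames
datalist2 <- lapply(list(train = data.train, test = data.test), function(df) {
  df$Pclass_F <- as.factor(df$Pclass)
  df
})

class(datalist$train$Pclass_F)   # "factor"
```

Note that in both cases the standalone data.train and data.test variables are left as they were; the updated data frames live in the list and are reached with datalist$train or datalist[["test"]].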
In theory, you could achieve what you were originally trying to do by using the [[ operator on the global environment, but it would be an unconventional way of doing things and may lead to confusion later on.
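For completeness, a sketch of what that could look like, assuming the code runs at the top level of a script or session so that environment() is the global environment. The data frames are again made-up stand-ins, and env is just a name chosen for this example.

```r
# Made-up stand-ins for the real data.train / data.test
data.train <- data.frame(Pclass = c(1L, 3L, 2L))
data.test  <- data.frame(Pclass = c(2L, 2L, 1L))

# At top level this is the global environment (same as globalenv())
env <- environment()

for (i in c("data.train", "data.test")) {
  # [[ on an environment looks the variable up by name, and the
  # replacement form writes the modified data frame back to that name
  env[[i]]$Pclass_F <- as.factor(env[[i]]$Pclass)
}

class(data.train$Pclass_F)   # "factor"
```

This does modify the original variables, but it hides which objects are being changed behind strings, which is why the named-list approach is usually easier to follow.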