R variable names in loop, get, etc

Posted 2020-05-06 08:17

Still relatively new to R. Trying to have dynamic variables in a loop but running into all sorts of problems. Initial code looks something like this (but bigger)

data.train$Pclass_F <- as.factor(data.train$Pclass)
data.test$Pclass_F <- as.factor(data.test$Pclass)

which I'm trying to build into a loop, imagining something like this

datalist <- c("data.train", "data.test")
for (i in datalist){
  i$Pclass_F <- as.factor(i$Pclass)
}

which doesn't work. A little research implies that in order to convert the strings in datalist into variables I need to use the get function. So my next attempt was

datalist <- c("data.train", "data.test")
for (i in datalist){
  get(i$Pclass_F) <- as.factor(get(i$Pclass))
}

which still doesn't work: Error in i$Pclass : $ operator is invalid for atomic vectors. Tried

datalist <- c("data.train", "data.test")
for (i in datalist){
  get(i)$Pclass_F <- as.factor(get(i)$Pclass)
}

which still doesn't work: Error in get(i)$Pclass_F <- as.factor(get(i)$Pclass) : could not find function "get<-". Even tried

datalist <- c("data.train", "data.test")
for (i in datalist){
  get(i[Pclass_F]) <- as.factor(get(i[Pclass]))
}

which still doesn't work: Error in get(i[Pclass]) : object 'Pclass' not found. Then tried

datalist <- c("data.train", "data.test")
for (i in datalist){
  get(i)[Pclass_F] <- as.factor(get(i)[Pclass])
}

which still doesn't work: Error in '[.data.frame'(get(i), Pclass) : object 'Pclass' not found

Now realized I never included data so nobody can run this themselves, but just to show it's not a data problem

> class(data.train$Pclass)
[1] "integer"
> class(data.test$Pclass)
[1] "integer"
> datalist
[1] "data.train" "data.test" 

Tags: r loops for-loop
1 Answer
叛逆
#2 · 2020-05-06 08:49

The problem you have relates to the way data frames and most other objects are treated in R. In many programming languages, objects are (or at least can be) passed to functions by reference. In C++ if I pass a pointer to an object to a function which manipulates that object, the original is modified. This is not the way things work for the most part in R.

When an object is created like this:

x <- list(a = 5, b = 9)

And then copied like this:

y <- x

Initially y and x will point to the same object in RAM. But as soon as y is modified at all, a copy is created. So assigning y$c <- 12 has no effect on x.
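A minimal sketch of that copy-on-modify behaviour, using the same x and y as above (the names() calls are added here only to inspect the result):

```r
x <- list(a = 5, b = 9)
y <- x        # no copy yet: y and x refer to the same object
y$c <- 12     # modifying y makes R copy it first; x is left alone

names(x)      # "a" "b"
names(y)      # "a" "b" "c"
```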

get() doesn't return the named object in a way that can be modified without first assigning it to another variable (which would mean the original variable is left unaltered).
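To illustrate, here is a small sketch with a made-up stand-in for data.train (the values and the tmp variable are assumptions for the example). Modifying what get() returned only changes the copy; one way to write the result back under the original name is assign(), although that is not what this answer recommends:

```r
# Toy stand-in for one of the question's data frames (values are made up).
data.train <- data.frame(Pclass = c(3L, 1L, 2L))

tmp <- get("data.train")               # fetch the object by name
tmp$Pclass_F <- as.factor(tmp$Pclass)  # this changes tmp only
"Pclass_F" %in% names(data.train)      # FALSE: the original is untouched

assign("data.train", tmp)              # store the modified copy under the old name
"Pclass_F" %in% names(data.train)      # TRUE
```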

The idiomatic way of doing this in R is to store your data frames in a named list. You can then loop through the list and use the replacement syntax to change the columns.

datalist <- list(data.train = data.train, data.test = data.test)
for (df in names(datalist)){
  datalist[[df]]$Pclass_F <- as.factor(datalist[[df]]$Pclass)
}

You could also use:

datalist <- setNames(lapply(list(data.train, data.test), function(data) {
  data$Pclass_F <- as.factor(data$Pclass)
  data
}), c("data.train", "data.test"))

This is using lapply to process each member of the list, returning a new list with the modified columns.
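As a possible simplification, lapply keeps the names of a list that is already named, so the setNames step can be dropped if you start from the named list. The sketch below uses made-up stand-ins for the two data frames and also shows one way to pull the results back into the original variables, should you still want them separately:

```r
# Toy stand-ins for the question's data frames (values are made up).
data.train <- data.frame(Pclass = c(3L, 1L, 2L))
data.test  <- data.frame(Pclass = c(2L, 2L, 1L))

datalist <- list(data.train = data.train, data.test = data.test)
datalist <- lapply(datalist, function(data) {
  data$Pclass_F <- as.factor(data$Pclass)
  data                                  # return the modified data frame
})

names(datalist)                         # "data.train" "data.test"

# Optional: copy the processed frames back to the original variables.
data.train <- datalist$data.train
data.test  <- datalist$data.test
```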

In theory, you could achieve what you were originally trying to do by using the [[ operator on the global environment, but it would be an unconventional way of doing things and may lead to confusion later on.
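For completeness, a rough sketch of what that might look like. Environments have reference semantics, so replacing an element through [[ changes the variable of that name in place. The env variable and the toy data are assumptions made for this example:

```r
env <- globalenv()

# Toy stand-ins (values are made up), created in the global environment.
env$data.train <- data.frame(Pclass = c(3L, 1L, 2L))
env$data.test  <- data.frame(Pclass = c(2L, 2L, 1L))

for (nm in c("data.train", "data.test")) {
  # Index the environment by name, then use ordinary replacement syntax.
  env[[nm]]$Pclass_F <- as.factor(env[[nm]]$Pclass)
}

is.factor(env$data.train$Pclass_F)      # TRUE
```

This modifies global variables by name from inside a loop, which is exactly the kind of hidden side effect that makes the named-list approach easier to follow.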
