R - Getting Column of Dataframe from String [duplicate]

Posted 2019-03-07 03:05

Question:

This question already has an answer here:

  • Dynamically select data frame columns using $ and a vector of column names 8 answers

I am trying to create a function that allows the conversion of selected columns of a data frame to categorical data type (factor) before running a regression analysis.

The question is: how do I select a particular column from a data frame using a string (character)?

Example:

  strColumnNames <- "Admit,Rank"
  strDelimiter <- ","
  strSplittedColumnNames <- strsplit(strColumnNames, strDelimiter)
  for( strColName in strSplittedColumnNames[[1]] ){
    dfData$as.name(strColName) <- factor(dfData$get(strColName))
  }

Tried:

dfData$as.name()
dfData$get(as.name())
dfData$get()

Error Msg: Error: attempt to apply non-function

Any help would be greatly appreciated! Thank you!!!

Answer 1:

You need to change

dfData$as.name(strColName) <- factor(dfData$get(strColName))

to

dfData[[strColName]] <- factor(dfData[[strColName]])

You may read ?"[[" for more.
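
For completeness, here is a minimal sketch of the whole loop with that change applied. The dfData below is a made-up stand-in used only so the snippet runs on its own: the column names Admit and Rank come from the question, while the Gre column and all values are assumptions.

```r
# Hypothetical stand-in for the asker's data; values and the Gre column are invented
dfData <- data.frame(Admit = c(0, 1, 1, 0), Rank = c(1, 2, 3, 2), Gre = c(380, 660, 800, 640))

strColumnNames <- "Admit,Rank"
strDelimiter <- ","
# fixed = TRUE treats the delimiter as a literal string rather than a regex
strSplittedColumnNames <- strsplit(strColumnNames, strDelimiter, fixed = TRUE)

for (strColName in strSplittedColumnNames[[1]]) {
  # [[ evaluates strColName, so the column is looked up by the string's value
  dfData[[strColName]] <- factor(dfData[[strColName]])
}

str(dfData)
```

Only the two named columns should come out as factors; the remaining column is left untouched.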

In your case the column names are generated programmatically, so [[ ]] is the way to go: $ takes the name written after it literally and does not evaluate a variable. This example illustrates the problem with $:

dat <- data.frame(x = 1:5, y = 2:6)
z <- "x"

dat$z
# NULL

dat[[z]]
# [1] 1 2 3 4 5
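
As far as I can tell, this also explains the error message in the question. An expression like dat$as.name(z) is read as "extract the column literally named as.name, then call it as a function". No such column exists, so $ returns NULL, and calling NULL fails. A small sketch, reusing dat and z from the example:

```r
dat <- data.frame(x = 1:5, y = 2:6)
z <- "x"

# There is no column literally named "as.name", so this lookup yields NULL
is.null(dat$as.name)

# Calling that NULL as a function is what raises the error;
# tryCatch captures the error object so the script keeps running
res <- tryCatch(dat$as.name(z), error = function(e) e)
inherits(res, "error")
conditionMessage(res)
```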

Regarding the other answer

apply does not work here, because the function being applied is as.factor or factor. apply always works on a matrix (if you feed it a data frame, it converts it to a matrix first) and returns a matrix, and a matrix cannot hold factor columns. Consider this example:

x <- data.frame(x1 = letters[1:4], x2 = LETTERS[1:4], x3 = 1:4, stringsAsFactors = FALSE)
x[, 1:2] <- apply(x[, 1:2], 2, as.factor)

str(x)
#'data.frame':  4 obs. of  3 variables:
# $ x1: chr  "a" "b" "c" "d"
# $ x2: chr  "A" "B" "C" "D"
# $ x3: int  1 2 3 4

Note that x1 and x2 are still character variables rather than factors. We have to use lapply instead:

x[1:2] <- lapply(x[1:2], as.factor)

str(x)
#'data.frame':  4 obs. of  3 variables:
# $ x1: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
# $ x2: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# $ x3: int  1 2 3 4

Now we see the factor class in x1 and x2.

Using apply on a data frame is rarely a good idea. If you read the source code of apply, you will find:

    dl <- length(dim(X))
    if (is.object(X)) 
    X <- if (dl == 2L) 
        as.matrix(X)
    else as.array(X)

You can see that a data frame (which has 2 dimensions) is coerced to a matrix first. This can be slow for large data. Worse, if the columns of your data frame have different classes, the resulting matrix has only one type, so information is lost; for example, a single character column turns the whole matrix into character.
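
To see that coercion in isolation, here is a small sketch that calls as.matrix directly on a mixed-type data frame. The data frame df_mixed is made up for illustration. Since a matrix holds a single type, the integer column ends up as character alongside the character column:

```r
# Hypothetical data frame with one integer and one character column
df_mixed <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

m <- as.matrix(df_mixed)

# The whole matrix has one type; the integer column did not survive as integer
typeof(m)
is.character(m[, "id"])

# Column-wise work on the data frame itself keeps each column's own class
sapply(df_mixed, class)
```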

Moreover, apply is written in R, not C, and uses an ordinary for loop:

    for (i in 1L:d2) {
        tmp <- forceAndCall(1, FUN, newX[, i], ...)
        if (!is.null(tmp)) 
            ans[[i]] <- tmp
    }

so it is no better than an explicit for loop you write yourself.



Answer 2:

I would use a different method.

Create a vector of column names you want to change to factors:

factorCols <- c("Admit", "Rank")

Then find the positions of these columns:

myCols <- which(names(dfData) %in% factorCols)

Finally, use lapply (not apply) to change these columns to factors:

dfData[myCols] <- lapply(dfData[myCols], as.factor)
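
Putting the pieces together for the original goal (a function that converts selected columns before a regression), one possible sketch follows. The function name convert_to_factor and its arguments are my own invention, not something from the question, and it assumes the column names arrive as a single delimited string. The example data is made up as well.

```r
# Hypothetical helper: convert the named columns of df to factors
convert_to_factor <- function(df, col_names, delimiter = ",") {
  # Split the delimited string and trim stray whitespace around each name
  cols <- trimws(strsplit(col_names, delimiter, fixed = TRUE)[[1]])

  # Fail early on names that are not columns of df
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }

  # df[cols] stays a data frame even for a single column,
  # so lapply always iterates over columns
  df[cols] <- lapply(df[cols], as.factor)
  df
}

# Made-up example data; only the names Admit and Rank come from the question
dfData <- data.frame(Admit = c(0, 1, 1, 0), Rank = c(1, 2, 3, 2), Gre = c(380, 660, 800, 640))
dfData <- convert_to_factor(dfData, "Admit, Rank")
str(dfData)
```

Indexing with a character vector (df[cols]) avoids the intermediate which() step, and the early check turns a misspelled column name into a clear error.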