Add Columns to an empty data frame in R

Posted 2019-06-15 10:42

Question:

I have searched extensively but not found an answer to this question on Stack Overflow.

Let's say I have a data frame a.

I define:

a <- NULL
a <- as.data.frame(a)

If I then try to add a column to this data frame like so:

a$col1 <- c(1,2,3)

I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "col1", value = c(1, 2, 3)) :
    replacement has 3 rows, data has 0

Why is the row dimension fixed but the column is not?

How do I change the number of rows in a data frame?

If I do this (inputting the data into a list first and then converting to a df), it works fine:

a <- NULL
a$col1 <- c(1,2,3)
a <- as.data.frame(a)

Answer 1:

The row dimension is not fixed, but a data.frame is stored as a list of vectors that are constrained to have the same length. You cannot add col1 to a because col1 has three values (rows) and a has zero, which breaks that constraint. By default, R does not auto-vivify rows when you try to add a column that is longer than the data.frame. The second example works because a is still a plain list when col1 is assigned, so no length constraint applies; as.data.frame then builds the data.frame from col1 alone, so it is initialized with three rows.
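
As a minimal sketch of the constraint described above, the following reproduces both cases from the question. The names msg, l and b are only illustrative and do not appear in the original post.

```r
# The same empty data frame as in the question
a <- as.data.frame(NULL)
print(dim(a))  # rows, then columns

# Adding a length-3 column to a 0-row data frame raises an error;
# tryCatch captures the message instead of stopping the script
msg <- tryCatch({
    a$col1 <- c(1, 2, 3)
    "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# A plain list has no equal-length constraint, so the assignment succeeds,
# and the data frame built from it takes its row count from col1
l <- list()
l$col1 <- c(1, 2, 3)
b <- as.data.frame(l)
print(dim(b))
```

If the data are already available when the data frame is created, building it in one step with data.frame(col1 = c(1, 2, 3)) avoids the problem altogether.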

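If you want the data.frame to expand automatically, you can use the following function:

cbind.all <- function(...) {
    # Coerce every argument (vector, matrix, or data.frame) to a matrix
    nm <- lapply(list(...), as.matrix)
    # Number of rows of the tallest input
    n <- max(vapply(nm, nrow, integer(1)))
    # Pad each shorter matrix with rows of NA, then bind column-wise
    do.call(cbind, lapply(nm, function(x) {
        rbind(x, matrix(NA, n - nrow(x), ncol(x)))
    }))
}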

This pads the shorter inputs with NA and returns a matrix, so wrap the result in as.data.frame() if you need a data frame. You would use it like: cbind.all(df, a)
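
A short usage sketch may help here. The function is repeated from the answer so that the snippet runs on its own; the inputs col1 and col2 and the column names assigned at the end are made up for illustration.

```r
# cbind.all as given in the answer, repeated so this snippet is self-contained
cbind.all <- function(...) {
    nm <- lapply(list(...), as.matrix)
    n <- max(sapply(nm, nrow))
    do.call(cbind, lapply(nm, function(x) {
        rbind(x, matrix(NA, n - nrow(x), ncol(x)))
    }))
}

df <- data.frame()     # empty: 0 rows, 0 columns
col1 <- c(1, 2, 3)     # illustrative values
col2 <- c(10, 20)      # deliberately shorter, to show the NA padding

m <- cbind.all(df, col1, col2)   # a matrix, padded with NA
out <- as.data.frame(m)          # convert back to a data frame
names(out) <- c("col1", "col2")
print(out)
```

Because the intermediate result is a matrix, columns of different types (for example numeric and character) would all be coerced to one common type, so this approach suits inputs of a single type best.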



Answer 2:

You could also do something like the following. Here I read data from multiple files, take the column I want from each, and store it in the data frame. I check whether the data frame has any columns yet; if it does not, I create a new one from the first column instead of triggering the error about a mismatched number of rows:

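# `files` is assumed to be a named character vector of file paths,
# as implied by names(files) and files[f]; it is not defined in the post.
readCounts <- data.frame()

for (f in names(files)) {
    d <- read.table(files[f], header = TRUE, as.is = TRUE)
    # Keep only the rounded NumReads column, named after the file
    d2 <- round(data.frame(d$NumReads))
    colnames(d2) <- f
    if (ncol(readCounts) == 0) {
        # First file: start the data frame from this column
        readCounts <- d2
        rownames(readCounts) <- d$Name
    } else {
        # Later files: assumes every file has the same number of rows
        readCounts <- cbind(readCounts, d2)
    }
}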
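
A possible variation on the loop above, in the spirit of the list-first approach from the question, is to collect the columns in a plain list and convert to a data frame once at the end. This is only a sketch: the in-memory tables below are hypothetical stand-ins for the files read with read.table, and the sample names and values are invented.

```r
# Hypothetical stand-ins for the per-file tables; names and values are made up
tables <- list(
    sampleA = data.frame(Name = c("g1", "g2", "g3"), NumReads = c(10.4, 0.6, 3.2)),
    sampleB = data.frame(Name = c("g1", "g2", "g3"), NumReads = c(7.9, 1.1, 5.7))
)

# Collect one rounded column per sample in a plain list;
# a list imposes no row-count constraint while it is being filled
cols <- lapply(tables, function(d) round(d$NumReads))

# Convert once; this assumes every table has the same number of rows
readCounts <- as.data.frame(cols)
rownames(readCounts) <- tables[[1]]$Name
print(readCounts)
```

This avoids the special case for the first file, at the cost of holding all columns in the list before the data frame exists.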