Rename multiple dataframe columns, referenced by c

Posted 2019-01-31 18:06

Question:

I want to rename some arbitrary columns of a large data frame, and I want to refer to them by their current column names, not their indexes. Column indexes might change if I add columns to or remove columns from the data, so I figure using the existing column names is a more stable solution. This is what I have now:

mydf = merge(df.1, df.2)
colnames(mydf)[which(colnames(mydf) == "MyName.1")] = "MyNewName"

Can I simplify this code, either the original merge() call or just the second line? "MyName.1" is actually the result of an xts merge of two different xts objects.

Answer 1:

names(mydf)[names(mydf) == "MyName.1"] = "MyNewName" # 13 characters shorter. 

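You may eventually want to rename several columns at once, though. In that case, use %in% instead of == and supply the old names and the new names as vectors of equal length. Be aware that with %in% the new names are assigned in the order the matched columns appear in the data frame, not in the order of the old-names vector, so the two vectors must be arranged accordingly.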
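
To make the multi-column case concrete, here is a minimal sketch that uses match() rather than %in%, so each old name is paired with its new name regardless of column order. The data frame and the names in old and new are made up for illustration.

```r
# Hypothetical example data; the column names are illustrative only.
mydf <- data.frame(a = 1:2, MyName.1 = 3:4, MyName.2 = 5:6)

# Old and new names, deliberately listed in a different order
# than the columns appear in mydf.
old <- c("MyName.2", "MyName.1")
new <- c("SecondNew", "FirstNew")

# match() returns the position of each old name, in the order of `old`.
idx <- match(old, names(mydf))
stopifnot(!anyNA(idx))  # fail early if an old name is missing

names(mydf)[idx] <- new
names(mydf)
```

Because idx follows the order of old, "MyName.2" becomes "SecondNew" and "MyName.1" becomes "FirstNew" even though the columns appear the other way round in the data frame.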



Answer 2:

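The trouble with changing column names of a data.frame is that, almost unbelievably, the entire data.frame is copied, even when it's in .GlobalEnv and no other variable points to it. (This applies to older versions of R; from R 3.1.0 onward the copy is shallow, so the column data itself is not duplicated.)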

The data.table package has a setnames() function which changes column names by reference, without copying the whole dataset. data.table is different in that it doesn't copy-on-write, which can be very important for large datasets. (You did say your data set was large.) Simply provide the old and the new names:

library(data.table)
# DT is assumed to be a data.table, e.g. DT <- as.data.table(mydf)
setnames(DT, "MyName.1", "MyNewName")
# or more explicit:
setnames(DT, old = "MyName.1", new = "MyNewName")
?setnames  # opens the help page
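
Since the question is about renaming several columns, here is a short sketch of setnames() with vectors for old and new. The table and the second pair of names are invented for illustration.

```r
library(data.table)

# Hypothetical example table; the column names are illustrative only.
DT <- data.table(id = 1:2, MyName.1 = 3:4, MyName.2 = 5:6)

# old and new are matched element by element; DT is modified in place,
# so no assignment is needed.
setnames(DT,
         old = c("MyName.1", "MyName.2"),
         new = c("MyNewName", "MyOtherName"))

names(DT)
```

The old names are looked up by name, so this keeps working if columns are added or reordered.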


Answer 3:

plyr has a rename function for just this purpose:

library(plyr)
mydf <- rename(mydf, c("MyName.1" = "MyNewName"))


Answer 4:

names(mydf) <- sub("MyName\\.1", "MyNewName", names(mydf))

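This generalizes to a multiple-name-change strategy if you use a shared stem as the pattern: sub is vectorized over names(mydf), so every name containing the stem is rewritten in one call. Use gsub instead of sub only if the stem can occur more than once within a single name.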
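
As a rough sketch of the stem idea, assume several columns share the prefix "MyName." and should all get the prefix "MyNewName." instead. The data frame is made up for illustration.

```r
# Hypothetical example data; the column names are illustrative only.
mydf <- data.frame(id = 1:2, MyName.1 = 3:4, MyName.2 = 5:6)

# "^" anchors the stem to the start of the name and "\\." matches a
# literal dot, so only names that begin with "MyName." are rewritten.
names(mydf) <- sub("^MyName\\.", "MyNewName.", names(mydf))

names(mydf)
```

Anchoring the pattern is a design choice here: without "^", a column such as "OldMyName.1" would be rewritten as well.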



Answer 5:

You can use the str_replace function of the stringr package:

library(stringr)
names(mydf) <- str_replace(names(mydf), fixed("MyName.1"), "MyNewName")  # fixed() matches the dot literally
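
One thing to watch with str_replace: the pattern is treated as a regular expression by default, so an unescaped dot matches any character. A small sketch of the difference, using two made-up column names:

```r
library(stringr)

# Hypothetical column names; the second has no dot at all.
cols <- c("MyName.1", "MyNameX1")

# As a regex, "." matches any character, so both names are replaced.
regex_result <- str_replace(cols, "MyName.1", "MyNewName")

# fixed() compares the pattern literally, so only the exact name matches.
fixed_result <- str_replace(cols, fixed("MyName.1"), "MyNewName")

regex_result
fixed_result
```

Escaping the dot as "MyName\\.1" would have the same effect as fixed() in this case.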