I want to rename a few arbitrary columns of a large data frame by referring to their current names rather than their indexes. Column indexes might change if I add or remove columns, so I figure using the existing column names is a more stable solution.
This is what I have now:
mydf = merge(df.1, df.2)
colnames(mydf)[which(colnames(mydf) == "MyName.1")] = "MyNewName"
Can I simplify this code, either the original merge() call or just the second line? "MyName.1" is actually the result of an xts merge of two different xts objects.
names(mydf)[names(mydf) == "MyName.1"] = "MyNewName" # 13 characters shorter.
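However, you may eventually want to rename several columns at once. In that case, use %in% instead of == and supply the old names and the new names as vectors of equal length. Note that with %in% the new names are assigned in the order the matched columns appear in the data frame, not in the order of the old-names vector.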
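As a hedged sketch of renaming several columns at once, match() can pair each old name with its own replacement regardless of column order. The data frame and the names old, new and idx below are made up for illustration, and the code assumes every old name is actually present.

```r
# Hypothetical example data; column names are illustrative only.
example_df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

old <- c("c", "a")          # names to replace, in any order
new <- c("C_new", "A_new")  # replacements, paired with `old` by position

# match() returns the column position of each old name, in the order of `old`.
idx <- match(old, names(example_df))
stopifnot(!anyNA(idx))      # fail early if an old name is missing

names(example_df)[idx] <- new
names(example_df)
```

With %in% the same two vectors would have put "C_new" on column a, because the logical index follows column order rather than the order of old.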
The trouble with changing column names of a data.frame is that, almost unbelievably, the entire data.frame is copied, even when it's in .GlobalEnv and no other variable points to it. (This applies to versions of R before 3.1.0; later versions make only a shallow copy.)
The data.table package has a setnames() function which changes column names by reference without copying the whole dataset. data.table is different in that it doesn't copy-on-write, which can be very important for large datasets. (You did say your data set was large.) Simply provide the old and the new names:
library(data.table)
setnames(DT, "MyName.1", "MyNewName")  # DT is a data.table; a plain data.frame also works
# or more explicit:
setnames(DT, old = "MyName.1", new = "MyNewName")
?setnames
plyr has a rename function for just this purpose:
library(plyr)
mydf <- rename(mydf, c("MyName.1" = "MyNewName"))
names(mydf) <- sub("MyName\\.1", "MyNewName", names(mydf))
This would generalize better to a multiple-name-change strategy if you put a stem as a pattern to be replaced using gsub instead of sub.
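A minimal sketch of the stem idea, assuming hypothetical columns that share the prefix "MyName." (the data frame below is invented for illustration):

```r
# Hypothetical example data with two columns sharing a stem.
example_df <- data.frame(MyName.1 = 1:2, MyName.2 = 3:4, Other = 5:6)

# Anchor the pattern with ^ and escape the dot so only the literal
# prefix "MyName." at the start of a name is replaced.
names(example_df) <- sub("^MyName\\.", "MyNewName.", names(example_df))
names(example_df)
```

Both sub and gsub are vectorised over the names, so either one renames every matching column. gsub only differs when the pattern can occur more than once within a single name.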
You can use the str_replace function of the stringr package:
library(stringr)
# fixed() matches the name literally; otherwise "." is a regex wildcard
names(mydf) <- str_replace(names(mydf), fixed("MyName.1"), "MyNewName")