(Somewhat related question: Enter new column names as string in dplyr's rename function)
In the middle of a dplyr chain (%>%), I would like to replace multiple column names with functions of their old names (using tolower or gsub, etc.).
library(tidyr); library(dplyr)
data(iris)
# This is what I want to do, but I'd like to use dplyr syntax
names(iris) <- tolower( gsub("\\.", "_", names(iris) ) )
glimpse(iris, 60)
# Observations: 150
# Variables:
# $ sepal_length (dbl) 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6,...
# $ sepal_width (dbl) 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4,...
# $ petal_length (dbl) 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4,...
# $ petal_width (dbl) 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3,...
# $ species (fctr) setosa, setosa, setosa, setosa, s...
# the rest of the chain:
iris %>% gather(measurement, value, -species) %>%
group_by(species,measurement) %>%
summarise(avg_value = mean(value))
I see ?rename takes the argument replace as a named character vector, with new names as values and old names as names.
So I tried:
iris %>% rename(replace=c(names(iris)=tolower( gsub("\\.", "_", names(iris) ) ) ))
but this (a) fails with Error: unexpected '=' in "iris %>% ..." and (b) requires referencing the data frame from the previous operation in the chain by name, which I can't do in my real use case.
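For reference, here is a small base R sketch of the named vector the ?rename text describes (old names as names, new names as values). The unexpected '=' comes from putting an expression, names(iris), on the left of = inside c(); c() only accepts literal names there, so computed names have to be attached with setNames() instead. The variable names below are illustrative.

```r
# Old names, and the new names computed from them
old_names <- names(iris)
new_names <- tolower(gsub(".", "_", old_names, fixed = TRUE))

# c(names(iris) = new_names) is a syntax error: the left side of `=`
# inside c() must be a literal name. setNames() attaches computed names.
replace_vec <- setNames(new_names, old_names)

replace_vec[["Sepal.Length"]]  # "sepal_length"
```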
iris %>%
rename(replace=c( )) %>% # ideally the fix would go here
gather(measurement, value, -species) %>%
group_by(species,measurement) %>%
summarise(avg_value = mean(value)) # I realize I could mutate down here
# instead, once the column names turn into values,
# but that's not the point
# ---- Desired output looks like: -------
# Source: local data frame [12 x 3]
# Groups: species
#
# species measurement avg_value
# 1 setosa sepal_length 5.006
# 2 setosa sepal_width 3.428
# 3 setosa petal_length 1.462
# 4 setosa petal_width 0.246
# 5 versicolor sepal_length 5.936
# 6 versicolor sepal_width 2.770
# ... etc ....
Here's a way around the somewhat awkward rename syntax:
Both select() and select_all() can be used to rename columns. If you want to rename only specific columns, you can use select:
rename does the same thing, just without having to include everything():
select_all() works on all columns and can take a function as an argument:
or combining the two:
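A sketch of the variants just described, assuming dplyr >= 0.8.0 (for select_all() and the ~ lambda shorthand in its .funs argument; select_all() is superseded in current dplyr but still available). The text does not say exactly what "combining the two" means, so the last step assumes a function-based select_all() followed by an explicit select() rename.

```r
library(dplyr)

# select(): rename specific columns; everything() keeps all the others
iris_sel <- iris %>% select(sepal_length = Sepal.Length, everything())

# rename(): same result without needing everything()
iris_ren <- iris %>% rename(sepal_length = Sepal.Length)

# select_all(): apply a function to every column name
iris_all <- iris %>% select_all(tolower)

# Combined (assumed reading): function-based rename of all columns,
# then an explicit rename of one column with select()
iris_combo <- iris %>%
  select_all(~ tolower(gsub(".", "_", .x, fixed = TRUE))) %>%
  select(sepal_len = sepal_length, everything())
```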
This is a very late answer, from May 2017.
As of dplyr 0.5.0.9004, soon to be 0.6.0 (the release eventually shipped as 0.7.0), many new ways of renaming columns that work with the magrittr pipe operator %>% have been added to the package. Those functions are:
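The scoped rename verbs added in dplyr 0.7.0 are rename_all(), rename_at() and rename_if(). A short sketch of each follows, assuming dplyr >= 0.8.0 for the ~ lambda syntax; in dplyr >= 1.0.0 all three are superseded by rename_with(), though they remain available.

```r
library(dplyr)

# rename_all(): apply a function to every column name
upper <- iris %>% rename_all(toupper)

# rename_at(): only the columns picked with vars()
sepal_lower <- iris %>% rename_at(vars(starts_with("Sepal")), tolower)

# rename_if(): only the columns whose values satisfy a predicate
num_prefixed <- iris %>% rename_if(is.numeric, ~ paste0("num_", .x))
```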
There are many different ways of using those functions, but the one relevant to your problem, using the stringr package, is the following:
And so, carry on with the plumbing :) (no pun intended).
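Applied to the question's pipeline, a stringr-based rename with rename_all() could look like the following sketch. It assumes dplyr >= 0.8.0 (for the ~ lambda), tidyr and stringr are installed; str_replace_all() with fixed(".") swaps dots for underscores and str_to_lower() lowercases. The object name avg is illustrative.

```r
library(dplyr)
library(tidyr)
library(stringr)

avg <- iris %>%
  # dots to underscores, then lowercase, for every column name
  rename_all(~ str_to_lower(str_replace_all(.x, fixed("."), "_"))) %>%
  gather(measurement, value, -species) %>%
  group_by(species, measurement) %>%
  summarise(avg_value = mean(value))
```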
My eloquent attempt using base, stringr and dplyr:
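One way to mix base R, stringr and dplyr in a single step is base setNames() driven by stringr functions, with magrittr's . placeholder standing in for the incoming data frame, so the chain never has to name it. This is a sketch, not necessarily the answer's original code; it assumes dplyr (which re-exports %>%) and stringr are installed.

```r
library(dplyr)    # re-exports the %>% pipe
library(stringr)

iris_clean <- iris %>%
  # `.` appears only inside a nested call, so magrittr still passes the
  # data frame as the first argument of setNames() as well
  setNames(str_to_lower(str_replace_all(names(.), fixed("."), "_")))
```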
EDIT: library(tidyverse) now includes all three libraries.
I do this for building functions with piping.
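magrittr can build a reusable function straight from a pipe: a pipeline that starts with . becomes a functional sequence. A minimal sketch, assuming that is the kind of function meant here; snake_names is a hypothetical name.

```r
library(magrittr)

# A pipeline that starts with `.` is a functional sequence, i.e. a function
snake_names <- . %>% setNames(tolower(gsub(".", "_", names(.), fixed = TRUE)))

# It can then be used like any other step in a chain
iris_snake <- iris %>% snake_names()
```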
For this particular [but fairly common] case, the function has already been written in the janitor package:
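A sketch with janitor's clean_names(), assuming the janitor package is installed; its default case is snake_case, which gives exactly the names the question asks for.

```r
library(janitor)

# clean_names() defaults to snake_case: "Sepal.Length" -> "sepal_length"
iris_clean <- clean_names(iris)
names(iris_clean)
```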
So, all together:
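Dropping clean_names() into the question's chain gives a sketch like this, assuming dplyr, tidyr and janitor are installed; the object name result is illustrative.

```r
library(dplyr)
library(tidyr)
library(janitor)

result <- iris %>%
  clean_names() %>%                          # sepal_length, ..., species
  gather(measurement, value, -species) %>%
  group_by(species, measurement) %>%
  summarise(avg_value = mean(value))
```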
I think you're looking at the documentation for plyr::rename, not dplyr::rename. You would do something like this with dplyr::rename:
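A sketch of dplyr::rename(), which takes new_name = old_name pairs (the reverse of the plyr layout the question quotes). For computed names, a named character vector with new names as names and old names as values can be spliced in with !!!; lookup is an illustrative name, and the sketch assumes dplyr >= 0.7.0.

```r
library(dplyr)

# new_name = old_name
iris_one <- iris %>% rename(sepal_length = Sepal.Length)

# Computed names: names are the NEW names, values the OLD names
lookup <- setNames(names(iris), tolower(gsub(".", "_", names(iris), fixed = TRUE)))
iris_all <- iris %>% rename(!!!lookup)
```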