I am trying to deselect columns in my dataset using dplyr, but I have not been able to get it working since last night.
I am well aware of workarounds, but I am strictly looking for an answer that uses only dplyr.
library(dplyr)
df <- tibble(x = c(1,2,3,4), y = c('a','b','c','d'))
df %>% select(-c('x'))
This gives me an error: Error in -c("x") : invalid argument to unary operator
Now, I know that select takes unquoted values, but I am not able to sub-select in this fashion.
Please note the above dataset is just an example; the real data can have many columns.
Thanks,
Prerit
Edit: OP's actual question was about how to use a character vector to select or deselect columns from a dataframe. Use the one_of() helper function for that:
colnames(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
cols <- c("Petal.Length", "Sepal.Length")
select(iris, one_of(cols)) %>% colnames
# [1] "Petal.Length" "Sepal.Length"
select(iris, -one_of(cols)) %>% colnames
# [1] "Sepal.Width" "Petal.Width" "Species"
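As a side note, in more recent releases of dplyr and tidyselect, one_of() is marked as superseded in favour of all_of() and any_of(). The following is a minimal sketch of the same selection with those helpers, assuming dplyr 1.0.0 or later is installed; "Not.A.Column" is a made-up name used only to show how any_of() treats missing columns.

```r
library(dplyr)

cols <- c("Petal.Length", "Sepal.Length")

# all_of() is strict: it errors if any name in `cols` is missing from the data
kept <- iris %>% select(all_of(cols)) %>% colnames()
# [1] "Petal.Length" "Sepal.Length"

# Negate the helper to drop the listed columns instead
dropped <- iris %>% select(-all_of(cols)) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"

# any_of() is lenient: names that are not present are silently ignored
# ("Not.A.Column" is a hypothetical name, not a real iris column)
lenient <- iris %>% select(-any_of(c(cols, "Not.A.Column"))) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
```

Which of the two to use likely depends on whether a missing column should count as an error (all_of) or be tolerated (any_of).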
You should have a look at the select helpers (type ?select_helpers) because they're incredibly useful. From the docs:
starts_with(): starts with a prefix
ends_with(): ends with a suffix
contains(): contains a literal string
matches(): matches a regular expression
num_range(): a numerical range like x01, x02, x03.
one_of(): variables in character vector.
everything(): all variables.
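To make the list above concrete, here is a short sketch of several of these helpers applied to iris. The small tibble `numbered` is a made-up example (not from the docs) that exists only to give num_range() something to match.

```r
library(dplyr)

# starts_with(): columns whose names begin with "Petal"
starts <- iris %>% select(starts_with("Petal")) %>% colnames()
# [1] "Petal.Length" "Petal.Width"

# ends_with(): columns whose names end with "Width"
ends <- iris %>% select(ends_with("Width")) %>% colnames()
# [1] "Sepal.Width" "Petal.Width"

# contains(): columns whose names contain the literal string "Sepal"
has <- iris %>% select(contains("Sepal")) %>% colnames()
# [1] "Sepal.Length" "Sepal.Width"

# matches(): columns whose names match a regular expression
rx <- iris %>% select(matches("\\.Width$")) %>% colnames()
# [1] "Sepal.Width" "Petal.Width"

# num_range(): zero-padded numbered columns (hypothetical data)
numbered <- tibble(x01 = 1, x02 = 2, x03 = 3, y = 4)
nr <- numbered %>% select(num_range("x", 1:2, width = 2)) %>% colnames()
# [1] "x01" "x02"

# everything(): handy for moving one column to the front
front <- iris %>% select(Species, everything()) %>% colnames()
# [1] "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```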
Given a dataframe with column names a:z, use select like this:
# df is assumed to be a dataframe with columns a through z
df %>% select(-a, -b, -c, -d, -e)
# OR
df %>% select(-c(a, b, c, d, e))
# OR
df %>% select(-(a:e))
# OR if you want to keep b
df %>% select(-a, -(c:e))
# OR a different way to keep b, by just putting it back in
df %>% select(-(a:e), b)
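The two ways of keeping b are not quite equivalent: putting b back in after dropping the range moves it to the end of the result. A minimal sketch to check this, assuming a one-row tibble named `df` built here purely for illustration with columns a through z:

```r
library(dplyr)

# Hypothetical data: one row, 26 columns named a..z
df <- as_tibble(setNames(as.list(1:26), letters))

# Drop the whole range a..e
r1 <- df %>% select(-(a:e)) %>% colnames()
# f, g, ..., z

# Drop a and c..e, so b stays in its original position
r2 <- df %>% select(-a, -(c:e)) %>% colnames()
# b, f, g, ..., z

# Drop a..e, then add b back: b is appended at the end
r3 <- df %>% select(-(a:e), b) %>% colnames()
# f, g, ..., z, b
```

If column order matters downstream, the `-a, -(c:e)` form is probably the safer choice.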
So if I wanted to omit two of the columns from the iris dataset, I could say:
colnames(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
select(iris, -c(Sepal.Length, Petal.Length)) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
But of course, the best and most concise way to achieve that is using one of select's helper functions:
select(iris, -ends_with(".Length")) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
P.S. It's weird that you are passing quoted values to dplyr; one of its big niceties is that you don't have to keep typing out quotes all the time. As you can see, bare values work fine with dplyr and ggplot2.