How NOT to select columns using select() dplyr whe

Posted 2020-06-08 05:59

Question:

I am trying to deselect columns in my dataset using dplyr, but I have not been able to do so since last night.

I am well aware of workarounds, but I am strictly trying to find an answer using only dplyr.

library(dplyr)
df <- tibble(x = c(1,2,3,4), y = c('a','b','c','d'))
df %>% select(-c('x'))

This gives me an error: Error in -c("x") : invalid argument to unary operator

Now, I know that select() takes unquoted values, but I am not able to sub-select in this fashion.

Please note that the above dataset is just an example; we can have many columns.

Thanks,

Prerit

Answer 1:

Edit: OP's actual question was about how to use a character vector to select or deselect columns from a dataframe. Use the one_of() helper function for that:

colnames(iris)

# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

cols <- c("Petal.Length", "Sepal.Length")

select(iris, one_of(cols)) %>% colnames

# [1] "Petal.Length" "Sepal.Length"

select(iris, -one_of(cols)) %>% colnames

# [1] "Sepal.Width" "Petal.Width" "Species"
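For readers on newer releases: one_of() has been superseded by all_of() and any_of() since dplyr 1.0.0. The following is a minimal sketch of the same selection, assuming such a version is installed; the name "Not.A.Column" is made up for illustration.

```r
library(dplyr)

cols <- c("Petal.Length", "Sepal.Length")

# all_of() is strict: it raises an error if any name in cols is missing
kept <- iris %>% select(all_of(cols)) %>% colnames()

# Negating all_of() drops the listed columns instead of keeping them
dropped <- iris %>% select(-all_of(cols)) %>% colnames()

# any_of() is lenient: names that do not exist in the data are ignored
# ("Not.A.Column" is a hypothetical name used only for this example)
lenient <- iris %>% select(-any_of(c(cols, "Not.A.Column"))) %>% colnames()

print(kept)
print(dropped)
print(lenient)
```

all_of() is probably the safer default when a typo in the character vector should fail loudly, while any_of() suits cases where some columns may legitimately be absent.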

You should have a look at the select helpers (type ?select_helpers) because they're incredibly useful. From the docs:

starts_with(): starts with a prefix

ends_with(): ends with a suffix

contains(): contains a literal string

matches(): matches a regular expression

num_range(): a numerical range like x01, x02, x03.

one_of(): variables in character vector.

everything(): all variables.
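As a rough illustration of the helpers listed above, here is a short sketch that applies several of them to iris. The small data frame toy, with columns x01, x02, x03 and y, is invented here only to demonstrate num_range().

```r
library(dplyr)

# Columns whose names begin with "Petal"
by_prefix <- iris %>% select(starts_with("Petal")) %>% colnames()

# Columns whose names end with "Width"
by_suffix <- iris %>% select(ends_with("Width")) %>% colnames()

# Columns whose names contain the literal string "Sepal"
by_literal <- iris %>% select(contains("Sepal")) %>% colnames()

# Columns whose names match a regular expression (here: ending in ".Length")
by_regex <- iris %>% select(matches("\\.Length$")) %>% colnames()

# everything() selects all columns
all_cols <- iris %>% select(everything()) %>% colnames()

# num_range() builds names from a prefix and a zero-padded number range;
# `toy` is a hypothetical data frame created just for this example
toy <- data.frame(x01 = 1, x02 = 2, x03 = 3, y = 4)
by_range <- toy %>% select(num_range("x", 1:2, width = 2)) %>% colnames()

print(by_prefix)
print(by_range)
```

Any of these helpers can be negated with a leading minus sign to drop the matching columns rather than keep them.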


Given a data frame with columns named a through z, use select() like this (the data argument is omitted for brevity):

select(-a, -b, -c, -d, -e)

# OR

select(-c(a, b, c, d, e))

# OR

select(-(a:e))

# OR if you want to keep b

select(-a, -(c:e))

# OR a different way to keep b, by just putting it back in

select(-(a:e), b)
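The calls above leave out the data argument. A runnable version might look like the sketch below, which assumes a one-row data frame named df with columns a through z, built here purely for illustration.

```r
library(dplyr)

# Hypothetical one-row data frame with 26 columns named a, b, ..., z
df <- as.data.frame(setNames(as.list(1:26), letters))

# Three equivalent ways to drop columns a through e
v1 <- df %>% select(-a, -b, -c, -d, -e) %>% colnames()
v2 <- df %>% select(-c(a, b, c, d, e)) %>% colnames()
v3 <- df %>% select(-(a:e)) %>% colnames()

# Drop a and c through e, which leaves b in its original position
keep_b <- df %>% select(-a, -(c:e)) %>% colnames()

# Drop a through e, then add b back; b ends up after the remaining columns
b_last <- df %>% select(-(a:e), b) %>% colnames()

print(v1)
print(keep_b)
print(b_last)
```

The last two variants keep the same set of columns but differ in order, which matters if later code relies on column position.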

So if I wanted to omit two of the columns from the iris dataset, I could say:

colnames(iris)

# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

select(iris, -c(Sepal.Length, Petal.Length)) %>% colnames()

# [1] "Sepal.Width" "Petal.Width" "Species" 

In this case, though, a more concise way to achieve that is to use one of select's helper functions:

select(iris, -ends_with(".Length")) %>% colnames()

# [1] "Sepal.Width" "Petal.Width" "Species"   

P.S. It's unusual to pass quoted values to dplyr; one of its big niceties is that you don't have to keep typing out quotes all the time. As you can see, bare column names work fine with dplyr and ggplot2.



Tags: r dplyr