Is it possible to select all unique values from a column of a data.frame using the select function in the dplyr library?
Something like "SELECT DISTINCT field1 FROM table1" in SQL notation.
Thanks!
Just to add to the other answers, if you would prefer to return a vector rather than a data frame, you have the following options:
dplyr < 0.7.0
Enclose the dplyr functions in parentheses and combine them with the $ syntax.
dplyr >= 0.7.0
Use the pull verb.
In dplyr 0.3 this can easily be achieved using the distinct() function. Here is an example:
distinct_df = df %>% distinct(field1)
You can get a vector of the distinct values with:
distinct_vector = distinct_df$field1
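You can also select a subset of columns at the same time as you perform the distinct() call, which can be cleaner to look at if you examine the data frame using head/tail/glimpse:
distinct_df = df %>% distinct(field1) %>% select(field1)
distinct_vector = distinct_df$field1
(Since dplyr 0.5.0, distinct() keeps only the columns you list unless you pass .keep_all = TRUE, so the extra select() mainly matters for older versions.)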
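A minimal sketch of the two vector-returning options mentioned in the first answer (parentheses plus $ for dplyr < 0.7.0, pull() for dplyr >= 0.7.0). It assumes a small hypothetical data frame df whose column field1 mirrors the question's SQL example; the data values are made up for illustration.

```r
library(dplyr)

# Hypothetical example data standing in for "table1"
df <- data.frame(field1 = c("a", "b", "a", "c", "b"),
                 field2 = 1:5,
                 stringsAsFactors = FALSE)

# dplyr < 0.7.0: wrap the pipeline in parentheses, then extract with $
vec_dollar <- (df %>% distinct(field1))$field1

# dplyr >= 0.7.0: pull() extracts a single column as a plain vector
vec_pull <- df %>% distinct(field1) %>% pull(field1)

# Both are character vectors in order of first appearance: "a" "b" "c"
print(vec_pull)
```

The parenthesised $ form still works in current dplyr; pull() simply reads more naturally at the end of a pipe.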
The dplyr select function selects specific columns from a data frame. To return the unique values in a particular column of data, you can instead use the group_by function, followed by summarise() to collapse each group to a single row. If you want to change the column name, you can add a select() step that renames the column.
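A sketch of the group_by() approach, assuming a hypothetical data frame dat with an integer column x. group_by() on its own only marks the groups; the summarise() call with no summary expressions is what reduces the data to one row per distinct value of x. The final select() renames x to unique.x.

```r
library(dplyr)

# Hypothetical example data: 100 draws from the integers 1..10
set.seed(5)
dat <- data.frame(x = sample(1:10, 100, replace = TRUE))

# One row per distinct value of x (groups come out sorted by x)
unique_x <- dat %>%
  group_by(x) %>%
  summarise()

# Same result, with the column renamed to unique.x
unique_x_renamed <- dat %>%
  group_by(x) %>%
  summarise() %>%
  select(unique.x = x)
```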
This both selects column x from among all the columns in the data frame that dplyr returns (and of course there's only one column in this case) and changes its name to unique.x.
You can also get the unique values directly in base R with unique(dat$x).
If you have multiple variables and want all unique combinations that appear in the data, you can generalize the above code as follows:
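A sketch of the multi-column case, assuming a hypothetical data frame dat2 with columns x and y. Grouping by both variables and summarising returns each (x, y) combination that occurs in the data exactly once. The .groups = "drop" argument assumes dplyr >= 1.0.0; it returns an ungrouped result so the rename in select() behaves as in the single-column case. distinct(x, y) is shown for comparison as the closest analogue of SELECT DISTINCT x, y; it returns the same combinations in order of first appearance rather than sorted.

```r
library(dplyr)

# Hypothetical data with two variables
dat2 <- data.frame(x = c(1, 1, 2, 2, 1),
                   y = c("a", "a", "b", "c", "b"),
                   stringsAsFactors = FALSE)

# All unique (x, y) combinations, sorted by x then y, renamed
# (.groups = "drop" requires dplyr >= 1.0.0)
combos <- dat2 %>%
  group_by(x, y) %>%
  summarise(.groups = "drop") %>%
  select(unique.x = x, unique.y = y)

# Same combinations via distinct(), in order of first appearance
combos_distinct <- dat2 %>% distinct(x, y)
```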