Select unique values with 'select' functio

Is it possible to select all unique values from a column of a data.frame using select function in dplyr library? Something like "SELECT DISTINCT field1 FROM table1" in SQL notation.

Thanks!

标签： r select unique dplyr

3条回答

神经病院院长

2楼-- · 2020-05-12 12:23

Just to add to the other answers, if you would prefer to return a vector rather than a dataframe, you have the following options:

dplyr < 0.7.0

Enclose the dplyr functions in a parentheses and combine it with $ syntax:

(mtcars %>% distinct(cyl))$cyl

dplyr >= 0.7.0

Use the pull verb:

mtcars %>% distinct(cyl) %>% pull()

0人赞添加讨论(0) 举报

孤傲高冷的网名

3楼-- · 2020-05-12 12:25

In dplyr 0.3 this can be easily achieved using the distinct() method.

Here is an example:

distinct_df = df %>% distinct(field1)

You can get a vector of the distinct values with:

distinct_vector = distinct_df$field1

You can also select a subset of columns at the same time as you perform the distinct() call, which can be cleaner to look at if you examine the data frame using head/tail/glimpse.:

distinct_df = df %>% distinct(field1) %>% select(field1) distinct_vector = distinct_df$field1

0人赞添加讨论(0) 举报

劫难

4楼-- · 2020-05-12 12:38

The dplyr select function selects specific columns from a data frame. To return unique values in a particular column of data, you can use the group_by function. For example:

library(dplyr)

# Fake data
set.seed(5)
dat = data.frame(x=sample(1:10,100, replace=TRUE))

# Return the distinct values of x
dat %>%
  group_by(x) %>%
  summarise() 

    x
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10 10

If you want to change the column name you can add the following:

dat %>%
  group_by(x) %>%
  summarise() %>%
  select(unique.x=x)

This both selects column x from among all the columns in the data frame that dplyr returns (and of course there's only one column in this case) and changes its name to unique.x.

You can also get the unique values directly in base R with unique(dat$x).

If you have multiple variables and want all unique combinations that appear in the data, you can generalize the above code as follows:

set.seed(5)
dat = data.frame(x=sample(1:10,100, replace=TRUE), 
                 y=sample(letters[1:5], 100, replace=TRUE))

dat %>% 
  group_by(x,y) %>%
  summarise() %>%
  select(unique.x=x, unique.y=y)

0人赞添加讨论(0) 举报

Select unique values with 'select' functio

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间