R dplyr get name of which.min rowwise

2019-05-29 17:57发布

问题:

I just answered this question with a data.table approach, and was working hard to provide a dplyr equivalent, but failed misserably.

There's a simple data frame with numerical values in its columns. I want to get the name of the column with the minimum value using dplyr.

Please notice that I'm aware of other methods to solve this problem: at the moment my only interest is in the dplyr approach

id <- c(1,2,3,4,5,6,7,8,9,10)
x1 <- c(2,4,5,3,6,4,3,6,7,7)
x2 <- c(0,1,2,6,7,6,0,8,2,2)
x3 <- c(5,3,4,5,8,3,4,2,5,6)

DF <- data.frame(id, x1,x2,x3)

I tried several variants of:

DF %>% select(2:4) %>% rowwise() %>% mutate(y = function(x) names(x)[which.min(x)])

DF %>% select(2:4) %>% rowwise() %>% mutate(y = apply(x1:x3, 1, which.min(x1:x3))

but haven't found a way to do this the dplyr-way. I'll appreciate any hint!

Expected output:

DF
##     id x1 x2 x3  y
##  1:  1  2  0  5 x2
##  2:  2  4  1  3 x2
##  3:  3  5  2  4 x2
##  4:  4  3  6  5 x1
##  5:  5  6  7  8 x1
##  6:  6  4  6  3 x3
##  7:  7  3  0  4 x2
##  8:  8  6  8  2 x3
##  9:  9  7  2  5 x2
## 10: 10  7  2  6 x2

回答1:

This seems kind of (well, actually very) clunky but...

DF %>% 
  mutate(y = apply(.[,2:4], 1, function(x) names(x)[which.min(x)]))
   id x1 x2 x3  y
1   1  2  0  5 x2
2   2  4  1  3 x2
3   3  5  2  4 x2
4   4  3  6  5 x1
5   5  6  7  8 x1
6   6  4  6  3 x3
7   7  3  0  4 x2
8   8  6  8  2 x3
9   9  7  2  5 x2
10 10  7  2  6 x2


回答2:

Here's another approach.. I didn't run any benchmark but my guess is that it would perform better than using apply or rowwise.

DF %>% 
  mutate(y = names(.)[max.col(.[2:4]*-1)+1L])
#    id x1 x2 x3  y
# 1   1  2  0  5 x2
# 2   2  4  1  3 x2
# 3   3  5  2  4 x2
# 4   4  3  6  5 x1
# 5   5  6  7  8 x1
# 6   6  4  6  3 x3
# 7   7  3  0  4 x2
# 8   8  6  8  2 x3
# 9   9  7  2  5 x2
# 10 10  7  2  6 x2

The *-1 is because we want the min instead of the max column and +1 because we only look at columns 2:4 but need the correct column names in return.



标签: r dplyr min