Calculate rowwise maximum from columns that have c

2019-08-20 17:26发布

问题:

I have the following objects:

s1 = "1_1_1_1_1"
s2 = "2_1_1_1_1"
s3 = "3_1_1_1_1"

Please note that the value of s1, s2, s3 can change in another example.

I then have the follwoing data frame:

set.seed(666)
df = data.frame(draw = c(1,2,3,4,1,2,3,4,1,2,3,4),
                resp = c(1,1,1,1,2,2,2,2,3,3,3,3),
                "1_1_1_1_1" = runif(12),
                "2_1_1_1_1" = runif(12),
                "3_1_1_1_1" = runif(12)).

Please note that the column names of may data frame will change based on the values of s1,s2,s3.

I now want to achieve the following:

  1. I want to find out which of last three columns in df has the highest value and store it as a value in a new column (values are supposed to be either of 1,2 or 3, depending on if the highest value is the first, second or third of these variables).
  2. Now that I know which value is the highest per row, I want to group/summarize the result by the column resp and count how often my max value is 1, 2 or 3.

So the outcome from 1. should be:

draw    resp    1_1_1_1_1    2_1_1_1_1    3_1_1_1_1    max
1       1       0.774        0.095        0.806        3
2       1       0.197        0.142        0.266        3
...

And the outcome from 2. is supposed to be:

resp    first_max    second_max    third_max
1       1            1             2
2       2            1             1
3       1            2             1

My problem is that tidyverse's rowwise function is deprecated and that I don't know how I can dynamically address columns in a tidyverse pipe by column names which a re stored externally (here in s1, s2, s3). One last note: I might be overcomplicating things by trying to go by the column names, when, in fact, the positions of the columns that I'm interested in are always at column position 3:5.

回答1:

Here is one way to get what you want. For a sligthly different format, you can use count rather than table but this matches your expected output. Hope this helps!!

library(dplyr)

df %>%
  mutate(max_val = max.col(select(., starts_with("X")))) %>%
  select(resp, max_val) %>%
  table()

    max_val
resp 1 2 3
   1 1 1 2
   2 2 1 1
   3 1 2 1

Or, you could do this:

df %>%
  mutate(max_val = max.col(.[3:5])) %>%
  count(resp, max_val) %>%
  mutate(max_val = paste0("max_", max_val)) %>%
  spread(value = n, key = max_val)

   resp max_1 max_2 max_3
  <dbl> <int> <int> <int>
1     1     1     1     2
2     2     2     1     1
3     3     1     2     1


回答2:

calculate max using pmap(row-wise iteration)

max_cols <- pmap_dbl(unname(df),function(x,y,...){
    vals <- unlist(list(...))
    return(which(vals == max(vals)))
})


result <- df %>% add_column(max = max_cols)

> result
   draw resp X1_1_1_1_1 X2_1_1_1_1  X3_1_1_1_1 max
1     1    1  0.4551478 0.70061232 0.618439890   2
2     2    1  0.3667764 0.26670969 0.024742605   1
3     3    1  0.6806912 0.03233215 0.004014758   1
4     4    1  0.9117449 0.42926492 0.885247456   1
5     1    2  0.1886954 0.34189707 0.985054492   3
6     2    2  0.5569398 0.78043504 0.100714130   2
7     3    2  0.9791164 0.92823982 0.676584495   1
8     4    2  0.9174654 0.74627116 0.485582287   1
9     1    3  0.3681890 0.69622331 0.672346875   2
10    2    3  0.5510356 0.99651637 0.482430518   2
11    3    3  0.4283281 0.12832611 0.018095649   1
12    4    3  0.6168436 0.64381995 0.655178701   3

Reshape the data frame.

reshape2::dcast(result,resp~max,fun.aggregate = length,value.var = "max")
  resp 1 2 3
1    1 1 1 2
2    2 2 1 1
3    3 1 2 1


标签: r tidyverse