How to use sample and seq in a dplyr pipeline?

Posted 2019-08-03 23:21

Question:

I have a data frame with two columns, low and high. Using dplyr, I would like to create a new variable that holds, for each row, a randomly selected integer between low and high (inclusive, with equal probability). I have tried

library(tidyverse)

tibble(low = 1:10, high = 11) %>%
    mutate(rand_btwn = base::sample(seq(low, high, by = 1), size = 1))

which gives me an error since seq expects scalar arguments.

I then tried again using a vectorized version of seq

seq2 <- Vectorize(seq.default, vectorize.args = c("from", "to"))

tibble(low = 1:10, high = 11) %>%
    mutate(rand_btwn = base::sample(seq2(low, high, by = 1), size = 1))

but this does not give me the desired result either.

Answer 1:

To avoid the rowwise() pattern, I usually prefer to use map() inside mutate(), like this:

set.seed(123)
tibble(low = 1:10, high = 11) %>%
  mutate(rand_btwn = map_int(map2(low, high, seq), sample, size = 1))
# # A tibble: 10 x 3
#      low  high rand_btwn
#    <int> <dbl>     <int>
#  1     1    11         4
#  2     2    11         9
#  3     3    11         6
#  4     4    11        11
#  5     5    11        11
#  6     6    11         6
#  7     7    11         9
#  8     8    11        11
#  9     9    11        10
# 10    10    11        10

or:

set.seed(123)
tibble(low = 1:10, high = 11) %>%
  mutate(rand_btwn = map2_int(low, high, ~ sample(seq(.x, .y), 1)))

Your Vectorize() approach also works:

sample_v <- Vectorize(function(x, y) sample(seq(x, y), 1))

set.seed(123)
tibble(low = 1:10, high = 11) %>%
  mutate(rand_btwn = sample_v(low, high))
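
One caveat with every sample(seq(...), 1) variant above: when low equals high, seq() returns a single number, and sample() treats a single number x >= 1 as "draw from 1:x" rather than "return x". That case does not occur in the example data, but it can in general. A minimal sketch of a fully vectorised alternative that avoids both seq() and sample() follows; the helper name rand_between is hypothetical and not part of the original answer, and it assumes low and high are whole numbers with low <= high.

```r
# Hypothetical helper (not from the original answer): draw one integer
# uniformly from low..high (inclusive) for every row, fully vectorised.
# Assumes low and high hold whole numbers with low <= high.
rand_between <- function(low, high) {
  n <- max(length(low), length(high))
  # Number of admissible values in each row's range.
  width <- high - low + 1
  # runif() returns values strictly inside (0, 1), so floor(u * width)
  # lands on 0, 1, ..., width - 1 with equal probability.
  as.integer(low + floor(runif(n) * width))
}

set.seed(123)
df <- data.frame(low = 1:10, high = 11)
df$rand_btwn <- rand_between(df$low, df$high)

# Every draw stays inside its own [low, high] range.
stopifnot(all(df$rand_btwn >= df$low), all(df$rand_btwn <= df$high))
# A degenerate range returns its only value instead of sampling from 1:11.
stopifnot(rand_between(11, 11) == 11L)
```

Inside a pipeline the same helper should drop straight into mutate(rand_btwn = rand_between(low, high)), with no map() or Vectorize() wrapper, since it already works on whole columns.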