Shuffling a vector - all possible outcomes of samp

Posted 2019-05-10 12:44

Question:

I have a vector with five items.

my_vec <- c("a","b","a","c","d")

If I want to re-arrange those values into a new vector (shuffle), I could use sample():

shuffled_vec <- sample(my_vec)

Easy - but the sample() function only gives me one possible shuffle. What if I want to know all possible shuffling combinations? The various "combn" functions don't seem to help, and expand.grid() gives me every possible combination with replacement, when I need it without replacement. What's the most efficient way to do this?

Note that my vector contains the value "a" twice, so every shuffled vector in the returned set should also contain "a" exactly twice.

Answer 1:

Looking at a previous question (R: generate all permutations of vector without duplicated elements), I can see that the gtools package has a function for this. However, I couldn't get it to work directly on your vector, because it rejects the repeated "a":

library(gtools)
permutations(n = 5, r = 5, v = my_vec)
#Error in permutations(n = 5, r = 5, v = my_vec) : 
#  too few different elements

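You can work around this by permuting the indices 1-5 and using each permutation to subset your vector (each column of the result is one shuffle):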

apply(permutations(n = 5, r = 5), 1, function(x) my_vec[x])

#     [,1] [,2] [,3] [,4] 
#[1,] "a"  "a"  "a"  "a" ...
#[2,] "b"  "b"  "b"  "b" ...
#[3,] "a"  "a"  "c"  "c" ... 
#[4,] "c"  "d"  "a"  "d" ...
#[5,] "d"  "c"  "d"  "a" ... 
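As an alternative sketch, and assuming the installed gtools version exposes the set argument of permutations() (check ?permutations to confirm), passing set = FALSE should keep the repeated "a" instead of raising the error above. The requireNamespace() guard and the names all_rows and distinct_rows are illustrative, not part of the original answer.

```r
my_vec <- c("a", "b", "a", "c", "d")

# gtools is not part of base R, so only run when it is installed.
if (requireNamespace("gtools", quietly = TRUE)) {
  # set = FALSE stops permutations() from collapsing v to its unique values,
  # so the duplicated "a" is accepted. One permutation per row.
  all_rows <- gtools::permutations(n = length(my_vec), r = length(my_vec),
                                   v = my_vec, set = FALSE)

  # The two "a"s are interchangeable, so drop rows that repeat.
  distinct_rows <- unique(all_rows)

  print(dim(all_rows))
  print(dim(distinct_rows))
}
```

Here the permutations come back one per row rather than one per column, and unique() on a matrix removes repeated rows.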


Answer 2:

I think permn() from the combinat package does what you want:

library(combinat)
permn(my_vec)

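A smaller example, which shows that permn() treats the two "a" values as distinct, so each arrangement is returned twice: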

> x
[1] "a" "a" "b"
> permn(x)
[[1]]
[1] "a" "a" "b"

[[2]]
[1] "a" "b" "a"

[[3]]
[1] "b" "a" "a"

[[4]]
[1] "b" "a" "a"

[[5]]
[1] "a" "b" "a"

[[6]]
[1] "a" "a" "b"

If the duplicates are a problem, you could collapse each permutation to a string, keep the unique strings, and split them again:

strsplit(unique(sapply(permn(my_vec), paste, collapse = ",")), ",")

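Or, probably a better way to remove the duplicates, bind the permutations into a matrix and keep only the rows that have not been seen before:

dat <- do.call(rbind, permn(my_vec))
dat[!duplicated(dat), ]  # !duplicated() keeps the first copy of each row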
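A quick sanity check on how many rows should remain once duplicates are gone: the number of distinct arrangements is n! divided by k! for every value that occurs k times. The names n_total and n_distinct below are just illustrative.

```r
my_vec <- c("a", "b", "a", "c", "d")

# n! orderings in total when every position is treated as distinct.
n_total <- factorial(length(my_vec))

# Divide by k! for each value that occurs k times ("a" occurs twice here).
n_distinct <- n_total / prod(factorial(table(my_vec)))

n_total     # 120
n_distinct  # 60
```

If the de-duplicated result does not have n_distinct rows, something in the de-duplication step has gone wrong.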


Answer 3:

Your data is effectively five positions, numbered 1-5, that happen to be labelled "a", "b", "a", "c", and "d". So I went looking for ways to get the permutations of the numbers 1-5 and then remap those numbers to the labels you use.

Let's start with the input data:

my_vec <- c("a","b","a","c","d")  # the characters
my_vec_ind <- seq_along(my_vec)   # their identifiers, 1 to 5

To get the permutations, I applied the function given at Generating all distinct permutations of a list in R:

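permutations <- function(n){
  if(n==1){
    return(matrix(1))
  } else {
    # all permutations of 1..(n-1), one per row
    sp <- permutations(n-1)
    p <- nrow(sp)
    A <- matrix(nrow=n*p,ncol=n)
    for(i in 1:n){
      # put i first, then shift every value >= i up by one so i is skipped
      A[(i-1)*p+1:p,] <- cbind(i,sp+(sp>=i))
    }
    return(A)
  }
}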
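To see what the recursion produces, here is a small sketch that runs it for n = 3. The function is repeated only so the example runs on its own; the name p3 is illustrative.

```r
# Same recursive function as above, repeated so this block is self-contained.
permutations <- function(n) {
  if (n == 1) {
    return(matrix(1))
  }
  sp <- permutations(n - 1)          # permutations of 1..(n-1)
  p <- nrow(sp)
  A <- matrix(nrow = n * p, ncol = n)
  for (i in 1:n) {
    # i goes first; values >= i in the smaller permutation are shifted up by one
    A[(i - 1) * p + 1:p, ] <- cbind(i, sp + (sp >= i))
  }
  A
}

p3 <- permutations(3)
p3
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    3    2
# [3,]    2    1    3
# [4,]    2    3    1
# [5,]    3    1    2
# [6,]    3    2    1
```

The rows come out in lexicographic order, and the result has n! rows, so the matrix grows very quickly with n.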

First, create a data.frame with the permutations:

tmp <- data.frame(permutations(length(my_vec)))

You now have a data frame tmp with 120 rows, where each row is a unique permutation of the numbers 1-5:

> tmp
    X1 X2 X3 X4 X5
1    1  2  3  4  5
2    1  2  3  5  4
3    1  2  4  3  5
...
119  5  4  3  1  2
120  5  4  3  2  1

Now you need to remap them to the strings you had. You can remap them using a variation on the theme of gsub(), proposed here: R: replace characters using gsub, how to create a function?

gsub2 <- function(pattern, replacement, x, ...) {
  # apply each pattern/replacement pair in turn
  for(i in seq_along(pattern))
    x <- gsub(pattern[i], replacement[i], x, ...)
  x
}

gsub() on its own won't work here because it accepts only a single pattern and a single replacement, and you have five of each.
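One thing to keep in mind with this helper is that the replacements are applied one after another, so an earlier pass can touch text that a later pattern was meant to match. A minimal sketch of that behaviour follows; the ten-item case is a hypothetical extension, not something in the question.

```r
# Sequential multi-pattern replacement, as defined above.
gsub2 <- function(pattern, replacement, x, ...) {
  for (i in seq_along(pattern))
    x <- gsub(pattern[i], replacement[i], x, ...)
  x
}

# Five items: patterns are single digits and replacements are letters,
# so no pass can interfere with another.
five <- gsub2(c("1", "2", "3", "4", "5"), c("a", "b", "a", "c", "d"),
              c("1", "3", "5"), fixed = TRUE)
five
# [1] "a" "a" "d"

# Hypothetical vector with ten or more items: the pass for "1" also
# rewrites the "1" inside "10", so "10" never gets mapped to "j".
ten <- gsub2(c("1", "10"), c("a", "j"), "10", fixed = TRUE)
ten
# [1] "a0"
```

For the five-item vector in the question this is harmless, but it is a reason to prefer index-based remapping for longer vectors.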

You also need a function you can call using lapply() to use the gsub2() function on every element of your tmp data.frame.

remap <- function(x, 
              old,
              new){
  return(gsub2(pattern = old, 
              replacement = new, 
              fixed = TRUE,
              x = as.character(x)))
}

Almost there. We do the mapping like this:

shuffled_vec <- as.data.frame(lapply(tmp, 
                          remap,
                          old = as.character(my_vec_ind), 
                          new = my_vec))

which can be collapsed into a single call, should you feel the need:

shuffled_vec <- as.data.frame(lapply(data.frame(permutations(length(my_vec))), 
                          remap,
                          old = as.character(my_vec_ind), 
                          new = my_vec))

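That gives you the required answer, all 120 orderings (because "a" appears twice, each distinct arrangement shows up twice; unique(shuffled_vec) would reduce them to 60 rows):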

> shuffled_vec
    X1 X2 X3 X4 X5
1    a  b  a  c  d
2    a  b  a  d  c
3    a  b  c  a  d
...
119  d  c  a  a  b
120  d  c  a  b  a
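As a shorter alternative to the gsub-based remapping, the permutation matrix can be used directly as an index into my_vec. This is a sketch rather than the answer's own method; the names idx, shuffled and distinct are illustrative, and the permutations() function is repeated only so the block runs on its own.

```r
# Recursive index permutations, repeated from above for self-containment.
permutations <- function(n) {
  if (n == 1) {
    return(matrix(1))
  }
  sp <- permutations(n - 1)
  p <- nrow(sp)
  A <- matrix(nrow = n * p, ncol = n)
  for (i in 1:n) {
    A[(i - 1) * p + 1:p, ] <- cbind(i, sp + (sp >= i))
  }
  A
}

my_vec <- c("a", "b", "a", "c", "d")
idx <- permutations(length(my_vec))   # 120 x 5 matrix of positions

# Subsetting a plain vector by a matrix returns a flat vector in
# column-major order, so restore the original matrix shape afterwards.
shuffled <- matrix(my_vec[idx], nrow = nrow(idx))

# Rows that differ only by swapping the two "a"s are identical; drop them.
distinct <- unique(shuffled)

dim(shuffled)
dim(distinct)
```

This avoids string replacement entirely, so it does not depend on the labels and the index digits never colliding.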