I have a vector with five items.
my_vec <- c("a","b","a","c","d")
If I want to re-arrange those values into a new vector (shuffle), I could use sample():
shuffled_vec <- sample(my_vec)
Easy - but the sample() function only gives me one possible shuffle. What if I want to know all possible shuffling combinations? The various "combn" functions don't seem to help, and expand.grid() gives me every possible combination with replacement, when I need it without replacement. What's the most efficient way to do this?
Note that in my vector, I have the value "a" twice - therefore, in the set of shuffled vectors returned, they all should each have "a" twice in the set.
Looking at a previous question (R: generate all permutations of vector without duplicated elements), I can see that the gtools
package has a function for this. I couldn't however get this to work directly on your vector as such:
permutations(n = 5, r = 5, v = my_vec)
#Error in permutations(n = 5, r = 5, v = my_vec) :
# too few different elements
You can adapt it however like so:
apply(permutations(n = 5, r = 5), 1, function(x) my_vec[x])
# [,1] [,2] [,3] [,4]
#[1,] "a" "a" "a" "a" ...
#[2,] "b" "b" "b" "b" ...
#[3,] "a" "a" "c" "c" ...
#[4,] "c" "d" "a" "d" ...
#[5,] "d" "c" "d" "a" ...
I think permn
from the combinat package does what you want
library(combinat)
permn(my_vec)
A smaller example
> x
[1] "a" "a" "b"
> permn(x)
[[1]]
[1] "a" "a" "b"
[[2]]
[1] "a" "b" "a"
[[3]]
[1] "b" "a" "a"
[[4]]
[1] "b" "a" "a"
[[5]]
[1] "a" "b" "a"
[[6]]
[1] "a" "a" "b"
If the duplicates are a problem you could do something similar to this to get rid of duplicates
strsplit(unique(sapply(permn(my_vec), paste, collapse = ",")), ",")
Or probably a better approach to removing duplicates...
dat <- do.call(rbind, permn(my_vec))
dat[duplicated(dat),]
Noting that your data is effectively 5 levels from 1-5, encoded as "a", "b", "a", "c", and "d", I went looking for ways to get the permutations of the numbers 1-5 and then remap those to the levels you use.
Let's start with the input data:
my_vec <- c("a","b","a","c","d") # the character
my_vec_ind <- seq(1,length(my_vec),1) # their identifier
To get the permutations, I applied the function given at Generating all distinct permutations of a list in R:
permutations <- function(n){
if(n==1){
return(matrix(1))
} else {
sp <- permutations(n-1)
p <- nrow(sp)
A <- matrix(nrow=n*p,ncol=n)
for(i in 1:n){
A[(i-1)*p+1:p,] <- cbind(i,sp+(sp>=i))
}
return(A)
}
}
First, create a data.frame with the permutations:
tmp <- data.frame(permutations(length(my_vec)))
You now have a data frame tmp
of 120 rows, where each row is a unique permutation of the numbers, 1-5:
>tmp
X1 X2 X3 X4 X5
1 1 2 3 4 5
2 1 2 3 5 4
3 1 2 4 3 5
...
119 5 4 3 1 2
120 5 4 3 2 1
Now you need to remap them to the strings you had. You can remap them using a variation on the theme of gsub()
, proposed here: R: replace characters using gsub, how to create a function?
gsub2 <- function(pattern, replacement, x, ...) {
for(i in 1:length(pattern))
x <- gsub(pattern[i], replacement[i], x, ...)
x
}
gsub()
won't work because you have more than one value in the replacement array.
You also need a function you can call using lapply()
to use the gsub2()
function on every element of your tmp
data.frame.
remap <- function(x,
old,
new){
return(gsub2(pattern = old,
replacement = new,
fixed = TRUE,
x = as.character(x)))
}
Almost there. We do the mapping like this:
shuffled_vec <- as.data.frame(lapply(tmp,
remap,
old = as.character(my_vec_ind),
new = my_vec))
which can be simplified to...
shuffled_vec <- as.data.frame(lapply(data.frame(permutations(length(my_vec))),
remap,
old = as.character(my_vec_ind),
new = my_vec))
.. should you feel the need.
That gives you your required answer:
> shuffled_vec
X1 X2 X3 X4 X5
1 a b a c d
2 a b a d c
3 a b c a d
...
119 d c a a b
120 d c a b a