R: from a vector, list all subsets of elements so

2020-04-23 07:24发布

问题:

Sorry in advance if the answer (1) is trivial; or (2) out there but I haven't been able to solve this issue or find and answer online. Any pointers will be much appreciated!

I am in need of a piece of code that can run through a vector and return all possible subsets of elements whose cumulative sum passes a threshold value.

Note that I do not want only the subsets that give me exactly the threshold. The cumulative sum can be above the threshold, as long as the algorithm stops adding an extra element if the value has been achieved already.

# A tiny example of the kind of input data. 
# However, note that efficiency is an issue 
# (I need to replicate the example in a large dataset)
v <- seq(1, 3) # My vector
threshold <- 3 # My threshold value

# I would like to get a list with the combinations
# 1 2 
# 1 3
# 2 3
# 3 

This piece of code works but is the clunkiest solution on earth...

for (i in 1: length(v)){
  thisvalue <- v[i]
  if (thisvalue >=threshold) { 
    cat (v[i], "\n",sep="\t") 
  } else {
    for (j in (i+1): length(v)){
      thisvalue <- v[i]+v[j]
      if (thisvalue >=threshold) { 
        cat (c(v[i], v[j]), "\n",sep="\t")
      } else {
        for (k in (i+2): length(v)){
          thisvalue <- v[i]+v[j]+v[k]
          if (thisvalue >=threshold) { 
            cat(c(v[i],v[j],v[k]),"\n",sep="\t")
        }
        }
      }
    }
  }
}

回答1:

This may be an option:

library(utils)
v <- seq (1,5)
v.len <- length(v)
threshold <- 3
for (count in seq(1,v.len))
{
  print(paste("subset length",count))
  combinations <- combn(v,count)
  out <- combinations[,apply(combinations, 2, sum)>=threshold]
  print (out)
}

above produces following output:

[1] "subset length 1"
[1] 3 4 5
[1] "subset length 2"
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    2    2    2    3    3     4
[2,]    2    3    4    5    3    4    5    4    5     5
[1] "subset length 3"
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    2    2    2     3
[2,]    2    2    2    3    3    4    3    3    4     4
[3,]    3    4    5    4    5    5    4    5    5     5
[1] "subset length 4"
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    2
[2,]    2    2    2    3    3
[3,]    3    3    4    4    4
[4,]    4    5    5    5    5
[1] "subset length 5"
[1] 1 2 3 4 5

so you'd need to do something with the output / decide where to store it etc.



回答2:

I found a solution within my limited coding skills that might be inefficient but it is feasible and neater than writing endless loops.

The function was inspired by the java code found at Find all subsets of a set that sum up to n

recursive.subset <-function(x, index, current, threshold, result){
  for (i in index:length(x)){
    if (current + x[i] >= threshold){
      cat (result, x[i], "\n",sep="\t") 
    } else {
      recursive.subset(x, i + 1, current+x[i], threshold, c(result,x[i]))
    }
  }
}

To call the function, just

inivector <- vector(mode="numeric", length=0) #initializing empty vector
recursive.subset (v, 1, 0, threshold, inivector)

So I get

1 2
1 3
2 3
3