How to get the optimized choice from arrays

I have several arrays of hashes (let's say I have three) as follows:

a = [{ cost: 10, value: 20},
     { cost: 9, value: 20},
     { cost: 10, value: 22},
     { cost: 2, value: 10}
    ]

b = [{ cost: 4, value: 20},
     { cost: 9, value: 20},
     { cost: 15, value: 22},
     { cost: 12, value: 10}
    ]

c = [{ cost: 10, value: 21},
     { cost: 9, value: 20},
     { cost: 10, value: 22},
     { cost: 3, value: 10}
    ]

I need to find out which choice of hashes, one from each array, gives me the maximum sum of :value keeping the sum of :cost under a given value (let's say 30).

The answer is probably easy for this case just looking at it, but this is just sample data and in reality I have many more arrays. Can someone help me or point me in the right direction?

Edit: I should also mention that I'd like to do with this arrays of objects. I used hashes as an example since it would be easier to describe, but I'm planning on using Objects. Also, if an Object gets used from one of the arrays, I'd like it identical options to not be used from the other arrays so each selected Object should be unique.

回答1:

Here's a one-liner:

a.product(b,c).select{ |arr| arr.reduce(0) { |sum,h| sum + h[:cost] } < 30 }.max_by{ |arr| arr.reduce(0){ |sum,h| sum + h[:value]} }

Broken up a bit:

a.product(b,c)
 .select{ |arr| arr.reduce(0) { |sum,h| sum + h[:cost] } < 30 }
 .max_by{ |arr| arr.reduce(0) { |sum,h| sum + h[:value] } }

回答2:

It would be:

def find_max_option(max_cost, *arys)
  a = arys.shift
  a.product(*arys).map do |c| 
   [ c, c.inject({}) {|result, hash| 
       result.merge(hash) {|_,o,n| o + n }
     }
   ]
  end.select {|_,v| v[:cost] < max_cost}.max_by {|_,v| v[:value]}.first
end

find_max_option(30, a, b, c)   #=> [{:cost=>10, :value=>22}, {:cost=>4, :value=>20}, {:cost=>10, :value=>22}]

Edit by sawa Slightly rewritten for readability.

def find_max_option(max_cost, *a)
  a.first.product(*a.drop(1))
  .group_by{|hs| hs.inject({}){|acc, h| acc.merge(h){|_, v_acc, v_h| v_acc + v_h}}}
  .select{|k, _| k[:cost] < max_cost}
  .max_by{|k, _| k[:value]}
  .last.first
end

回答3:

We can use dynamic programming to solve this "knapsack problem" efficiently for any number of arrays (of hashes). Solution times would be roughly proportional to the number of arrays and to the square of the average array size.

My first thought was to code this up, but then decided that would take too long and probably would not be the best way of explaining how the algorithm works. Instead, I decided to show how an optimal solution would be computed for the example given in the question. It should be evident how the algorithm could be implemented for an arbitrary number of arrays.

[Edit: I decided to code this after all. I've added that at the end tidE]

First consider (the "last") array c. Here's a screen shot of the relevant information from an Excel spreadsheet.

Note that I've not included the hash at offset 0, { cost: 10, value: 21}. That's because that hash is "domninated" by the one at offset 2, { cost: 10, value: 22}. The former could never be preferred because they both have the same cost and the latter has a higher value.

The second table shows which hash is preferred for a given "remaining budget" (i.e., after hashes in a and b have been selected). Notice that the possible remaining budget ranges from 3 (the lowest cost among the hashes in array c) to 24 (the total budget, 30 less the costs of choosing the hashes in a and b that have the smallest cost: 30 - 2 - 4 = 24).

It probably would be best to implement this with an array of hashes:

[{budget: 3,  value: 10, hash_offset: 3 }, 
 {budget: 4,  value: 10, hash_offset: 3 },
 ...
 {budget: 8,  value: 10, hash_offset: 3 },
 {budget: 9,  value: 20, hash_offset: 1 },
 {budget: 10, value: 22, hash_offset: 2 },
 ...
 {budget: 10, value: 22, hash_offset: 2 }]

Now let's move on to array b.

Here I've only included two of the four hashes contained in array b, as the hash at offset 1, { cost: 9, value: 20 } is dominated by the hash at offset 0, { cost: 4, value: 20 } and the hash { cost: 12, value: 10} is dominated by { cost: 4, value: 20}.

Here the "available budget" ranges from 7 to 27. 7 provides the minimum required to select one hash in b (at offset 0, with cost 4) and one in c (at offset 3, with cost 3). The high end of the range, 27, is the maximum that could be remaining after a hash in a is selected (30 - 3).

Suppose the remaining budget (for arrays b and c) was 18. If we were to select the hash at offset 0, we would realize a value of 20 from that hash, and have a remaining budget of 18 - 4 => 14 for selecting a hash in array c. From the table for array c we see that we would choose the hash at offset 2 in array c, which has a value of 22. We therefore would have a (maximum) value for arrays b and c of 20 + 22 => 42 if the available budget were 18 and we selected the hash at offset 0 in array b.

We must now consider instead selecting the hash at offset 2 in array b, which would yield a value of 22 and leave a remaining budget of 18 - 15 => 3 for selecting a hash in array c. We see from the table for array c that that would be the hash at offset 3, with value 10. We therefore would have a (maximum) value for arrays b and c of 22 + 10 => 32 if the available budget were 18 and we selected the hash at offset 2 in array b.

It follows that if we were to have an available budget of 18 for arrays b and c, the optimum would be to select the hash at offset 0 in array b and at offset 3 in array c, for a total (maximum) value of 42. I've outlined that choice in the table.

The same results apply for any remaining budget in the range 18 - 23. Similar calculations would be performed for each of the other available budget ranges. You'll see that there's a tie when the available budget is 24.

We can now move on to array a. (We're almost finished.)

I've not include the hash at offset 0 because it is dominated by the hash at offset 2.

The available budget for arrays a, b and c is 30, so we do not have to condition on that (i.e. for the first array). We must consider each of three possibilities:

choose the hash at offset 1 to earn a value of 20, which leaves a remaining budget of 30 - 9 => 21 for arrays b and c, which (from the table for array b) yields an optimal value of 42 for the last two arrays, for a total value of 20 + 42 => 62.
choose the hash at offset 2 to earn a value of 22, which leaves a remaining budget of 30 - 10 => 20 for arrays b and c, which (from the table for array b) yields an optimal value of 42 for the last two arrays, for a total value of 22 + 42 => 64.
choose the hash at offset 3 to earn a value of 10, which leaves a remaining budget of 30 - 2 => 28 for arrays b and c, which (from the table for array b) yields an optimal value of 44 for the last two arrays, for a total value of 10 + 44 => 54.

We therefore conclude that a maximum value of 64 is achieved by selecting the hash at offset 2 in array a, the hash at offset 0 in array b and the hash at offset 2 in array c.

Code

def knapsack(all_costs_and_values, total_budget)
  costs_and_values = remove_dominated_choices(all_costs_and_values)
  budget = remaining_budget(costs_and_values, total_budget)
  solution = optimize(costs_and_values, budget)
  display_solution(costs_and_values, solution)
end

private

def remove_dominated_choices(a)    
  a.map do |f|
    # f.invert.invert ensures that no two keys have the same value
    g = f.invert.invert
    g.each_with_object({}) do |(name,v),h|
      (h[name] = v) unless g.any? { |_,p| 
        (p[:cost] <= v[:cost] && p[:value] >  v[:value]) ||
        (p[:cost] <  v[:cost] && p[:value] >= v[:value]) }
    end
  end
end

def remaining_budget(b, tot_budget)
  mc = min_cost_per_hash(b)
  b.map.with_index do |h,i|
    if i.zero?
      (tot_budget..tot_budget)
    else
      (mc[i..-1].reduce(:+)..tot_budget - mc[0..i-1].reduce(:+))
    end
  end
end

def min_cost_per_hash(arr)
  arr.map { |h| h.values.min_by { |h| h[:cost] }[:cost] }
end

def optimize(costs_and_values,remaining_budget)
  solution = Array.new(costs_and_values.size)
  i = costs_and_values.size-1
  g = costs_and_values[i].sort_by { |k,v| -v[:cost] }
  solution[i] = remaining_budget[i].each_with_object({}) do |rb,h|
    name, f = g.find { |_,v| v[:cost] <= rb }  
    h[rb] = { name: name, value_onward: f[:value] }
  end
  while i > 0  
    i -= 1
    g = costs_and_values[i].sort_by { |k,v| v[:cost] }
    min_to_leave = remaining_budget[i+1].first    
    solution[i] = remaining_budget[i].each_with_object({}) do |rb,h|
      best = - Float::INFINITY
      g.each do |name, v|
        leave_for_next = rb - v[:cost]
        break if leave_for_next < min_to_leave
        candidate = v[:value] + solution[i+1][leave_for_next][:value_onward] 
        if candidate > best
          best = candidate
          h[rb] = { name: name, value_onward: candidate }
        end
      end
    end
  end
  solution      
end

def display_solution(costs_and_values, solution)
  rv = solution.first.keys.first
  puts "Optimal value: #{ solution.first[rv][:value_onward] }\n"
  solution.each_with_index do |h,i|
    name =  h[rv][:name]
    puts "  Best choice for hash #{i}: #{name}"
    rv -= costs_and_values[i][name][:cost]
  end
end

Example

I changed the data structure of all_costs_and_values to an array of hashes, with keys being labels for the cost/value pairs (e.g., '12' refers to what formerly was the hash at row offset 1, column offset 2 in the OP's array). I would expect the labels to be replaced with more meaningful strings.

all_costs_and_values =
  [{ '00' => { cost: 10, value: 20 }, '01' => { cost:  8, value: 17 },
     '02' => { cost: 10, value: 20 }, '03' => { cost: 12, value: 24 } },
   { '10' => { cost:  6, value: 16 }, '11' => { cost:  4, value: 14},
     '12' => { cost:  6, value: 17 }, '13' => { cost:  5, value: 13 } },
   { '20' => { cost:  8, value: 16 }, '21' => { cost: 14, value: 30 },
     '22' => { cost: 16, value: 32 }, '23' => { cost: 14, value: 32 } },
   { '30' => { cost:  2, value:  4 }, '31' => { cost:  5, value:  9 },
     '32' => { cost: 10, value: 16 }, '33' => { cost:  6, value:  8 } }]

total_budget = 30

knapsack(all_costs_and_values, total_budget)
  #=> Optimal value: 70
  #     Best choice for hash 0: 01
  #     Best choice for hash 1: 12
  #     Best choice for hash 2: 23
  #     Best choice for hash 3: 30

In calculating the optimum, the array of hashes solution is constructed:

solution
  #=> [{  30=>{:name=>"01", :value_onward=>70}},
  #                   "01" => { cost:  8, value: 17 } leave 30-8 = 22
  #=> {14-15=>{:name=>"11", :value_onward=>34},
  #=>     16=>{:name=>"12", :value_onward=>37},
  #=>  17-18=>{:name=>"11", :value_onward=>39},
  #=>     19=>{:name=>"12", :value_onward=>42},
  #=>  20-21=>{:name=>"11", :value_onward=>50},
  #=>     22=>{:name=>"12", :value_onward=>53}},
  #                   "12" => { cost:  6, value: 17 } leave 22-6 = 16
  #=> {10-12=>{:name=>"20", :value_onward=>20},
  #=>  13-15=>{:name=>"20", :value_onward=>25},
  #=>  16-18=>{:name=>"23", :value_onward=>36},
  #                   "23" => { cost: 14, value: 32 } leave 16-14 = 2
  #=>   {2-4=>{:name=>"30", :value_onward=>4},
  #                   "30" => { cost:  2, value:  4 } leave 2-2 = 0
  #=>    5-9=>{:name=>"31", :value_onward=>9},
  #=>     10=>{:name=>"32", :value_onward=>16}}]