I have several arrays of hashes (let's say I have three) as follows:
a = [{ cost: 10, value: 20},
{ cost: 9, value: 20},
{ cost: 10, value: 22},
{ cost: 2, value: 10}
]
b = [{ cost: 4, value: 20},
{ cost: 9, value: 20},
{ cost: 15, value: 22},
{ cost: 12, value: 10}
]
c = [{ cost: 10, value: 21},
{ cost: 9, value: 20},
{ cost: 10, value: 22},
{ cost: 3, value: 10}
]
I need to find out which choice of hashes, one from each array, gives me the maximum sum of :value
keeping the sum of :cost
under a given value (let's say 30
).
The answer is probably easy for this case just looking at it, but this is just sample data and in reality I have many more arrays. Can someone help me or point me in the right direction?
Edit: I should also mention that I'd like to do with this arrays of objects. I used hashes as an example since it would be easier to describe, but I'm planning on using Objects. Also, if an Object gets used from one of the arrays, I'd like it identical options to not be used from the other arrays so each selected Object should be unique.
We can use dynamic programming to solve this "knapsack problem" efficiently for any number of arrays (of hashes). Solution times would be roughly proportional to the number of arrays and to the square of the average array size.
My first thought was to code this up, but then decided that would take too long and probably would not be the best way of explaining how the algorithm works. Instead, I decided to show how an optimal solution would be computed for the example given in the question. It should be evident how the algorithm could be implemented for an arbitrary number of arrays.
[Edit: I decided to code this after all. I've added that at the end tidE]
First consider (the "last") array
c
. Here's a screen shot of the relevant information from an Excel spreadsheet.Note that I've not included the hash at offset
0
,{ cost: 10, value: 21}
. That's because that hash is "domninated" by the one at offset2
,{ cost: 10, value: 22}
. The former could never be preferred because they both have the same cost and the latter has a higher value.The second table shows which hash is preferred for a given "remaining budget" (i.e., after hashes in
a
andb
have been selected). Notice that the possible remaining budget ranges from3
(the lowest cost among the hashes in arrayc
) to24
(the total budget,30
less the costs of choosing the hashes ina
andb
that have the smallest cost:30 - 2 - 4 = 24
).It probably would be best to implement this with an array of hashes:
Now let's move on to array
b
.Here I've only included two of the four hashes contained in array
b
, as the hash at offset1
,{ cost: 9, value: 20 }
is dominated by the hash at offset0
,{ cost: 4, value: 20 }
and the hash{ cost: 12, value: 10}
is dominated by{ cost: 4, value: 20}
.Here the "available budget" ranges from
7
to27
.7
provides the minimum required to select one hash inb
(at offset0
, with cost4
) and one inc
(at offset3
, with cost3
). The high end of the range,27
, is the maximum that could be remaining after a hash ina
is selected (30 - 3
).Suppose the remaining budget (for arrays
b
andc
) was18
. If we were to select the hash at offset0
, we would realize a value of20
from that hash, and have a remaining budget of18 - 4 => 14
for selecting a hash in arrayc
. From the table for arrayc
we see that we would choose the hash at offset2
in arrayc
, which has a value of22
. We therefore would have a (maximum) value for arraysb
andc
of20 + 22 => 42
if the available budget were18
and we selected the hash at offset0
in arrayb
.We must now consider instead selecting the hash at offset 2 in array
b
, which would yield a value of22
and leave a remaining budget of18 - 15 => 3
for selecting a hash in arrayc
. We see from the table for arrayc
that that would be the hash at offset 3, with value10
. We therefore would have a (maximum) value for arraysb
andc
of22 + 10 => 32
if the available budget were18
and we selected the hash at offset2
in arrayb
.It follows that if we were to have an available budget of
18
for arraysb
andc
, the optimum would be to select the hash at offset0
in arrayb
and at offset3
in arrayc
, for a total (maximum) value of42
. I've outlined that choice in the table.The same results apply for any remaining budget in the range
18 - 23
. Similar calculations would be performed for each of the other available budget ranges. You'll see that there's a tie when the available budget is24
.We can now move on to array
a
. (We're almost finished.)I've not include the hash at offset
0
because it is dominated by the hash at offset2
.The available budget for arrays
a
,b
andc
is30
, so we do not have to condition on that (i.e. for the first array). We must consider each of three possibilities:choose the hash at offset
1
to earn a value of20
, which leaves a remaining budget of30 - 9 => 21
for arraysb
andc
, which (from the table for arrayb
) yields an optimal value of42
for the last two arrays, for a total value of20 + 42 => 62
.choose the hash at offset
2
to earn a value of22
, which leaves a remaining budget of30 - 10 => 20
for arraysb
andc
, which (from the table for arrayb
) yields an optimal value of42
for the last two arrays, for a total value of22 + 42 => 64
.choose the hash at offset
3
to earn a value of10
, which leaves a remaining budget of30 - 2 => 28
for arraysb
andc
, which (from the table for arrayb
) yields an optimal value of44
for the last two arrays, for a total value of10 + 44 => 54
.We therefore conclude that a maximum value of
64
is achieved by selecting the hash at offset2
in arraya
, the hash at offset0
in arrayb
and the hash at offset2
in arrayc
.Code
.
.
.
Example
I changed the data structure of
all_costs_and_values
to an array of hashes, with keys being labels for the cost/value pairs (e.g.,'12'
refers to what formerly was the hash at row offset1
, column offset2
in the OP's array). I would expect the labels to be replaced with more meaningful strings.In calculating the optimum, the array of hashes
solution
is constructed:Here's a one-liner:
Broken up a bit:
It would be:
Edit by sawa Slightly rewritten for readability.