Which is the best method between Genetic Algorithm

2020-06-23 05:19发布

Let's say I have problem like this:

  • Knapsack capacity = 20 million
  • Number of item = 500
  • Weight of each item is random number between 100 to 20 million
  • Profit of each item is random number between 1 to 10

So which is the best method for my problem? GA or Dynamic Programming?

Please give me a simple explanation, as I'm newbie in this..

4条回答
甜甜的少女心
2楼-- · 2020-06-23 05:55

There is no "best" method, each has it's pros and cons.
In this case the trade off is between optimality of the solution found - GA does not in any way guarantee to find the best solution.
And the time/space required to compute the solution - using dynamic programming will save you some redundant computations but you will still have to compute all possible combinations at least once to find the best solution (or maybe even any solution at all).

查看更多
姐就是有狂的资本
3楼-- · 2020-06-23 06:08

Dynamic Programming (DP):

  • Exact algorithm - Finds global optimal solution
  • Long running time
  • Uses a lot of memory
  • Very simple to implement

Genetic Algorithm (GA):

  • Estimation - Doesn't necessarily find the global optimal solution
  • Short running time
  • Memory usage depends on number of individuals but is generally managable
  • Quality of solution depends on choosing an efficient representation + letting it run long enough
  • Reasonably simple to implement, design decisions may be a little more complex, especially if you don't have significant experience with GAs

Hill climbing:

  • Estimation - Doesn't necessarily find the global optimal solution. More subject to halting at a local optimum than a GA, though there are ways to reduce the chance of this happening
  • Short running time
  • Very low memory usage
  • Very simple to implement

DP (or any exact algorithm for an NP-complete problem) is generally only a good idea for a reasonably small problem, or if finding the global optimal is the most important thing.

There are 2 approaches to DP: (there is a reasonably simple optimization where you only store 2 rows, my memory usage analysis goes on the assumption that this optimization is used)

  • Have a matrix of items x weight, with cell value being the maximum value

    Matrix size = 500 x 20 000 000

    Running time = O(500 * 20 000 000) = O(10 000 000 000)

    Memory = max 10 * 500 -> 5 000 -> short = 2 bytes -> 2 * 20 000 000 * 2 = 80 000 000 < 80 MB

    Explanation: A[i,j] below represents the best (highest) value obtainable from any subset of the elements 1 to i with weight less than or equal to j. The update rule below means - find the best value between either not including the current element (thus the weight and value stays the same) or including it (so lookup the optimal value for (the current weight minus the current item's weight) and add the value of the current item to it). Then just return the A[500, 20000000], which represents the highest value obtainable from any subset of all elements with a maximum weight of the knapsack size.

    Algorithm:

    A[0, 0..20000000] = 0
    for i = 1:500
    for x = 0:20000000
      A[i, x] = max(A[i-1, x], value(i) + A[i-1, x-weight(i)])
    // ignore value(i) + A[i-1, x-weight(i)] when weight(i) > x
    return A[500, 20000000]
    
  • Have a matrix of items x value, with cell value being the minimum weight

    Matrix size = 500 x 10*500

    Running time = O(500 * 10*500) = O(2 500 000)

    Memory = max 20 000 000 -> int = 4 bytes -> 2 * 500 * 4 = 4 000 < 4 KB

    Explanation: A[i,j] below represents the lowest weight obtainable from any subset of the elements 1 to i with value equal to j. The update rule below means - find the best value between either not including the current element (thus the weight and value stays the same) or including it (so lookup the optimal value for (the current value minus the current item's value) and add the weight of the current item to it). The value at any cell is the exact weight of a subset to produce that value, so we need to look through all the cells A[500, x], which represents minimum weight subsets of elements for any value x.

    Algorithm:

    A[0, 0] = 0
    A[0, 1..5000] = ∞
    for i = 1:500
    for x = 0:5000
      A[i, x] = min(A[i-1, x], weight(i) + A[i-1, x-value(i)])
    // interpret A[i-1, x-value(i)] as 0 when value(i) > x
    return largest x that A[500, x] <= 20000000
    

So, yeah, the complexities pretty much speak for themselves, you'll wait a few hours for the first way, but mere seconds for the second, and there's a similar difference in memory usage (though 80 MB is still near negligible) (note that this is FAR from a rule, every case needs to be analysed in its own right).

查看更多
Luminary・发光体
4楼-- · 2020-06-23 06:17

First you need to consider Dynamic programming as an exact algorithm which can guarantee that the answer is going to be an optimum answer. On the other hand GA is a heuristic algorithm which usually converge to a local optima. There is a dynamic programming solution for the knapsack which is pseudo linear (O(capacity*number_of_items)) if they are integers. In your case, you need about 1e10 operation and memory. Which are feasible for a single computer. So if finding the optimum answer is important for you and you have enough time (in the order of a couple of hours) you can use a dynamic programming approach. Otherwise, you may prefer to use a heuristic algorithm, such as GA.

查看更多
爷的心禁止访问
5楼-- · 2020-06-23 06:19

Dynamic programming can run in time O(numItems * knapsackCapacity) and O(knapsackCapacity) memory. This means that, for your specifications, you would have:

  • about 20.000.000 x 500 = 10.000.000.000 operations - would probably finish executing in a few hours, depending on your machine;
  • since the profit of one item is at most 10 and you can have at most 500 items, that means your maximum theoretical profit cannot exceed 10 x 500 = 5000. This can be represented on 2 bytes (a short). So you will also need 2 x 20.000.000 / 1024 / 1024 ~= 38 MB of memory to store your DP array. Again, not that much.

The others have already mentioned the advantages and disadvantages of each. GA will finish faster, but the DP solution should also finish in a few hours, and it will give you an exact answer.

Personally, I would choose DP if waiting a few hours for the solution is not a problem.

Note: here is the DP that runs with O(knapsackCapacity) memory:

dp[i] = maximum profit obtainable for weight i
for i = 1 to numItems do
  for j = knapsackCapacity down to items[i].weight do
    dp[j] = max(dp[j], dp[j - items[i].weight] + items[i].profit)
查看更多
登录 后发表回答