Get the most efficient combination of a large List

2020-07-06 05:23发布

问题:

I'm looking to maximize the number of stars given a certain budget and max limit on the combination.

Example question:

With a budget of 500 euro, visiting only the maximum allowed restaurants or less, dine and collect the most stars possible.

I'm looking to write an efficient algorithm, that could potentially process 1 million Restaurant instances for up to 10 max Restaurants.

Note, this is a cross post from a question I asked yesterday: Java: Get the most efficient combination of a large List of objects based on a field

The solution below will assign 15$ per star to the r8 Restaurant, which means that when generating the list, it puts that into the list first, and with the remaining 70$ it can only get 2 more stars giving a total of 4 stars. However, if it was smart enough to skip the r8 restaurant ( even though it's the best dollar per star ratio ) the r1 restaurant would actually be a better choice for the budget, as it's 100$ cost and 5 stars.

Can anyone help attempt the problem and beat the current solution?

import itertools

class Restaurant():
  def __init__(self, cost, stars):
    self.cost = cost
    self.stars = stars
    self.ratio = cost / stars

  def display(self):
    print("Cost: $" + str(self.cost))
    print("Stars: " + str(self.stars))
    print()

r1 = Restaurant(100, 5)
r2 = Restaurant(140, 3)
r3 = Restaurant(90, 4)
r4 = Restaurant(140, 3)
r5 = Restaurant(120, 4)
r6 = Restaurant(60, 1)
r7 = Restaurant(40, 1)
r8 = Restaurant(30, 2)
r9 = Restaurant(70, 2)
r10 = Restaurant(250, 5)

print()
print("***************")
print("** Unsorted: **")
print("***************")
print()

restaurants = [r1, r2, r3, r4, r5, r6, r7, r8, r9, r10]

for restaurant in restaurants:
  print(restaurant.ratio, restaurant.stars)

print()
print("***************")
print("**  Sorted:  **")
print("***************")
print()

sorted_restaurants = sorted(restaurants, key = lambda x: x.ratio, reverse = True)

for restaurant in sorted_restaurants:
  print(restaurant.ratio, restaurant.stars)

print()
print("*********************")
print("** Begin Rucksack: **")
print("*********************")
print()

max = 5
budget = 100

spent = 0
quantity = 0

rucksack = []

for i in itertools.count():

  if len(rucksack) >= max or i == len(sorted_restaurants):
    break

  sorted_restaurants[i].display()

  if sorted_restaurants[i].cost + spent <= budget:
    spent = spent + sorted_restaurants[i].cost
    rucksack.append(sorted_restaurants[i])
  
print("Total Cost: $" + str(sum([x.cost for x in rucksack])))
print("Total Stars: " + str(sum([x.stars for x in rucksack])))

print()
print("*****************")
print("** Final List: **")
print("*****************")
print()

for restaurant in rucksack:
  restaurant.display()

回答1:

Sounds like your problem is pretty much the same as the Knapsack problem: Maximize value given certain weight and volume constraints. Basically value = total stars, weight = price, rucksack limit = total budget. Now there's an additional constraint of total "items" (restaurant visits) but that doesn't change the gist.

As you may or may not know, the knapsack problem is NP hard, which means no algorithm with polynomial time scaling is known.

However, there may be efficient pseudopolynomial algorithms using dynamic programming, and of course there are efficient heuristics, such as the "greedy" heuristic you seem to have discovered. This heuristic involves starting to fill up with the highest "density" items (most stars per buck) first. As you have seen, this heuristic fails to find the true optimum in some cases.

The dynamic programming approach should be pretty good here. It's based on a recursion: Given a budget B and a number of remaining visits V, what is the best set of restaurants to visit out of a total set of restaurants R?

See here: https://en.wikipedia.org/wiki/Knapsack_problem#0/1_knapsack_problem

Basically we define an array m for "max stars", where m[i, b, v] is the maximum amount of stars we can get when we are allowed to visits restaurants up to (and including) restaurant number i, spending at most b, and visiting at most v restaurants (the limit).

Now, we bottom-up fill this array. For example, m[0, b, v] = 0 for all values of b and v because if we can't go to any restaurants, we can't get any stars.

Also, m[i, b, 0] = 0 for all values of i and b because if we used up all our visits, we can't get any more stars.

Next line isn't too hard either:

m[i, b, v] = m[i - 1, b, v] if p[i] > b where p[i] is the price of dining at restaurant i. What does this line say? Well, if restaurant i is more expensive than we have money left (b) then we can't go there. Which means the max amount of stars we can get is the same whether we include restaurants up to i or just up to i - 1.

Next line is a bit tricky:

m[i, b, v] = max(m[i-1, b, v]), m[i-1, b - p[i], v-1] + s[i]) if p[i] <= b

Phew. s[i] is the amount of stars you get from restaurant i btw.

What does this line say? It's the heart of the dynamic programming approach. When considering the max amount of stars we can get when looking at restaurants up to and including i, then in the resulting solution we either go there or we don't, and we "just" have to see which of these two paths leads to more stars:

If we don't go to restaurant i, then we keep the same amount of money and remaining visits. The max amount of stars we can get in this path is the same as if we didn't even look at restaurant i. That's the first part in the max.

But if we do go to restaurant i, then we're left with p[i] less money, one fewer visit, and s[i] more stars. That's the second part in the max.

Now the question is simple: which of the two is larger.

You can create this array and fill it with a relatively simple for loop (take inspiration from the wiki). This just gives you the amount of stars though, not the actual list of restaurants to visit. For that, add some extra bookkeeping to the calculation of w.


I hope that information is enough to set you off in the right direction.

Alternatively, you can write your problem in terms of binary variables and a quadratic objective function and solve it on the D-Wave quantum annelaer :-p Message me if you want to know more about that.



回答2:

Using the same idea as my answer here:

In a collection of n positive numbers that sum up to S, at least one of them will be less than S divided by n (S/n)

you could build the list starting from the potential "cheapest" restaurants.

The steps of the algorithm:

  • Find the 5 restaurants with cost < 500 / 10, each with different stars and the lowest cost for each star. eg r1, r2, r3, r4, r5
  • For each of the above values, find another 5 restaurants with cost < (500 - cost(x)) / 9 and different stars. Again select lowest cost for each star
  • do this until you reach 10 restaurants and you do not exceed your budget.
  • Rerun the 3 steps above for 1 - 9 restaurants limit.
  • Keep the solution that produces the most stars

Of course, you cannot reselect a restaurant.

I think the worst case, you will have to calculate 5x5x5... = 5^10 + 5^9 + ... + 5^2 + 5 (= about 12 million) solutions.

In javascript

function Restaurant(name, cost, stars) {
    this.name = name;
    this.cost = cost;
    this.stars = stars;
}

function RestaurantCollection() {
    var restaurants = [];
    var cost = 0;
    this.stars = 0;

    this.addRestaurant = function(restaurant) {
        restaurants.push(restaurant);
        cost += restaurant.cost;
        this.stars += restaurant.stars;
    };

    this.setRestaurants = function(clonedRestaurants, nCost, nStars) {
        restaurants = clonedRestaurants;
        cost = nCost;
        this.stars += nStars;
    };
    this.getAll = function() {
        return restaurants;
    };

    this.getCost = function() {
        return cost;
    };
    this.setCost = function(clonedCost) {
        cost = clonedCost;
    };

    this.findNext5Restaurants = function(restaurants, budget, totalGoal) {
        var existingRestaurants = this.getAll();
        var maxCost = (budget - cost) / (totalGoal - existingRestaurants.length);
        var cheapestRestaurantPerStarRating = [];
        for(var stars = 5; stars > 0; stars--) {
            var found = findCheapestRestaurant(restaurants, stars, maxCost, existingRestaurants);
            if(found) {
                cheapestRestaurantPerStarRating.push(found);
            }
        }
        return cheapestRestaurantPerStarRating;
    };

    this.clone = function() {
        var restaurantCollection = new RestaurantCollection();
        restaurantCollection.setRestaurants([...restaurants], this.getCost(), this.stars);
        return restaurantCollection;
    };
}

function findCheapestRestaurant(restaurants, stars, maxCost, excludeRestaurants) {
     var excludeRestaurantNames = excludeRestaurants.map(restaurant => restaurant.name);
     var found = restaurants.find(restaurant => restaurant.stars == stars && restaurant.cost <= maxCost && !excludeRestaurantNames.includes(restaurant.name));
     return found;
}

function calculateNextCollections(restaurants, collections, budget, totalGoal) {
    var newCollections = [];
    collections.forEach(collection => {
        var nextRestaurants = collection.findNext5Restaurants(restaurants, budget, totalGoal);
        nextRestaurants.forEach(restaurant => {
            var newCollection = collection.clone();
            newCollection.addRestaurant(restaurant);
            if(newCollection.getCost() <= budget) {
                 newCollections.push(newCollection);
            }
        });
    });
    return newCollections;
};

var restaurants = [];
restaurants.push(new Restaurant('r1', 100, 5));
restaurants.push(new Restaurant('r2',140, 3));
restaurants.push(new Restaurant('r3',90, 4));
restaurants.push(new Restaurant('r4',140, 3));
restaurants.push(new Restaurant('r5',120, 4));
restaurants.push(new Restaurant('r6',60, 1));
restaurants.push(new Restaurant('r7',40, 1));
restaurants.push(new Restaurant('r8',30, 2));
restaurants.push(new Restaurant('r9',70, 2));
restaurants.push(new Restaurant('r10',250, 5));

restaurants.sort((a, b) => a.cost - b.cost);
var max = 5;
var budget = 100;

var total = max;
var totalCollections = [];

for(var totalGoal = total; totalGoal > 0; totalGoal--) {
    var collections = [new RestaurantCollection()];

    for(var i = totalGoal; i > 0; i--) {
        collections = calculateNextCollections(restaurants, collections, budget, totalGoal);
    }
    totalCollections = totalCollections.concat(collections);
}

var totalCollections = totalCollections.map(collection => { 
      return {
          name: collection.getAll().map(restaurant => restaurant.name),
          stars: collection.stars,
          cost: collection.getCost()
      }
});

console.log("Solutions found:\n");
console.log(totalCollections);

totalCollections.sort((a, b) => b.stars - a.stars);
console.log("Best solution:\n");
console.log(totalCollections[0]);