I am relatively new in optimizationa nd I am trying to optimize a problem (from a pas class in Coursera, 2 years ago) about Warehouse Location. The problem is, its been more than 6 hours and its still running on an instance with 100 warehouses and 1000 customers.
The problem is the following. I have a set of warehouses that I can either open or not. Opening each of them has a cost s_w. Also, they all have a maximum capcity, cap_w. On the other side, there is a bunch of clients, all of them have to be connected to one (and only one) open warehouse. Each of them has a demand d_c and for each of the clients, there is a transportation cost from each warehouse t_wc. What I want, is obviouly, to minimize the total cost.
So, I have an array of size equal to the total number of warehouses called x. Each x[w] is a integer {0,1} defining if warehouse w is open or not. I also have a matrix of 0s and 1s defining which warehouse delivers each customer. there is, therefore as many rows as customers and as many columns as warehouses. The matrix is called y. y[c][w] is 1 if waregouse w delivers customer c, 0 otherwise.
So far so good. This is supossed to be an MIP problem. To code it, I do it un Python using the PuPL lib(https://pythonhosted.org/PuLP/pulp.html) and the GLPK to solve it.
Now, here is my model:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from pulp import *
def solveIt(inputData):
# parse the input
lines = inputData.split('\n')
parts = lines[0].split()
warehouseCount = int(parts[0])
customerCount = int(parts[1])
warehouses = []
for i in range(1, warehouseCount+1):
line = lines[i]
parts = line.split()
warehouses.append((int(parts[0]), float(parts[1])))
customerDemands = []
customerCosts = []
lineIndex = warehouseCount+1
for i in range(0, customerCount):
customerDemand = int(lines[lineIndex+2*i])
customerCost = map(float, lines[lineIndex+2*i+1].split())
customerDemands.append(customerDemand)
customerCosts.append(customerCost)
x = [LpVariable("x"+str(w),0,1,cat='Integer') for w in range(0,warehouseCount)]
y = [[LpVariable("y"+str(c)+","+str(w),0,1,cat='Integer') for w in range(0,warehouseCount)] for c in range(0,customerCount)]
prob = LpProblem("Warehouse Location",LpMinimize)
#Constraints
# y_cw <= x_w makes sure that no client is delivered by a closed warehouse
for w in range(0,warehouseCount):
for c in range(0,customerCount):
prob += y[c][w] <= x[w]
#A client is served by exactly one warehouse
for c in range(0,customerCount):
affineExpression = []
for w in range(0,warehouseCount):
affineExpression.append((y[c][w],1))
prob += LpAffineExpression(affineExpression) == 1
#For each warehouse, the sum of demand of all the clients it serves is lower than its capacity
for w in range(0,warehouseCount):
affineExpression = []
for c in range(0,customerCount):
affineExpression.append((y[c][w],customerDemands[c]))
prob += LpAffineExpression(affineExpression) <= warehouses[w][0]
#Objective
#The sum of all the warehouses opening plus the transportation costs has to be minimal
affineExpression = []
for w in range(0,warehouseCount):
affineExpression.append((x[w],warehouses[w][1]))
for c in range(0,customerCount):
affineExpression.append((y[c][w],customerCosts[c][w]))
prob += LpAffineExpression(affineExpression)
print "#######################START SOLVING"
status = prob.solve(GLPK(mip=1,msg = 1))
print LpStatus[status]
for w in range(0,warehouseCount):
print value(x[w])
solution = []
for c in range(0,customerCount):
string = ""
whichOne = -1
for w in range(0,warehouseCount):
string += str(value(y[c][w])) + " "
if value(y[c][w]) == 1:
whichOne = w
solution.append(w)
print string+ " "+str(whichOne)
# calculate the cost of the solution
obj = sum([warehouses[x][1]*x[w] for x in range(0,warehouseCount)])
for c in range(0, customerCount):
obj += customerCosts[c][solution[c]]
# prepare the solution in the specified output format
outputData = str(obj) + ' ' + str(0) + '\n'
outputData += ' '.join(map(str, solution))
return outputData
I know the way I build the matrix is not optimal, but it really isnt taking too long. It started solving and at some point I reached a point where GLPK said:
OPTIMAL SOLUTION FOUND
Integer optimization begins...
I believe that means it solved the LP, and now its getting it integer... but its been like 6 hours or so and its been progressing, and still is, but doesn't finish. In smaller instances, it worked fine.
My question, I guess is... Is there something wrong with my model? Some optimizations I forgot? OR is this problem just that huge?
Also, about the computer, its quite a poor one: Intel Atom and 1GB of RAM only...
Thank you for your help!
EDIT: Here is the date: https://github.com/ddeunagomez/DiscreteOptimization/blob/master/04_warehouse_location/warehouse/data/wl_100_1 the format is:
NumberOfWarehouses NumberOfCustomers
CapacityWarehouse1 OpeningCostWarehouse1
CapacityWarehouse2 OpeningCostWarehouse2
.....
CapacityWarehouseN OpeningCostWarehouseN
DemandCustomer1
TransportCostW1_C1 TransportCostW2_C1 ....... TransportCostWN_C1
DemandCustomer2
TransportCostW1_C2 TransportCostW2_C2 ....... TransportCostWN_C2
.....
DemandCustomerN
TransportCostW1_CM TransportCostW2_CM ....... TransportCostWN_CM
This is not really a huge problem in the scheme of things -- there is specialized code to solve much larger warehouse location instances than this one, and good off-the-shelf solvers like CPLEX could solve it easily too. I don't know how efficient GLPK/PuPL are, but it could well be that they just take too long using straightforward LP/branch-and-bound (which is what they are doing).
One thing you could try is to allow the y variables to be continuous (0 <= y <= 1) instead of binary. This will most likely speed up the run times because the solver won't have to branch on them. The physical interpretation is that some customers can have their demands split between multiple warehouses. In practice, very few of them will probably be split in most solutions. If the capacities are large enough to be non-binding, then none of the demands will be split, and you'll always get binary solutions even though you allow y to be continuous.