Location Intelligence Problem and Applications ¶
Location Intelligence/Set Covering Problem¶
The Location Intelligence problem is a classic optimization problem that has numerous practical applications, including facility location, scheduling, and resource allocation, across various industries and domains.
Given a finite set of elements and a collection of subsets that cover these elements, the objective is to identify the smallest possible subset of subsets that cover all the elements. However, this problem is known to be NP-hard, implying that finding an optimal solution can be computationally challenging for large instances.
Applications¶
Resource Allocation: the objective is to allocate limited resources to various tasks or projects in the most efficient manner. A classic example of this is the employee assignment problem, where a company must decide which employees to assign to different projects, ensuring that all necessary skills are covered while minimizing the total number of employees involved.
Facility Location: Facility location problems (location set covering problems), which involve determining the optimal placement of facilities to serve a specific set of customers or demand points, can be effectively addressed using set covering models. For example, a logistics company seeking to minimize transportation costs and ensure timely delivery to customers may use set covering to identify the most cost-effective locations for its distribution centers. By formulating the facility location problem as a set covering model, the company can optimize its facility placement strategy, balancing the trade-offs between transportation costs, customer demand, and facility capacity.
Network Design: require determining the optimal configuration of a network subject to specific constraints and objectives, can be effectively solved using set covering techniques. For instance, a telecommunications company seeking to establish a network that connects multiple cities while minimizing cable length and infrastructure costs can leverage set covering to optimize its network design. By modeling the network design problem as a set covering problem, the company can identify the most efficient network layout, reducing costs and improving overall network performance.
Computational Challenges¶
Computational complexity and scalability One of the primary challenges in set covering optimization is the high computational complexity that arises when tackling large-scale instances. The scale of the problem can be massive, with the number of elements and sets often reaching into the millions or even billions, posing significant computational challenges As the problem size grows, with an increasing number of elements and sets, the computational burden escalates exponentially, making it increasingly difficult to obtain optimal solutions within a reasonable timeframe.
Data Sensitivity The performance of set covering optimization algorithms can be extremely sensitive to even minor variations in the input data. For instance, the addition or removal of a single element or set can drastically alter the solution quality, highlighting the algorithm's vulnerability to data perturbations. This sensitivity is particularly problematic when working with real-world data, which often contains errors, uncertainties, or noise, making it essential to develop robust algorithms that can handle such imperfections.
The Coverage-Cost Trade-off delicate balancing act between maximizing coverage and minimizing costs. In many real-world applications, achieving 100% coverage may not be economically viable due to resource constraints, forcing decision-makers to strike a compromise between coverage and cost. This trade-off is particularly challenged in domains such as facility location planning and sensor network deployment, where the optimal solution must balance competing objectives. For example, a city planning authority must carefully weigh the costs and benefits of different fire station locations to ensure adequate coverage while minimizing expenses.
Install libraries¶
import geopandas
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import matplotlib.lines as mlines
import numpy as np
import shapely
import spopt
from spopt.locate import LSCP,MCLP,PMedian,PCenter,simulated_geo_points
from spopt.locate.p_dispersion import PDispersion
import spaghetti
from pulp import *
import matplotlib_scalebar
from matplotlib_scalebar.scalebar import ScaleBar
import pandas
import pulp
from shapely.geometry import Point
from spopt.locate import PDispersion
import time
import warnings
warnings.filterwarnings("ignore")
import libpysal as ps
from scipy.spatial.distance import cdist
Location Set Covering Problem ¶
Implement the Location Set Covering Problem (LSCP) optimization model and solve it. A capacitated version, the Capacitated Location Set Covering Problem – System Optimal (CLSCP-SO), can also be solved by passing in client client demand and facility capacities.
The standard LSCP can be formulated as:
\begin{split}\begin{array}{lllll} \displaystyle \textbf{Minimize} & \displaystyle \sum_{j \in J}{Y_j} && & (1) \\ \displaystyle \textbf{Subject To} & \displaystyle \sum_{j \in N_i}{Y_j} \geq 1 && \forall i \in I & (2) \\ & Y_j \in \{0,1\} && \forall j \in J & (3) \\ & && & \\ \displaystyle \textbf{Where} && i & = & \textrm{index of demand points/areas/objects in set } I \\ && j & = & \textrm{index of potential facility sites in set } J \\ && S & = & \textrm{maximum acceptable service distance or time standard} \\ && d_{ij} & = & \textrm{shortest distance or travel time between locations } i \textrm{ and } j \\ && N_i & = & \{j | d_{ij} < S\} \\ && Y_j & = & \begin{cases} 1, \textrm{if a facility is sited at location } j \\ 0, \textrm{otherwise} \\ \end{cases} \end{array}\end{split}
The CLSCP-SO can be formulated as: \begin{split}\begin{array}{lllll} \displaystyle \textbf{Minimize} & \displaystyle \sum_{j \in J}{Y_j} && & (1) \\ \displaystyle \textbf{Subject to} & \displaystyle \sum_{j \in N_i}{z_{ij}} = 1 && \forall i \in I & (2) \\ & \displaystyle \sum_{i \in I} a_i z_{ij} \leq C_jx_j && \forall j \in J & (3) \\ & Y_j \in {0,1} && \forall j \in J & (4) \\ & z_{ij} \geq 0 && \forall i \in I \quad \forall j \in N_i & (5) \\ & && & \\ \displaystyle \textbf{Where:} && i & = & \textrm{index of demand points/areas/objects in set } I \\ && j & = & \textrm{index of potential facility sites in set } J \\ && S & = & \textrm{maximal acceptable service distance or time standard} \\ && d_{ij} & = & \textrm{shortest distance or travel time between nodes } i \textrm{ and } j \\ && N_i & = & \{j | d_{ij} < S\} \\ && a_i & = & \textrm{amount of demand at } i \\ && C_j & = & \textrm{capacity of potential facility } j \\ && z_{ij} & = & \textrm{fraction of demand } i \textrm{ that is assigned to facility } j \\ && Y_j & = & \begin{cases} 1, \text{if a facility is located at node } j \\ 0, \text{otherwise} \\ \end{cases} \end{array}\end{split}
Random Generated Instance¶
# quantity demand points
CLIENT_COUNT = 100
# quantity supply points
FACILITY_COUNT = 10
# maximum service radius (in distance units)
SERVICE_RADIUS = 8
# random seeds for reproducibility
CLIENT_SEED = 5
FACILITY_SEED = 6
# set the solver
solver = pulp.PULP_CBC_CMD(msg=1)
lattice = spaghetti.regular_lattice((0, 0, 10, 10), 9, exterior=True)
ntw = spaghetti.Network(in_data=lattice)
streets = spaghetti.element_as_gdf(ntw, arcs=True)
streets_buffered = geopandas.GeoDataFrame(
geopandas.GeoSeries(streets["geometry"].buffer(0.5).unary_union),
crs=streets.crs,
columns=["geometry"],
)
#streets.plot();
client_points = simulated_geo_points(
streets_buffered, needed=CLIENT_COUNT, seed=CLIENT_SEED
)
facility_points = simulated_geo_points(
streets_buffered, needed=FACILITY_COUNT, seed=FACILITY_SEED
)
fig, ax = plt.subplots(figsize=(6, 6))
streets.plot(ax=ax, alpha=0.8, zorder=1, label="streets")
facility_points.plot(
ax=ax, color="red", zorder=2, label=f"facility candidate sites ($n$={FACILITY_COUNT})"
)
client_points.plot(ax=ax, color="black", label=f"clients sites ($n$={CLIENT_COUNT})")
plt.legend(loc="upper left", bbox_to_anchor=(1.05, 1));
ntw.snapobservations(client_points, "clients", attribute=True)
clients_snapped = spaghetti.element_as_gdf(ntw, pp_name="clients", snapped=True)
clients_snapped.drop(columns=["id", "comp_label"], inplace=True)
ntw.snapobservations(facility_points, "facilities", attribute=True)
facilities_snapped = spaghetti.element_as_gdf(ntw, pp_name="facilities", snapped=True)
facilities_snapped.drop(columns=["id", "comp_label"], inplace=True)
fig, ax = plt.subplots(figsize=(6, 6))
streets.plot(ax=ax, alpha=0.8, zorder=1, label="streets")
facilities_snapped.plot(
ax=ax, color="red", zorder=2, label=f"facility candidate sites ($n$={FACILITY_COUNT})"
)
clients_snapped.plot(ax=ax, color="black", label=f"clients sites ($n$={CLIENT_COUNT})")
plt.legend(loc="upper left", bbox_to_anchor=(1.05, 1));
Calculating the (network distance) cost matrix¶
cost_matrix = ntw.allneighbordistances(
sourcepattern=ntw.pointpatterns["clients"],
destpattern=ntw.pointpatterns["facilities"],
)
lscp_from_cm = LSCP.from_cost_matrix(
cost_matrix, SERVICE_RADIUS, name="LSCP-network-distance"
)
lscp_from_cm = lscp_from_cm.solve(solver)
lscp_from_cm
facility_points["dv"] = lscp_from_cm.fac_vars
facility_points["dv"] = facility_points["dv"].map(lambda x: x.name.replace("_", ""))
facilities_snapped["dv"] = facility_points["dv"]
facility_points
geometry | dv | |
---|---|---|
0 | POINT (9.32146 3.15178) | y0 |
1 | POINT (8.53352 -0.04134) | y1 |
2 | POINT (0.68422 6.04557) | y2 |
3 | POINT (5.32799 4.10688) | y3 |
4 | POINT (3.18949 6.34771) | y4 |
5 | POINT (4.31956 7.5947) | y5 |
6 | POINT (5.1984 5.86744) | y6 |
7 | POINT (6.59891 10.39247) | y7 |
8 | POINT (8.51844 4.04521) | y8 |
9 | POINT (9.13894 8.56135) | y9 |
Calculating euclidean distance from a GeoDataFrame¶
distance_metric = "euclidean"
lscp_from_gdf = LSCP.from_geodataframe(
clients_snapped,
facilities_snapped,
"geometry",
"geometry",
SERVICE_RADIUS,
distance_metric=distance_metric,
name=f"lscp-{distance_metric}-distance"
)
lscp_from_gdf = lscp_from_gdf.solve(solver)
facility_points["predefined_loc"] = 0
facility_points.loc[(4, 9), "predefined_loc"] = 1
facilities_snapped["predefined_loc"] = facility_points["predefined_loc"]
facility_points
geometry | dv | predefined_loc | |
---|---|---|---|
0 | POINT (9.32146 3.15178) | y0 | 0 |
1 | POINT (8.53352 -0.04134) | y1 | 0 |
2 | POINT (0.68422 6.04557) | y2 | 0 |
3 | POINT (5.32799 4.10688) | y3 | 0 |
4 | POINT (3.18949 6.34771) | y4 | 1 |
5 | POINT (4.31956 7.5947) | y5 | 0 |
6 | POINT (5.1984 5.86744) | y6 | 0 |
7 | POINT (6.59891 10.39247) | y7 | 0 |
8 | POINT (8.51844 4.04521) | y8 | 0 |
9 | POINT (9.13894 8.56135) | y9 | 1 |
Calculating euclidean distance with preselected facilities (euclidean distance)¶
lscp_from_gdf_pre = LSCP.from_geodataframe(
clients_snapped,
facilities_snapped,
"geometry",
"geometry",
SERVICE_RADIUS,
distance_metric=distance_metric,
predefined_facility_col="predefined_loc",
name=f"lscp-{distance_metric}-distance-predefined"
)
lscp_from_gdf_pre = lscp_from_gdf_pre.solve(solver)
dv_colors_arr = [
"darkcyan",
"mediumseagreen",
"saddlebrown",
"darkslategray",
"lightskyblue",
"thistle",
"lavender",
"darkgoldenrod",
"peachpuff",
"coral",
"mediumvioletred",
"blueviolet",
"fuchsia",
"cyan",
"limegreen",
"mediumorchid",
]
dv_colors = {f"y{i}": dv_colors_arr[i] for i in range(len(dv_colors_arr))}
def plot_results(model, p, facs, clis=None, ax=None):
"""Visualize optimal solution sets and context."""
if not ax:
multi_plot = False
fig, ax = plt.subplots(figsize=(6, 6))
markersize, markersize_factor = 4, 4
else:
ax.axis("off")
multi_plot = True
markersize, markersize_factor = 2, 2
ax.set_title(model.name, fontsize=15)
# extract facility-client relationships for plotting (except for p-dispersion)
plot_clis = isinstance(clis, geopandas.GeoDataFrame)
if plot_clis:
cli_points = {}
fac_sites = {}
for i, dv in enumerate(model.fac_vars):
if dv.varValue:
dv, predef = facs.loc[i, ["dv", "predefined_loc"]]
fac_sites[dv] = [i, predef]
if plot_clis:
geom = clis.iloc[model.fac2cli[i]]["geometry"]
cli_points[dv] = geom
# study area and legend entries initialization
streets.plot(ax=ax, alpha=1, color="black", zorder=1)
legend_elements = [mlines.Line2D([], [], color="black", label="streets")]
if plot_clis:
# any clients that not asscociated with a facility
if model.name.startswith("mclp"):
c = "k"
if model.n_cli_uncov:
idx = [i for i, v in enumerate(model.cli2fac) if len(v) == 0]
pnt_kws = dict(ax=ax, fc=c, ec=c, marker="s", markersize=4, zorder=2)
clis.iloc[idx].plot(**pnt_kws)
_label = f"Demand sites not covered ($n$={model.n_cli_uncov})"
_mkws = dict(marker="s", markerfacecolor=c, markeredgecolor=c, linewidth=0)
legend_elements.append(mlines.Line2D([], [], ms=3, label=_label, **_mkws))
# all candidate facilities
facs.plot(ax=ax, fc="brown", marker="*", markersize=100, zorder=8)
_label = f"Facility sites ($n$={len(model.fac_vars)})"
_mkws = dict(marker="*", markerfacecolor="brown", markeredgecolor="brown")
legend_elements.append(mlines.Line2D([], [], ms=7, lw=0, label=_label, **_mkws))
# facility-(client) symbology and legend entries
zorder = 4
for fname, (fac, predef) in fac_sites.items():
cset = dv_colors[fname]
if plot_clis:
# clients
geoms = cli_points[fname]
gdf = geopandas.GeoDataFrame(geoms)
gdf.plot(ax=ax, zorder=zorder, ec="k", fc=cset, markersize=20 * markersize)
_label = f"Demand sites covered by {fname}"
_mkws = dict(markerfacecolor=cset, markeredgecolor="k", ms=markersize + 7)
legend_elements.append(
mlines.Line2D([], [], marker="o", lw=0, label=_label, **_mkws)
)
# facilities
ec = "k"
lw = 2
predef_label = "predefined"
if model.name.endswith(predef_label) and predef:
ec = "r"
lw = 3
fname += f" ({predef_label})"
facs.iloc[[fac]].plot(
ax=ax, marker="*", markersize=400, zorder=9, fc=cset, ec=ec, lw=lw
)
_mkws = dict(markerfacecolor=cset, markeredgecolor=ec, markeredgewidth=lw)
legend_elements.append(
mlines.Line2D([], [], marker="*", ms=20, lw=0, label=fname, **_mkws)
)
# increment zorder up and markersize down for stacked client symbology
zorder += 1
if plot_clis:
markersize -= markersize_factor / p
if not multi_plot:
# legend
kws = dict(loc="upper left", bbox_to_anchor=(1.05, 0.7))
plt.legend(handles=legend_elements, **kws)
LSCP built from cost matrix (network distance)¶
plot_results(
lscp_from_cm,
lscp_from_cm.problem.objective.value(),
facility_points,
clis=client_points
)
LSCP built from geodataframe (euclidean distance)¶
plot_results(
lscp_from_gdf,
lscp_from_gdf.problem.objective.value(),
facility_points,
clis=client_points
)
LSCP with preselected facilities (euclidean distance)¶
plot_results(
lscp_from_gdf_pre,
lscp_from_gdf_pre.problem.objective.value(),
facility_points,
clis=client_points
)
Comparing solution from varied metrics¶
fig, axarr = plt.subplots(1, 3, figsize=(20, 10))
fig.subplots_adjust(wspace=-0.01)
for i, m in enumerate([lscp_from_cm, lscp_from_gdf, lscp_from_gdf_pre]):
plot_results(
m, m.problem.objective.value(), facility_points, clis=client_points, ax=axarr[i]
)
Maximal Coverage Location Problem ¶
Maximal Coverage Location Problem (MCLP) can be formulated as:
\begin{split}\begin{array}{lllll} \displaystyle \textbf{Maximize} & \displaystyle \sum_{i \in I}{a_iX_i} && & (1) \\ \displaystyle \textbf{Subject To} & \displaystyle \sum_{j \in N_i}{Y_j \geq X_i} && \forall i \in I & (2) \\ & \displaystyle \sum_{j \in J}{Y_j} = p && & (3) \\ & X_i \in \{0, 1\} && \forall i \in I & (4) \\ & Y_j \in \{0, 1\} && \forall j \in J & (5) \\ & && & \\ \displaystyle \textbf{Where} && i & = & \textrm{index of demand points/areas/objects in set } I \\ && j & = & \textrm{index of potential facility sites in set } J \\ && p & = & \textrm{the number of facilities to be sited} \\ && S & = & \textrm{maximum acceptable service distance or time standard} \\ && d_{ij} & = & \textrm{shortest distance or travel time between locations } i \textrm{ and } j \\ && N_i & = & \{j | d_{ij} < S\} \\ && X_i & = & \begin{cases} 1, \textrm{if client location } i \textrm{ is covered within service standard } S \\ 0, \textrm{otherwise} \\ \end{cases} \\ && Y_j & = & \begin{cases} 1, \textrm{if a facility is sited at location } j \\ 0, \textrm{otherwise} \\ \end{cases} \end{array}\end{split}
lattice = spaghetti.regular_lattice((0, 0, 10, 10), 9, exterior=True)
ntw = spaghetti.Network(in_data=lattice)
streets = spaghetti.element_as_gdf(ntw, arcs=True)
streets_buffered = geopandas.GeoDataFrame(
geopandas.GeoSeries(streets["geometry"].buffer(0.2).unary_union),
crs=streets.crs,
columns=["geometry"]
)
demand_points = simulated_geo_points(streets_buffered, needed=100, seed=5)
facility_points = simulated_geo_points(streets_buffered, needed=10, seed=6)
ntw.snapobservations(demand_points, "clients", attribute=True)
clients_snapped = spaghetti.element_as_gdf(
ntw, pp_name="clients", snapped=True
)
ntw.snapobservations(facility_points, "facilities", attribute=True)
facilities_snapped = spaghetti.element_as_gdf(
ntw, pp_name="facilities", snapped=True
)
ai = np.random.randint(1, 12, 100)
clients_snapped['weights'] = ai
mclp_from_geodataframe = MCLP.from_geodataframe(
clients_snapped,
facilities_snapped,
"geometry",
"geometry",
"weights",
service_radius=7,
p_facilities=10,
distance_metric="euclidean"
)
mclp_from_geodataframe = mclp_from_geodataframe.solve(
pulp.PULP_CBC_CMD(msg=False)
)
for fac, cli in enumerate(mclp_from_geodataframe.fac2cli):
print(f"facility {fac} serving {len(cli)} clients")
facility 0 serving 63 clients facility 1 serving 80 clients facility 2 serving 96 clients facility 3 serving 99 clients facility 4 serving 58 clients facility 5 serving 75 clients facility 6 serving 55 clients facility 7 serving 70 clients facility 8 serving 86 clients facility 9 serving 56 clients
P-Median Problem ¶
The p-median problem can be formulated as:
\begin{split}\begin{array}{lllll} \displaystyle \textbf{Minimize} & \displaystyle \sum_{i \in I}\sum_{j \in J}{a_i d_{ij} X_{ij}} && & (1) \\ \displaystyle \textbf{Subject To} & \displaystyle \sum_{j \in J}{X_{ij} = 1} && \forall i \in I & (2) \\ & \displaystyle \sum_{j \in J}{Y_j} = p && & (3) \\ & X_{ij} \leq Y_{j} && \forall i \in I \quad \forall j \in J & (4) \\ & X_{ij} \in \{0, 1\} && \forall i \in I \quad \forall j \in J & (5) \\ & Y_j \in \{0, 1\} && \forall j \in J & (6) \\ & && & \\ \displaystyle \textbf{Where} && i & = & \textrm{index of demand points/areas/objects in set } I \\ && j & = & \textrm{index of potential facility sites in set } J \\ && p & = & \textrm{the number of facilities to be sited} \\ && a_i & = & \textrm{service load or population demand at client location } i \\ && d_{ij} & = & \textrm{shortest distance or travel time between locations } i \textrm{ and } j \\ && X_{ij} & = & \begin{cases} 1, \textrm{if client location } i \textrm{ is served by facility } j \\ 0, \textrm{otherwise} \\ \end{cases} \\ && Y_j & = & \begin{cases} 1, \textrm{if a facility is sited at location } j \\ 0, \textrm{otherwise} \\ \end{cases} \\ \end{array}\end{split}
lattice = spaghetti.regular_lattice((0, 0, 10, 10), 9, exterior=True)
ntw = spaghetti.Network(in_data=lattice)
streets = spaghetti.element_as_gdf(ntw, arcs=True)
streets_buffered = geopandas.GeoDataFrame(
geopandas.GeoSeries(streets["geometry"].buffer(0.2).unary_union),
crs=streets.crs,
columns=["geometry"]
)
demand_points = simulated_geo_points(streets_buffered, needed=100, seed=5)
facility_points = simulated_geo_points(streets_buffered, needed=5, seed=6)
ntw.snapobservations(demand_points, "clients", attribute=True)
clients_snapped = spaghetti.element_as_gdf(
ntw, pp_name="clients", snapped=True
)
ntw.snapobservations(facility_points, "facilities", attribute=True)
facilities_snapped = spaghetti.element_as_gdf(
ntw, pp_name="facilities", snapped=True
)
Calculate the cost matrix from origins to destinations.
cost_matrix = ntw.allneighbordistances(
sourcepattern=ntw.pointpatterns["clients"],
destpattern=ntw.pointpatterns["facilities"]
)
ai = np.random.randint(1, 12, 100)
clients_snapped['weights'] = ai
pmedian_from_geodataframe = PMedian.from_geodataframe(
clients_snapped,
facilities_snapped,
"geometry",
"geometry",
"weights",
p_facilities=4,
distance_metric="euclidean"
)
pmedian_from_geodataframe = pmedian_from_geodataframe.solve(
pulp.PULP_CBC_CMD(msg=False)
)
for fac, cli in enumerate(pmedian_from_geodataframe.fac2cli):
print(f"facility {fac} serving {len(cli)} clients")
facility 0 serving 13 clients facility 1 serving 29 clients facility 2 serving 31 clients facility 3 serving 0 clients facility 4 serving 27 clients
P-Center Model ¶
The P-center problem can be formulated as:
\begin{split}\begin{array}{lllll} \displaystyle \textbf{Minimize} & \displaystyle W && & (1) \\ \displaystyle \textbf{Subject To} & \displaystyle \sum_{j \in J}{X_{ij} = 1} && \forall i \in I & (2) \\ & \displaystyle \sum_{j \in J}{Y_j} = p && & (3) \\ & X_{ij} \leq Y_{j} && \forall i \in I \quad \forall j \in J & (4) \\ & W \geq \displaystyle \sum_{j \in J}{d_{ij}X_{ij}} && \forall i \in I & (5) \\ & X_{ij} \in \{0, 1\} && \forall i \in I \quad \forall j \in J & (6) \\ & Y_j \in \{0, 1\} && \forall j \in J & (7) \\ & && & \\ \displaystyle \textbf{Where} && i & = & \textrm{index of demand points/areas/objects in set } I \\ && j & = & \textrm{index of potential facility sites in set } J \\ && p & = & \textrm{the number of facilities to be sited} \\ && d_{ij} & = & \textrm{shortest distance or travel time between locations } i \textrm{ and } j \\ && X_{ij} & = & \begin{cases} 1, \textrm{if client location } i \textrm{ is served by facility } j \\ 0, \textrm{otherwise} \\ \end{cases} \\ && Y_j & = & \begin{cases} 1, \textrm{if a facility is sited at location } j \\ 0, \textrm{otherwise} \\ \end{cases} \\ && W & = & \textrm{maximum distance between any demand site and its associated facility} \end{array}\end{split}
lattice = spaghetti.regular_lattice((0, 0, 10, 10), 9, exterior=True)
ntw = spaghetti.Network(in_data=lattice)
streets = spaghetti.element_as_gdf(ntw, arcs=True)
streets_buffered = geopandas.GeoDataFrame(
geopandas.GeoSeries(streets["geometry"].buffer(0.2).unary_union),
crs=streets.crs,
columns=["geometry"]
)
demand_points = simulated_geo_points(streets_buffered, needed=100, seed=5)
facility_points = simulated_geo_points(streets_buffered, needed=5, seed=6)
ntw.snapobservations(demand_points, "clients", attribute=True)
clients_snapped = spaghetti.element_as_gdf(
ntw, pp_name="clients", snapped=True
)
ntw.snapobservations(facility_points, "facilities", attribute=True)
facilities_snapped = spaghetti.element_as_gdf(
ntw, pp_name="facilities", snapped=True
)
pcenter_from_geodataframe = PCenter.from_geodataframe(
clients_snapped,
facilities_snapped,
"geometry",
"geometry",
p_facilities=4,
distance_metric="euclidean"
)
pcenter_from_geodataframe = pcenter_from_geodataframe.solve(
pulp.PULP_CBC_CMD(msg=False)
)
for fac, cli in enumerate(pcenter_from_geodataframe.fac2cli):
print(f"facility {fac} serving {len(cli)} clients")
facility 0 serving 14 clients facility 1 serving 26 clients facility 2 serving 34 clients facility 3 serving 0 clients facility 4 serving 26 clients
P-Dispersion Model ¶
The p-dispersion problem can be formulated as: \begin{split}\begin{array}{lllll} \displaystyle \textbf{Maximize} & \displaystyle D && & (1) \\ \displaystyle \textbf{Subject To} & \displaystyle \sum_{i \in I}{Y_i} = p && & (2) \\ & D \leq d_{ij} + M (2 - Y_{i} - Y_{j}) && \forall i \in I \quad \forall j > i & (3) \\ & Y_i \in \{0, 1\} && \forall i \in I & (4) \\ & && & \\ \displaystyle \textbf{Where} && i, j & = & \textrm{index of potential facility sites in set } I \\ && p & = & \textrm{the number of facilities to be sited} \\ && d_{ij} & = & \textrm{shortest distance or travel time between locations } i \textrm{ and } j \\ && D & = & \textrm{minimum distance between any two sited facilities } i \textrm{ and } j \\ && M & = & \textrm{some large number; such that } M \geq \max_{ij}\{d_{ij}\} \\ && Y_i & = & \begin{cases} 1, \textrm{if a facility is sited at location } i \\ 0, \textrm{otherwise} \\ \end{cases} \\ \end{array}\end{split}
lattice = spaghetti.regular_lattice((0, 0, 10, 10), 9, exterior=True)
ntw = spaghetti.Network(in_data=lattice)
streets = spaghetti.element_as_gdf(ntw, arcs=True)
streets_buffered = geopandas.GeoDataFrame(
geopandas.GeoSeries(streets["geometry"].buffer(0.2).unary_union),
crs=streets.crs,
columns=["geometry"]
)
facility_points = simulated_geo_points(streets_buffered, needed=5, seed=6)
ntw.snapobservations(facility_points, "facilities", attribute=True)
facilities_snapped = spaghetti.element_as_gdf(
ntw, pp_name="facilities", snapped=True
)
pdispersion_from_geodataframe = PDispersion.from_geodataframe(
facilities_snapped,
"geometry",
p_facilities=5,
distance_metric="euclidean"
)
pdispersion_from_geodataframe = pdispersion_from_geodataframe.solve(
pulp.PULP_CBC_CMD(msg=False)
)
for dv in pdispersion_from_geodataframe.fac_vars:
if dv.varValue:
print(f"facility {dv.name} is selected")
facility y_0_ is selected facility y_1_ is selected facility y_2_ is selected facility y_3_ is selected facility y_4_ is selected
Comparison of Different Models ¶
Example using SF census data (GIS shape input)¶
DIRPATH = "./data/"
facility_points = pandas.read_csv(DIRPATH + "SF_store_site_16_longlat.csv", index_col=0)
facility_points = facility_points.reset_index(drop=True)
facility_points
OBJECTID | NAME | long | lat | |
---|---|---|---|---|
0 | 1 | Store_1 | -122.510018 | 37.772364 |
1 | 2 | Store_2 | -122.488873 | 37.753764 |
2 | 3 | Store_3 | -122.464927 | 37.774727 |
3 | 4 | Store_4 | -122.473945 | 37.743164 |
4 | 5 | Store_5 | -122.449291 | 37.731545 |
5 | 6 | Store_6 | -122.491745 | 37.649309 |
6 | 7 | Store_7 | -122.483182 | 37.701109 |
7 | 8 | Store_11 | -122.433782 | 37.655364 |
8 | 9 | Store_12 | -122.438982 | 37.719236 |
9 | 10 | Store_13 | -122.440218 | 37.745382 |
10 | 11 | Store_14 | -122.421636 | 37.742964 |
11 | 12 | Store_15 | -122.430982 | 37.782964 |
12 | 13 | Store_16 | -122.426873 | 37.769291 |
13 | 14 | Store_17 | -122.432345 | 37.805218 |
14 | 15 | Store_18 | -122.412818 | 37.805745 |
15 | 16 | Store_19 | -122.398909 | 37.797073 |
study_area = geopandas.read_file(DIRPATH + "ServiceAreas_4.shp").dissolve()
network_distance = pandas.read_csv(
DIRPATH + "SF_network_distance_candidateStore_16_censusTract_205_new.csv"
)
demand_points = pandas.read_csv(
DIRPATH + "SF_demand_205_centroid_uniform_weight.csv", index_col=0
)
demand_points = demand_points.reset_index(drop=True)
base = study_area.plot()
base.axis("off");
ntw_dist_piv = network_distance.pivot_table(
values="distance", index="DestinationName", columns="name"
)
cost_matrix = ntw_dist_piv.to_numpy()
n_dem_pnts = demand_points.shape[0]
n_fac_pnts = facility_points.shape[0]
process = lambda df: as_gdf(df).sort_values(by=["NAME"]).reset_index(drop=True)
as_gdf = lambda df: geopandas.GeoDataFrame(df, geometry=pnts(df))
pnts = lambda df: geopandas.points_from_xy(df.long, df.lat)
facility_points = process(facility_points)
demand_points = process(demand_points)
for _df in [facility_points, demand_points, study_area]:
_df.set_crs("EPSG:4326", inplace=True)
_df.to_crs("EPSG:7131", inplace=True)
ai = demand_points["POP2000"].to_numpy()
# maximum service radius (in meters)
SERVICE_RADIUS = 5000
# number of candidate facilities in optimal solution
P_FACILITIES = 8
dv_colors_arr = [
"darkcyan",
"mediumseagreen",
"saddlebrown",
"darkslategray",
"lightskyblue",
"thistle",
"lavender",
"darkgoldenrod",
"peachpuff",
"coral",
"mediumvioletred",
"blueviolet",
"fuchsia",
"cyan",
"limegreen",
"mediumorchid",
]
dv_colors = {f"y{i}": dv_colors_arr[i] for i in range(len(dv_colors_arr))}
def plot_results(model, p, facs, clis=None, ax=None):
"""Visualize optimal solution sets and context."""
if not ax:
multi_plot = False
fig, ax = plt.subplots(figsize=(6, 9))
markersize, markersize_factor = 4, 4
else:
ax.axis("off")
multi_plot = True
markersize, markersize_factor = 2, 2
ax.set_title(model.name, fontsize=15)
# extract facility-client relationships for plotting (except for p-dispersion)
plot_clis = isinstance(clis, geopandas.GeoDataFrame)
if plot_clis:
cli_points = {}
fac_sites = {}
for i, dv in enumerate(model.fac_vars):
if dv.varValue:
dv, predef = facs.loc[i, ["dv", "predefined_loc"]]
fac_sites[dv] = [i, predef]
if plot_clis:
geom = clis.iloc[model.fac2cli[i]]["geometry"]
cli_points[dv] = geom
# study area and legend entries initialization
study_area.plot(ax=ax, alpha=0.5, fc="tan", ec="k", zorder=1)
_patch = Patch(alpha=0.5, fc="tan", ec="k", label="Dissolved Service Areas")
legend_elements = [_patch]
if plot_clis:
# any clients that not asscociated with a facility
if model.name.startswith("mclp"):
c = "k"
if model.n_cli_uncov:
idx = [i for i, v in enumerate(model.cli2fac) if len(v) == 0]
pnt_kws = dict(ax=ax, fc=c, ec=c, marker="s", markersize=7, zorder=2)
clis.iloc[idx].plot(**pnt_kws)
_label = f"Demand sites not covered ($n$={model.n_cli_uncov})"
_mkws = dict(marker="s", markerfacecolor=c, markeredgecolor=c, linewidth=0)
legend_elements.append(mlines.Line2D([], [], ms=3, label=_label, **_mkws))
# all candidate facilities
facs.plot(ax=ax, fc="brown", marker="*", markersize=80, zorder=8)
_label = f"Facility sites ($n$={len(model.fac_vars)})"
_mkws = dict(marker="*", markerfacecolor="brown", markeredgecolor="brown")
legend_elements.append(mlines.Line2D([], [], ms=7, lw=0, label=_label, **_mkws))
# facility-(client) symbology and legend entries
zorder = 4
for fname, (fac, predef) in fac_sites.items():
cset = dv_colors[fname]
if plot_clis:
# clients
geoms = cli_points[fname]
gdf = geopandas.GeoDataFrame(geoms)
gdf.plot(ax=ax, zorder=zorder, ec="k", fc=cset, markersize=20 * markersize)
_label = f"Demand sites covered by {fname}"
_mkws = dict(markerfacecolor=cset, markeredgecolor="k", ms=markersize + 7)
legend_elements.append(
mlines.Line2D([], [], marker="o", lw=0, label=_label, **_mkws)
)
# facilities
ec = "k"
lw = 2
predef_label = "predefined"
if model.name.endswith(predef_label) and predef:
ec = "r"
lw = 3
fname += f" ({predef_label})"
facs.iloc[[fac]].plot(
ax=ax, marker="*", markersize=400, zorder=9, fc=cset, ec=ec, lw=lw
)
_mkws = dict(markerfacecolor=cset, markeredgecolor=ec, markeredgewidth=lw)
legend_elements.append(
mlines.Line2D([], [], marker="*", ms=20, lw=0, label=fname, **_mkws)
)
# increment zorder up and markersize down for stacked client symbology
zorder += 1
if plot_clis:
markersize -= markersize_factor / p
if not multi_plot:
# legend
kws = dict(loc="upper left", bbox_to_anchor=(1.05, 0.7))
plt.legend(handles=legend_elements, **kws)
Modeling as Location Set Covering Problem (LSCP)¶
lscp = LSCP.from_cost_matrix(cost_matrix, SERVICE_RADIUS)
solver = pulp.PULP_CBC_CMD(msg=1)
#lscp = lscp.solve(pulp.GLPK(msg=False))
lscp = lscp.solve(solver)
lscp_objval = lscp.problem.objective.value()
facility_points["dv"] = lscp.fac_vars
facility_points["dv"] = facility_points["dv"].map(lambda x: x.name.replace("_", ""))
facility_points["predefined_loc"] = 0
plot_results(lscp, lscp_objval, facility_points, clis=demand_points)
Modeling as Maximum Covering Location Problem (MCLP)¶
mclp = MCLP.from_cost_matrix(
cost_matrix, ai, service_radius=SERVICE_RADIUS, p_facilities=P_FACILITIES,
)
mclp = mclp.solve(solver)
mclp.problem.objective.value()
955113.0
plot_results(mclp, P_FACILITIES, facility_points, clis=demand_points)
Modeling as P-Median Problem¶
pmedian = PMedian.from_cost_matrix(cost_matrix, ai, p_facilities=P_FACILITIES)
pmedian = pmedian.solve(solver)
pmedian.problem.objective.value()
2054687610.6381974
plot_results(pmedian, P_FACILITIES, facility_points, clis=demand_points)
Modeling as P-Center Problem¶
pcenter = PCenter.from_cost_matrix(cost_matrix, p_facilities=P_FACILITIES)
pcenter = pcenter.solve(solver)
pcenter.problem.objective.value()
4644.8457
plot_results(pcenter, P_FACILITIES, facility_points, clis=demand_points)
Modeling as P-Dispersion¶
pdispersion = PDispersion.from_cost_matrix(cost_matrix, p_facilities=P_FACILITIES)
pdispersion = pdispersion.solve(pulp.PULP_CBC_CMD(msg=False))
pdispersion.problem.objective.value()
8651.6146
plot_results(pdispersion, P_FACILITIES, facility_points)
fig, axarr = plt.subplots(1, 4, figsize=(20, 10))
fig.subplots_adjust(wspace=-0.01)
for i, m in enumerate([lscp, mclp, pmedian, pcenter]):
_p = m.problem.objective.value() if m.name == "lscp" else P_FACILITIES
plot_results(m, _p, facility_points, clis=demand_points, ax=axarr[i])
Comparing Solvers¶
For LSCP Model¶
lscp = LSCP.from_cost_matrix(cost_matrix, SERVICE_RADIUS)
#solver = pulp.PULP_CBC_CMD(msg=1)
#lscp = lscp.solve(pulp.GLPK(msg=False))
#lscp = lscp.solve(solver)
solvers=['PULP_CBC_CMD', 'HiGHS']
results_lscp = pandas.DataFrame(columns=["Solve Time (sec.)"], index=solvers)
for solver in solvers:
lscp_res = lscp.solve(pulp.getSolver(solver))
results_lscp.loc[solver] = [lscp_res.problem.solutionTime]
results_lscp
Solve Time (sec.) | |
---|---|
PULP_CBC_CMD | 0.032343 |
HiGHS | 0.008557 |
For MCLP model¶
mclp = MCLP.from_cost_matrix(
cost_matrix, ai, service_radius=SERVICE_RADIUS, p_facilities=P_FACILITIES,
)
solvers=['PULP_CBC_CMD', 'HiGHS']
results_mclp = pandas.DataFrame(columns=["Coverage %", "Solve Time (sec.)"], index=solvers)
for solver in solvers:
mclp_res = mclp.solve(pulp.getSolver(solver))
results_mclp.loc[solver] = [mclp_res.perc_cov, mclp_res.problem.solutionTime]
results_mclp
Coverage % | Solve Time (sec.) | |
---|---|---|
PULP_CBC_CMD | 100.0 | 0.032719 |
HiGHS | 100.0 | 0.00394 |
For P-Median model¶
pmedian = PMedian.from_cost_matrix(cost_matrix, ai, p_facilities=P_FACILITIES)
solvers=['PULP_CBC_CMD', 'HiGHS']
results_pmedian = pandas.DataFrame(columns=["Solve Time (sec.)"], index=solvers)
for solver in solvers:
pmedian_res = pmedian.solve(pulp.getSolver(solver))
results_pmedian.loc[solver] = [pmedian_res.problem.solutionTime]
results_pmedian
Solve Time (sec.) | |
---|---|
PULP_CBC_CMD | 0.15241 |
HiGHS | 0.087179 |
For P-Center model¶
pcenter = PCenter.from_cost_matrix(cost_matrix, p_facilities=P_FACILITIES)
solvers=['PULP_CBC_CMD', 'HiGHS']
results_pcenter = pandas.DataFrame(columns=["Solve Time (sec.)"], index=solvers)
for solver in solvers:
pcenter_res = pcenter.solve(pulp.getSolver(solver))
results_pcenter.loc[solver] = [pcenter_res.problem.solutionTime]
results_pcenter
Solve Time (sec.) | |
---|---|
PULP_CBC_CMD | 0.428298 |
HiGHS | 1.352853 |
For P-Dispersion model¶
pdispersion = PDispersion.from_cost_matrix(cost_matrix, p_facilities=P_FACILITIES)
solvers=['PULP_CBC_CMD', 'HiGHS']
results_pdispersion = pandas.DataFrame(columns=["Solve Time (sec.)"], index=solvers)
for solver in solvers:
pdispersion_res = pdispersion.solve(pulp.getSolver(solver))
results_pdispersion.loc[solver] = [pdispersion_res.problem.solutionTime]
results_pdispersion
Solve Time (sec.) | |
---|---|
PULP_CBC_CMD | 0.076274 |
HiGHS | 0.016829 |