Parallelization of distinct subproblems in Column Generation
AnsweredHi Gurobi Team,
I'm implementing a column generation algorithm where, in each iteration, I solve one independent subproblem per patient (or entity) to generate new columns. Since these subproblems only read the current dual variables from the master problem and do not interact with each other, I’d like to solve them in parallel to speed up the process.
My code is written in Python, and I am running the code both on Windows and macOS. How could I achieve such a parallelization? Also, are there any known pitfalls or best practices for parallelizing subproblems in a column generation framework with Gurobi?
-
Hi Lorenz,
First, you may find our “How do I use multiprocessing in Python with Gurobi?” article helpful. It describes how the
multiprocessingPython package can be used to implement process-based parallelism and highlights the importance of managing independent Gurobi Environments across these processes. This section from the Gurobi manual on “session boundaries” describes howModelandEnvobjects should be disposed of and the usefulness of Python context managers for automatic cleanup.With this knowledge in mind, a Pythonic column generation pseudoalgorithm that solves parallel subproblems at each iteration might resemble something similar to the following:
import multiprocessing as mp import gurobipy as gp from gurobipy import GRB def solve_subproblem(patient_data, dual_values): """ Solve subproblem for one patient and return new column. """ with gp.Env() as env, gp.Model(env=env) as model: model.setParam('Threads', 1) # Limit threads per subproblem model.setParam('OutputFlag', 0) # Add subproblem variables, constraints, and objective # ... model.optimize() if model.Status == gp.GRB.OPTIMAL: return extract_column(model) # Define this elsewhere return None def solve_with_column_generation(...): master_env = gp.Env() master = gp.Model(env=master_env) master.setParam('OutputFlag', 0) # Add master problem variables, constraints, and objective # ... converged = False while not converged: master.optimize() dual_values = master.getAttr(GRB.Attr.Pi, model.getConstrs()) # Solve subproblems in parallel with mp.get_context('spawn').Pool() as pool: args = [(data, dual_values) for data in patient_data_list] new_columns = pool.starmap(solve_subproblem, args) # Filter valid columns and check convergence new_columns = [col for col in new_columns if col is not None] if not new_columns: converged = True # Add columns to master model for next iteration, # e.g., using `master.addVar(..., column=...)` # ... # Dispose of the master model and its environment master.dispose() master_env.dispose()There are a few things I want to highlight about this implementation:
- New columns can be iteratively added to the master model using the
addVarmethod, which optionally takes aColumnobject as an argument. (You have a serial version of your algorithm, so you may already know this.) -
Pool()will parallelize subproblems across all processors identified on your machine. If you want to parallelize across four workers instead, for example, usePool(processes=4). - I used
model.setParam('Threads', 1)for each subproblem. This is a good choice if the subproblems are small and easy to solve. Otherwise, tune the number ofPoolprocesses and the number of threads used to solve each subproblem. - Ensure your Gurobi license supports the number of parallel solves you're running. Each concurrent process solving a model counts toward your license limits.
- Be mindful of the data you pass to each subprocess. Avoid passing large, complex objects.
As an aside, I separately posed your question to Gurobot, which produced a good response. Feel free to try it for questions like this in the future.
I hope these tips help! Please let me know if you have additional questions.
1 - New columns can be iteratively added to the master model using the
Please sign in to leave a comment.
Comments
1 comment