メインコンテンツへスキップ

Parallelization of distinct subproblems in Column Generation

回答済み

コメント

1件のコメント

  • Byron Tasseff
    • Gurobi Staff

    Hi Lorenz,

    First, you may find our “How do I use multiprocessing in Python with Gurobi?” article helpful. It describes how the multiprocessing Python package can be used to implement process-based parallelism and highlights the importance of managing independent Gurobi Environments across these processes. This section from the Gurobi manual on “session boundaries” describes how Model and Env objects should be disposed of and the usefulness of Python context managers for automatic cleanup.

    With this knowledge in mind, a Pythonic column generation pseudoalgorithm that solves parallel subproblems at each iteration might resemble something similar to the following:

    import multiprocessing as mp
    import gurobipy as gp
    from gurobipy import GRB
    
    
    def solve_subproblem(patient_data, dual_values):
        """
        Solve subproblem for one patient and return new column.
        """
       
        with gp.Env() as env, gp.Model(env=env) as model:
            model.setParam('Threads', 1)  # Limit threads per subproblem
            model.setParam('OutputFlag', 0)
            
            # Add subproblem variables, constraints, and objective
            # ...
            
            model.optimize()
            
            if model.Status == gp.GRB.OPTIMAL:
                return extract_column(model)  # Define this elsewhere
            return None
    
    def solve_with_column_generation(...):
        master_env = gp.Env()
        master = gp.Model(env=master_env)
        master.setParam('OutputFlag', 0)
    
        # Add master problem variables, constraints, and objective
        # ...
    
        converged = False
        
        while not converged:
            master.optimize()
            dual_values = master.getAttr(GRB.Attr.Pi, model.getConstrs())
            
            # Solve subproblems in parallel
            with mp.get_context('spawn').Pool() as pool:
                args = [(data, dual_values) for data in patient_data_list]
                new_columns = pool.starmap(solve_subproblem, args)
            
            # Filter valid columns and check convergence
            new_columns = [col for col in new_columns if col is not None]
            if not new_columns:
                converged = True
    
            # Add columns to master model for next iteration,
            # e.g., using `master.addVar(..., column=...)`
            # ...
    
        # Dispose of the master model and its environment
        master.dispose()
        master_env.dispose()

    There are a few things I want to highlight about this implementation:

    • New columns can be iteratively added to the master model using the addVar method, which optionally takes a Column object as an argument. (You have a serial version of your algorithm, so you may already know this.)
    • Pool() will parallelize subproblems across all processors identified on your machine. If you want to parallelize across four workers instead, for example, use Pool(processes=4).
    • I used model.setParam('Threads', 1) for each subproblem. This is a good choice if the subproblems are small and easy to solve. Otherwise, tune the number of Pool processes and the number of threads used to solve each subproblem.
    • Ensure your Gurobi license supports the number of parallel solves you're running. Each concurrent process solving a model counts toward your license limits.
    • Be mindful of the data you pass to each subprocess. Avoid passing large, complex objects.

    As an aside, I separately posed your question to Gurobot, which produced a good response. Feel free to try it for questions like this in the future.

    I hope these tips help! Please let me know if you have additional questions.

    1

サインインしてコメントを残してください。