
Same model, same parameters, but different solutions

Answered


  • Maliheh Aramon
    • Gurobi Staff

    Hi Arthur, 

    > The first inconsistency is that all runs except one ran out of memory. I suspect the concurrent LP optimization at the root node caused a memory spike, as the logs show slight differences in how the relaxation was handled right before crashing.

    Since all three runs show the same fingerprint for the model, we would expect running the model on the same machine with the same parameters to follow the same solution path. You have set both the MemLimit and SoftMemLimit parameters, with the latter set to a higher value. Setting MemLimit=30 limits the total amount of memory available to Gurobi to 30 GB: if more memory is required, Gurobi fails with an out-of-memory error. Because this hard limit is always reached first, the setting SoftMemLimit=64 is irrelevant here. Why did you set both MemLimit and SoftMemLimit parameters? 
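    To make the interaction of the two limits concrete, here is a minimal sketch, assuming a program that, like Arthur's, takes both limits as inputs. The helper name `memory_params` is hypothetical; only the parameter names MemLimit and SoftMemLimit come from the thread. It keeps SoftMemLimit only when it sits below MemLimit, since a higher soft limit can never be reached before the hard one.

```python
def memory_params(mem_limit_gb, soft_mem_limit_gb=None):
    """Build a dict of Gurobi memory-limit parameters (values in GB).

    MemLimit is the hard cap: if Gurobi needs more memory, it fails with
    an out-of-memory error. SoftMemLimit can only take effect before the
    hard cap if it is set below MemLimit, so a higher value is dropped.
    """
    params = {"MemLimit": float(mem_limit_gb)}
    if soft_mem_limit_gb is not None and soft_mem_limit_gb < mem_limit_gb:
        params["SoftMemLimit"] = float(soft_mem_limit_gb)
    return params


# The settings from this thread: SoftMemLimit=64 above MemLimit=30.
print(memory_params(30, 64))  # {'MemLimit': 30.0}
# An illustrative soft limit below the hard limit is kept.
print(memory_params(30, 25))  # {'MemLimit': 30.0, 'SoftMemLimit': 25.0}
```

    In gurobipy, each entry could then be applied with `model.setParam(name, value)`; the 25 GB value is only an illustration, not a recommendation.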

    The ordering step of the barrier algorithm appears to be memory-intensive. In one of the runs, the dual simplex seems to have solved the root relaxation to optimality just before the memory limit was hit (note that the concurrent method for solving the LP relaxation is non-deterministic, so which algorithm finishes first can differ between runs). To avoid the issue for this model, you need to either increase the amount of memory available to Gurobi or set the Method parameter to 1, forcing the Gurobi Optimizer to solve the root LP relaxation with the dual simplex. 
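    The choice between these root-LP settings can be sketched as a small parameter helper. This is an illustration under stated assumptions: `root_lp_params` and `METHOD_NAMES` are hypothetical names, and only the Method values themselves (1 = dual simplex, as suggested here, and 4 = deterministic concurrent, which comes up later in the thread) are Gurobi's. Deterministic concurrent presumably still runs barrier alongside simplex, so it targets repeatability rather than lower memory use.

```python
# Values of Gurobi's Method parameter that are relevant to this thread.
METHOD_NAMES = {
    -1: "automatic (default)",
    0: "primal simplex",
    1: "dual simplex",
    2: "barrier",
    3: "concurrent",
    4: "deterministic concurrent",
}


def root_lp_params(avoid_barrier=False, deterministic=False):
    """Pick a Method value for the root LP relaxation (hypothetical helper).

    avoid_barrier: dual simplex only, so the memory-intensive barrier
        ordering step never runs and no concurrent race is involved.
    deterministic: deterministic concurrent, so repeated runs with the
        same settings on the same machine follow the same path.
    """
    if avoid_barrier:
        method = 1
    elif deterministic:
        method = 4
    else:
        method = -1  # leave the choice to Gurobi
    return {"Method": method}


for flags in ({"avoid_barrier": True}, {"deterministic": True}, {}):
    params = root_lp_params(**flags)
    print(flags, "->", params, METHOD_NAMES[params["Method"]])
```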

    > The second inconsistency is that one specific run found a better solution for another instance. I think the other runs suffered from hardware delays or cluster load, as the "Node 0" log for the best run includes several lines that are completely missing from the slower ones. Since that successful run was executed on a different day, it probably got lucky and landed on a less busy or cooler CPU core.

    The solution paths in all three runs are the same. The difference stems from the amount of time it takes to explore the tree. For example, the gap of 1.14% (see the log lines below) was reached at 501 seconds in the first run, but it took 1718 and 1741 seconds to reach the same point in the second and third runs, respectively. Your conjecture is likely correct: for the second and third runs, the machine appears to have been oversubscribed. Do you know how many Gurobi jobs were running concurrently on the same machine with the second and the third run?

    - First run

          0     0 53166.4853    0  150 53779.0000 53166.4853  1.14%     -  501s

    - Second run

          0     0 53166.4853    0  150 53779.0000 53166.4853  1.14%     - 1718s

    - Third run

          0     0 53166.4853    0  150 53779.0000 53166.4853  1.14%     - 1741s
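    The slowdown can be quantified directly from the three node-log lines above. The sketch below parses each line and compares its elapsed time with the first run; the regular expressions assume exactly the column layout quoted here (the gap is the only field ending in `%`, the elapsed time is the last field, ending in `s`) and are not a general Gurobi log parser. `parse_gap_and_time` is a hypothetical name.

```python
import re

# The Node-0 log line quoted from each run.
RUNS = {
    "first": "0     0 53166.4853    0  150 53779.0000 53166.4853  1.14%     -  501s",
    "second": "0     0 53166.4853    0  150 53779.0000 53166.4853  1.14%     - 1718s",
    "third": "0     0 53166.4853    0  150 53779.0000 53166.4853  1.14%     - 1741s",
}


def parse_gap_and_time(line):
    """Return (gap in percent, elapsed seconds) from a node-log line.

    Assumes the layout quoted above: the gap is the only field ending in
    '%' and the elapsed time is the last field, ending in 's'.
    """
    gap = re.search(r"([\d.]+)%", line)
    secs = re.search(r"(\d+)s\s*$", line)
    if gap is None or secs is None:
        raise ValueError(f"unexpected log line: {line!r}")
    return float(gap.group(1)), int(secs.group(1))


_, baseline = parse_gap_and_time(RUNS["first"])
for name, line in RUNS.items():
    gap, secs = parse_gap_and_time(line)
    print(f"{name}: gap {gap}% at {secs}s ({secs / baseline:.2f}x the first run)")
```

    Running it reports the same 1.14% gap in all three runs, reached after 3.43x and 3.48x the first run's time in the second and third runs, which is consistent with an identical path on a slowed-down machine.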

     

    Best regards,

    Maliheh

  • Arthur Cruz
    • First Comment
    • First Question

    Hello Maliheh,

    Thank you for the answer.

    > Why did you set both MemLimit and SoftMemLimit parameters? 

    MemLimit and SoftMemLimit are both input parameters in my program. For these experiments, I only needed MemLimit. However, rather than modifying the code to remove SoftMemLimit, I simply set its value higher than MemLimit to ensure it would not be triggered.

    > To avoid the issue for this model, you need to either increase the amount of memory available to Gurobi or set the Method parameter to 1, forcing the Gurobi Optimizer to solve the root LP relaxation with the dual simplex. 

    I expected some instances to get out-of-memory errors, as previous experiments indicated that this could occur with larger datasets. I consider these to be valid results, as my goal is to map the model's limitations. The main problem was the inconsistency between runs.

    I looked up the Method parameter. Would setting it to 4 (deterministic concurrent) ensure or improve the chances that all runs get the same result (all of them solving the LP with simplex or all of them running out of memory)?

    > Do you know how many Gurobi jobs were running concurrently on the same machine with the second and the third run?

    I do not know the number of jobs, but knowing that my assumption is probably correct will help me in my future experiments. Thank you!

    Best regards,

    Arthur.

  • Maliheh Aramon
    • Gurobi Staff

    > I looked up the Method parameter. Would setting it to 4 (deterministic concurrent) ensure or improve the chances that all runs get the same result (all of them solving the LP with simplex or all of them running out of memory)?

    Yes, the deterministic concurrent method should ensure the exact same results.

    > I do not know the number of jobs, but knowing that my assumption is probably correct will help me in my future experiments.

    Yes, our best explanation for your results is that the machine was oversubscribed (you can also compare the work units value across all three runs: similar work units with very different run times point to a slowed-down machine). By oversubscription, I mean that the total number of threads across all running Gurobi processes was likely larger than the number of logical processors on the machine, so the operating system scheduler had to pause threads periodically to give every thread its share of processor time, slowing down some or all of the Gurobi jobs. 
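    Both checks mentioned here can be sketched in a few lines. This is a hedged illustration: `is_oversubscribed` and `seconds_per_work_unit` are hypothetical helpers, the per-job thread counts would have to come from each job's Threads setting, and the summary-line format matched by the regular expression is an assumption based on recent Gurobi logs that should be checked against your own logs before relying on it.

```python
import os
import re


def is_oversubscribed(threads_per_job, logical_cpus=None):
    """True if concurrently running Gurobi jobs together use more threads
    than the machine has logical processors, forcing the OS scheduler to
    time-slice them."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    return sum(threads_per_job) > logical_cpus


# ASSUMED format of the final summary line printed by recent Gurobi
# versions, e.g. "... in 12.34 seconds (5.67 work units)".
SUMMARY = re.compile(r"in ([\d.]+) seconds \(([\d.]+) work units\)")


def seconds_per_work_unit(summary_line):
    """Wall-clock seconds spent per work unit in one run.

    Work units measure deterministic effort, so runs that follow the same
    path report the same or very similar work units; a much higher
    seconds-per-work-unit ratio in some runs suggests the machine was
    slowed down, e.g. by oversubscription.
    """
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("no 'seconds (... work units)' summary found")
    seconds, work_units = float(match.group(1)), float(match.group(2))
    return seconds / work_units


# Hypothetical example: three jobs with 16 threads each on 32 logical CPUs.
print(is_oversubscribed([16, 16, 16], logical_cpus=32))  # True
```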

    Best regards,

    Maliheh

  • Arthur Cruz
    • First Comment
    • First Question

    I see.

    I do not have any more questions about this issue.

    Thank you, Maliheh.

    Best regards,

    Arthur.
