Model sometimes works, sometimes hangs indefinitely
Hi all,
I have been using Gurobi (both in the cloud and, when that seems to fail, on my local machine) to perform optimizations for months. For the last three weeks, I have been running a high volume of optimizations in the cloud (e.g., I have performed several hundred runs successfully and saved the outputs). These optimizations generally take 6-7 minutes to complete, though some take 20-25 minutes for no apparent reason. However, sometimes a version of my model with a very similar formulation to all the others that have run successfully will hang indefinitely: the optimizer seems to start (i.e., the line in my code just before the .solve call prints 'optimizer starting', and the cloud machine turns green and launches), and no error is thrown, but no result is returned. It will hang in this state for more than 5 hours if I don't catch it.
This is expensive, but (more importantly) baffling--I can't figure out why certain runs won't complete. To me it seems totally arbitrary which runs cause this hangup; I'm not yet sure what's similar between them / different from the runs that do successfully complete. Does anyone have suggestions for troubleshooting this?
Thank you,
Margaret
Do you have a Gurobi log for one of the runs that doesn't complete?
I don't, unfortunately. It doesn't seem to have been saving log files. I'm coding in Python and have been saving .ilp files, but not .log files. Can you tell me if the following looks like the right way to save a log file?
solver_parameters = "ResultFile=model.ilp, LogFile=model_log.log"
results = opt.solve(model, options_string=solver_parameters, symbolic_solver_labels=True)
If that should work, at what point in a non-completing run would the log file be generated? I've started a run just now but have not seen a log file appear in my working directory.
Eli,
Looks like I've saved a log file. It's not clear that this run will *never* complete, but it's been going for almost an hour and a half, whereas the previous runs that completed in this session took 5-10 minutes. Here's the head; please let me know if you need more information.
Gurobi 9.0.1 (mac64) logging started Wed Nov 18 05:13:18 2020
Changed value of parameter LogFile to model_log.log
Prev: Default:
Gurobi Optimizer version 9.0.1 build v9.0.1rc0 (mac64)
Optimize a model with 98554 rows, 24625 columns and 10736065 nonzeros
Model fingerprint: 0xa341d6d1
Model has 2114415 quadratic objective terms
Variable types: 1 continuous, 24624 integer (0 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [3e-11, 1e+09]
QObjective range [2e+00, 8e+00]
Bounds range [0e+00, 0e+00]
RHS range [1e+00, 1e+06]
Warning: Model contains large objective coefficients
Consider reformulating model or setting NumericFocus parameter
to avoid numerical issues.
Presolve removed 73786 rows and 6697 columns (presolve time = 8s) ...
Presolve removed 73955 rows and 6697 columns (presolve time = 10s) ...
Presolve removed 75351 rows and 6697 columns (presolve time = 15s) ...
Presolve removed 75375 rows and 6646 columns
Presolve time: 18.87s
Presolved: 23179 rows, 17979 columns, 4558096 nonzeros
Presolved model has 1145648 quadratic objective terms
Variable types: 0 continuous, 17979 integer (0 binary)
Found heuristic solution: objective 5.188266e+12
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
0 2.7172068e+10 0.000000e+00 6.130384e+07 21s
7731 6.9597478e+06 0.000000e+00 1.206543e+04 25s
14727 -3.9400031e+05 0.000000e+00 3.662503e+02 30s
24701 -1.6692974e+01 2.499054e+09 0.000000e+00 35s
33853 -6.5574627e+02 1.697637e+09 0.000000e+00 40s
42493 -6.7434083e+02 1.138674e+09 0.000000e+00 45s
51183 -6.7160092e+02 8.935296e+08 0.000000e+00 50s
Warning: 1 variables dropped from basis
62668 8.6028767e-01 8.708114e+08 0.000000e+00 55s
72410 -4.2136698e+02 2.849482e+08 0.000000e+00 60s
82202 -1.5684586e+03 6.906356e+08 0.000000e+00 65s
Warning: 1 variables dropped from basis
91520 -1.0398743e+03 1.386282e+09 0.000000e+00 70s
Warning: 4 variables dropped from basis
For what it's worth, the log file of the forever-hanging run looks similar (to my eye) to the log file of a run that completed in 10 minutes:
Gurobi 9.0.1 (mac64) logging started Tue Nov 17 22:08:52 2020
Changed value of parameter LogFile to model_log.log
Prev: Default:
Gurobi Optimizer version 9.0.1 build v9.0.1rc0 (mac64)
Optimize a model with 69181 rows, 17281 columns and 5045761 nonzeros
Model fingerprint: 0x6d78f2ea
Model has 1560240 quadratic objective terms
Variable types: 1 continuous, 17280 integer (0 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [2e-13, 1e+09]
QObjective range [2e+00, 8e+00]
Bounds range [0e+00, 0e+00]
RHS range [1e+00, 7e+05]
Warning: Model contains large objective coefficients
Consider reformulating model or setting NumericFocus parameter
to avoid numerical issues.
Presolve removed 49485 rows and 3651 columns (presolve time = 5s) ...
Presolve removed 50099 rows and 3593 columns
Presolve time: 9.91s
Presolved: 19082 rows, 13688 columns, 2518175 nonzeros
Presolved model has 983254 quadratic objective terms
Variable types: 0 continuous, 13688 integer (0 binary)
Found heuristic solution: objective 1.277261e+13
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
0 3.3086991e+10 0.000000e+00 6.642891e+07 11s
9595 6.1536134e+07 0.000000e+00 2.757745e+04 15s
23130 6.6995621e-04 2.782785e+09 0.000000e+00 20s
36022 -1.2855606e+03 8.941475e+08 0.000000e+00 25s
48326 -1.2955765e+03 6.308836e+08 0.000000e+00 30s
60050 -1.3420210e+03 6.067081e+08 0.000000e+00 35s
Warning: 1 variables dropped from basis
73902 3.3387456e+00 9.911898e+08 0.000000e+00 40s
86910 -1.3406267e+01 2.250933e+09 0.000000e+00 45s
100862 -7.6920727e+02 2.880118e+09 0.000000e+00 50s
Warning: 1 variables dropped from basis
113222 -1.9792591e+01 1.951922e+09 0.000000e+00 55s
124891 -2.3535904e+02 8.992859e+08 0.000000e+00 60s
Do you have a link to the full log files? These logs only show the first minute or so of solve time. Also, have you tried Gurobi 9.1?
The objective coefficient range is pretty suspicious:
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [3e-11, 1e+09]
QObjective range [2e+00, 8e+00]
Bounds range [0e+00, 0e+00]
RHS range [1e+00, 1e+06]
Warning: Model contains large objective coefficients
Consider reformulating model or setting NumericFocus parameter
to avoid numerical issues.
Large objective coefficients like \( 10^9 \) can make it difficult for Gurobi to determine if a solution is truly optimal. Additionally, I'm curious where the very small objective coefficients like \( 3 \cdot 10^{-11} \) come from. If any of the objective coefficients represent penalty terms, perhaps a hierarchical multi-objective approach would be more appropriate.
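To put that range in perspective, a quick calculation shows how many orders of magnitude separate the smallest and largest objective coefficients reported in the log. This is only an illustrative sketch: `orders_of_magnitude` is a hypothetical helper, not part of Gurobi, and the two input values are copied from the "Objective range" line of the log.

```python
import math


def orders_of_magnitude(smallest, largest):
    """Return log10(largest / smallest) for two positive coefficient magnitudes."""
    if smallest <= 0 or largest <= 0:
        raise ValueError("magnitudes must be positive")
    return math.log10(largest / smallest)


# Values taken from the "Objective range [3e-11, 1e+09]" log line.
span = orders_of_magnitude(3e-11, 1e9)
print(f"Objective coefficients span about {span:.1f} orders of magnitude")
```

Double-precision floating point carries roughly 16 significant decimal digits, so with a span this wide the smallest terms can be lost entirely when they are added to the largest ones.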
It is best to reformulate the problem yourself to remove these very large and very small objective coefficients. You could alternatively try using the ObjScale parameter to scale the objective. I can't say for certain if rescaling the objective function will help, but it's the first place I would look.
Other parameters to try are PreQLinearize and NumericFocus.
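A minimal sketch of how these parameters might be collected before a solve, assuming the Pyomo-style `opt.solve(...)` call shown earlier in the thread. `build_gurobi_options` is a hypothetical helper, the value ranges reflect my reading of the Gurobi parameter documentation, and the specific values are placeholders to experiment with rather than recommendations.

```python
def build_gurobi_options(obj_scale=None, pre_q_linearize=None, numeric_focus=None):
    """Collect Gurobi parameter settings into a dict, skipping any left unset."""
    options = {}
    if obj_scale is not None:
        # ObjScale: positive values divide the objective by that amount;
        # negative values (down to -1) scale by a power of the largest coefficient.
        if obj_scale < -1:
            raise ValueError("ObjScale must be >= -1")
        options["ObjScale"] = obj_scale
    if pre_q_linearize is not None:
        # PreQLinearize: -1 = automatic, 0 = off, 1 and 2 = linearization variants.
        if pre_q_linearize not in (-1, 0, 1, 2):
            raise ValueError("PreQLinearize must be -1, 0, 1, or 2")
        options["PreQLinearize"] = pre_q_linearize
    if numeric_focus is not None:
        # NumericFocus: 0 = automatic, 1 to 3 = increasing numerical care.
        if numeric_focus not in (0, 1, 2, 3):
            raise ValueError("NumericFocus must be 0, 1, 2, or 3")
        options["NumericFocus"] = numeric_focus
    return options


options = build_gurobi_options(obj_scale=-0.5, pre_q_linearize=1, numeric_focus=2)
print(options)

# Assumed usage with the Pyomo solver object from earlier in the thread:
# results = opt.solve(model, options=options, symbolic_solver_labels=True)
```

Changing one parameter at a time makes it easier to tell which setting, if any, is responsible for a difference in behavior.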
Note there's no guarantee that if Gurobi solves a model in five minutes, it will solve a similar problem in five minutes (see Is Gurobi Optimizer deterministic?). And the models are pretty large - they have 5-10 million nonzeros and 1-2 million quadratic objective terms.
Thanks, Eli. This is a really helpful answer. For context, I have fairly limited optimization experience--I'm running a model I've inherited from someone else. Whoever takes over my job in a few months will likely have more optimization experience and will be able to implement some of your suggestions. In the meantime, do you have any stopgap measures you could recommend to let me wrap up these last few optimizations without making major changes to the code? For example, would adjusting my cloud pool to have more, or more powerful, machines help? (I'm currently running on a single c5.9xlarge machine.)
And I don't think I've tried Gurobi 9.1--would that help if I'm running on Gurobi Cloud? And here's a link to the log file--the last run logged there should be the one that failed.
The failed run gets stuck in the root relaxation solve:
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
0 1.2863960e+11 0.000000e+00 1.765539e+08 138s
2299 1.3210152e+10 0.000000e+00 8.395725e+06 140s
5989 5.2098164e+09 0.000000e+00 2.544591e+06 145s
9132 2.1776792e+09 0.000000e+00 1.106070e+06 150s
12131 8.8095846e+08 0.000000e+00 4.704857e+05 155s
...
1894891 4.1334679e+08 1.716058e+08 0.000000e+00 12261s
1895489 4.1334657e+08 1.328328e+08 0.000000e+00 12266s
1896068 4.1334160e+08 1.004277e+08 0.000000e+00 12271s
1896676 4.1334140e+08 1.814993e+08 0.000000e+00 12276s
1897224 4.1334134e+08 2.409831e+08 0.000000e+00 12281s
Warning: 1 variables dropped from basis
I don't think a more powerful machine would help, since the problem is the model itself. To avoid long runs, you could set the TimeLimit parameter. But the solution returned by Gurobi could be quite bad. From the above problem:
Found heuristic solution: objective 4.469094e+13
I would try testing different values of ObjScale, PreQLinearize, and NumericFocus on the problematic model. AggFill=0 or Aggregate=0 might also help.
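One way to organize that testing is to enumerate the combinations up front and cap every trial with TimeLimit so none of them can hang. The sketch below is an assumption about workflow, not something from the Gurobi documentation: `parameter_grid` is a hypothetical helper, and the candidate values and the 1800-second limit are arbitrary placeholders.

```python
from itertools import product


def parameter_grid(time_limit, candidates):
    """Yield one options dict per combination of candidate parameter values.

    Every dict also carries TimeLimit so that no trial can run unbounded.
    """
    names = sorted(candidates)
    for values in product(*(candidates[name] for name in names)):
        trial = dict(zip(names, values))
        trial["TimeLimit"] = time_limit
        yield trial


# Placeholder candidate values; the first entry of each list is the default
# (or automatic) setting as I understand the Gurobi parameter reference.
candidates = {
    "ObjScale": [0, -0.5],
    "PreQLinearize": [-1, 1],
    "NumericFocus": [0, 2],
    "Aggregate": [1, 0],
}

trials = list(parameter_grid(time_limit=1800, candidates=candidates))
print(len(trials), "trial configurations")
for trial in trials[:3]:
    print(trial)

# Assumed usage with the Pyomo solver object from earlier in the thread:
# for trial in trials:
#     results = opt.solve(model, options=trial)
```

Pointing LogFile at a different file for each trial would keep the logs separate and make the runs easier to compare afterwards.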
To use Gurobi 9.1 on the Cloud, you only have to update the Gurobi installation on your local machine. When you optimize, Gurobi Cloud recognizes which version of Gurobi you used to build the model and uses that version to solve the model. There's a chance Gurobi 9.1 performs better on these models.
If you switch to Gurobi 9.1, you could try the new NoRelHeurTime parameter. This controls a heuristic that runs before the root relaxation is even solved. If the heuristic works well and Gurobi hits a time limit while solving the root relaxation, you could get a decent solution to the problem. The tradeoff is (i) the heuristic might not always work on your problems, and (ii) models that don't get stuck in the root relaxation might solve faster without spending time in this heuristic.
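As a rough sketch of that idea, the helper below splits an overall time budget between the NoRel heuristic and the rest of the solve. `no_rel_options` and the 25% default share are hypothetical choices for illustration, and the sketch assumes that time spent in the heuristic counts toward the overall TimeLimit.

```python
def no_rel_options(time_limit, heuristic_share=0.25):
    """Build options that reserve part of the time budget for the NoRel heuristic."""
    if time_limit <= 0:
        raise ValueError("time_limit must be positive")
    if not 0 <= heuristic_share <= 1:
        raise ValueError("heuristic_share must be between 0 and 1")
    return {
        "TimeLimit": time_limit,
        # NoRelHeurTime is only available in Gurobi 9.1 and later.
        "NoRelHeurTime": time_limit * heuristic_share,
    }


options = no_rel_options(time_limit=3600)
print(options)

# Assumed usage with the Pyomo solver object from earlier in the thread:
# results = opt.solve(model, options=options)
```

A larger share gives the heuristic more room on models that stall in the root relaxation, at the cost of slowing down the models that would have solved quickly anyway.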