Solver gets stuck in "Root Simplex" after finding optimal solution
AnsweredHi community,
I have a large MILP model in Pyomo on Python and I am running into problems for a certain set of input parameters. After finding an optimal objective after the barrier solving in 65 iterations and completing the root crossover it gets stuck int the root simplex...
See the three screenshots of the log attached.
On the last screenshot it can be seen that it does twice exactly the same thing at iteration 445562 with the same result and then after this nothing happens anymore.
Thanks a lot for your help and all the best
Axel
-
Hi Axel,
Could you share the LP file as described in Posting to the Community Forum. This way, the Community can try to reproduce this issue.
Best regards,
Jaromił0 -
Hi Jaromil,
Thanks for the answer. Unfortunately, I cannot because it is more than 1000 lines of code and depending on input files.
Also, for some certain input scenarios it works. Can it be due to the large matrix coefficient range? I am currently trying to resolve that issue.
Thanks
Axel
0 -
Hi Axel,
The coefficient range can definitely play a role as it is extremely large. You could have a look at our Guidelines for Numerical Issues for tips on how to scale your model properly.
You could also try tracking the memory usage just to make sure that you don't run into an out-of-memory condition.
Best regards,
Jaromił0 -
I am working on this right now thanks.
It is definitely not a memory issue. Do you have any idea what could cause this. I find it weird as according to the gurobi documentation the fact that Primal Inf and Dual Inf is zero means that the model is successfully solved. So actually it would only miss the final output but it does not show up...
0 -
Hi Axel,
It is very hard to tell from the log only. It seems very strange though. Numerical issues can cause almost any kind of behavior so I would not be surprised if a better scaling actually solves this issue.
What happens if you set the DegenMoves parameter to 0? This parameter turns off integrality improvement of the root relaxation solution. Maybe this is the issue here.
Best regards,
Jaromił0 -
Thanks Jaromil,
Yes I can imagine that it is hardly to tell just from the log file ;)
It is not the numeric issue. I resolved them (it was a huge big M number).
I will try the suggestion with the DegenMoves and let you know. Thanks!
0 -
Another thing that is quite weird:
This is another example but why does it do iteration 965072 twice? It should be done after the first time as far as I understand.
Same happened in the example above...
0 -
Hi Axel,
If this does not help, you could consider using the write() function to generate an MPS file and share one problematic instance.
Since you are using Pyomo, you could also have a look at How do I write an MPS model file from a third-party API?
Best regards,
Jaromił0 -
I ran out of memory when trying the DegenMove == 0.
Maybe in the end this is the problem but I do not understand why... and also why I do not get a memory error.
Before I had two quadratic constraints that I linearized and I hoped that this would solve the memory issues
I will try writing a MPS file
0 -
The double printing of the last line in the simplex log may be some syncing issue so I would not bother much.
I ran out of memory when trying the DegenMove == 0.
Maybe in the end this is the problem but I do not understand why... and also why I do not get a memory error.
Depending on the OS settings, you don't run into an error but your OS may start memory swapping. This then results in extremely slow progress such that you can get the feeling of "getting stuck". Could you run the model without DegenMoves=0 and track your memory via the task manager?
With over 7 million nonzeros in the presolved problem, your model definitely one of the larger ones so an out-of-memory condition definitely makes sense. You could try setting the number of Threads to 1 to limit memory usage. You can find more info in How do I avoid an out-of-memory condition?
Best regards,
Jaromił0 -
Thanks Jaromil,
I am always tracking my memory simultaneously. It is at around 85-95% most of the time until it hits the root simplex log where I have the feeling of getting stuck. Then it slowly drops. I also left it overnight and nothing happens. No memory error, nothing.
I tried all the out-of-memory suggestions but yeah as I said I have the feeling it is something else...
This is the memory after arriving at the stuck point and waiting for a couple fo minutes. Python still seems to do something in parallel according to the CPU usage. But then I do not understand what since the solver log seems to be done.
0 -
Hi Axel,
These might be Gurobi's parallel Threads running, that's why you have more than 1 Python process with large CPU and memory demands.
Does this behavior only occur for one model of your data set? Are you able to reproduce this behavior when running the same model written to an MPS file using the gurobi command line tool?
In case, that you cannot share the model file due to data security, please let me know and I will open a Support Ticket for you to share the model file.
Best regards,
Jaromił0 -
Hi Jaromil,
I couldn't try the MPS version yet, sorry.
The problem occurs for some inputs and for some not. But it is always the same inputs where it occurs. So yes it is reproducible. I have the feeling for inputs that are slightly more challenging (by far not infeasible) it starts doing this. I also tried it on two different machiens (16 vs 8 GB RAM) and both showed the exact same behaviour and the exact same optimal objective. For me this means that the problem is solvable.
Example of not functioning --> getting stuck here
Example of functioning:
0 -
Regarding the code, this is the part that makes the issues: It is a heat pump that can run either in cooling or heating mode but not at the same time both. So I need binary variable(s). If I say there is no cooling demand (so the heat pump works in heating only and I do not need binary variables I do not have this problem). All lower case parameter are decision variables for each time step, except capacity with is not indexed. p is the electric power consumption, h is the heating output, c is the cooling output, b_h is a binary variable that is 1 during heating and 0 during cooling, capacity is the maximal heating capacity available. M is a big value (bigger than capacity could possibly be) and COP is the efficiency of the heat pump.
Before the model had quadratic constraints which I did not see and therefore I often ran into memory issues. However, it worked and I never observed the behaviour described. Did I do something wrong converting the MIQP to a MILP model?
MILP:
p = h/COPh + c/COPc
h <= capacity
c <= capacity * (COPc/(COCc+1))
h <= M * b_h + 0 * (1 - b_h)
c <= 0 * b_h + M * (1 - b_h)
MIQP:
p = h * b_h/COPh + c * b_c/COPc
h <= capacity
c <= capacity * (COPc/(COCc+1))
b_h + b_c <= 10 -
Hey Jaromił,
to pick up the thread here, I observe exactly the same behavior: The root simplex gets stuck after the crossover after a barrier solve.
Set parameter BarHomogeneous to value 1
Set parameter Presolve to value 2
Set parameter Threads to value 46
Set parameter BarConvTol to value 1e-10
Set parameter ScaleFlag to value 2
Gurobi Optimizer version 9.5.2 build v9.5.2rc0 (win64)
Thread count: 24 physical cores, 48 logical processors, using up to 46 threads
Optimize a model with 14034346 rows, 11277165 columns and 37822849 nonzeros
Model fingerprint: 0x398518b9
Variable types: 11263485 continuous, 13680 integer (13680 binary)
Coefficient statistics:
Matrix range [1e-04, 4e+02]
Objective range [1e+00, 1e+00]
Bounds range [1e+00, 1e+00]
RHS range [5e-05, 2e+05]
Presolve removed 6661620 rows and 5913204 columns (presolve time = 5s) ...
Presolve removed 12283329 rows and 9175563 columns (presolve time = 11s) ...
Presolve removed 12530368 rows and 9544990 columns (presolve time = 15s) ...
Presolve removed 12530556 rows and 9553811 columns (presolve time = 20s) ...
Presolve removed 12530556 rows and 9605612 columns
Presolve time: 23.95s
Presolved: 1503790 rows, 1671553 columns, 6014614 nonzeros
Variable types: 1671553 continuous, 0 integer (0 binary)
Deterministic concurrent LP optimizer: primal simplex, dual simplex, and barrier
Showing barrier log only...
Root relaxation presolve removed 0 rows and 0 columns (presolve time = 5s) ...
Root relaxation presolved: 1503790 rows, 1671553 columns, 6014614 nonzeros
Root barrier log...
Elapsed ordering time = 15s
Elapsed ordering time = 20s
Ordering time: 67.88s
Elapsed ordering time = 68s
Elapsed ordering time = 70s
Elapsed ordering time = 78s
Elapsed ordering time = 81s
Elapsed ordering time = 86s
Elapsed ordering time = 90s
Elapsed ordering time = 95s
Elapsed ordering time = 100s
Elapsed ordering time = 105s
Elapsed ordering time = 110s
Ordering time: 118.48s
Barrier statistics:
Dense cols : 1869
AA' NZ : 4.411e+07
Factor NZ : 9.331e+08 (roughly 9.0 GB of memory)
Factor Ops : 2.999e+12 (roughly 5 seconds per iteration)
Threads : 44
Objective Residual
Iter Primal Dual Primal Dual Compl Time
0 -1.76457355e+12 -4.81475228e+13 2.05e+10 5.74e+01 3.78e+09 173s
1 -1.32230738e+12 -4.83695600e+13 1.73e+10 1.31e+03 3.16e+09 178s
2 -8.72868704e+11 -4.84380510e+13 1.32e+10 7.43e+02 2.37e+09 183s
3 -5.30988895e+11 -4.94625874e+13 8.72e+09 2.62e+02 1.61e+09 188s
4 -5.80100146e+10 -3.69274261e+12 8.26e+08 2.02e+01 1.79e+07 194s
5 -1.66100902e+10 -1.19694881e+12 2.02e+08 7.54e+00 2.34e+06 200s
6 -1.00545343e+10 -5.13877819e+11 1.22e+08 2.91e+00 9.48e+05 206s
7 -6.31994078e+09 -2.63781108e+11 7.49e+07 1.47e+00 4.37e+05 211s
8 -3.58323317e+09 -1.46278501e+11 4.30e+07 7.70e-01 2.02e+05 215s
9 -2.22651740e+09 -8.49240385e+10 2.89e+07 6.03e-01 1.12e+05 220s
10 -1.28718951e+09 -5.36329257e+10 1.87e+07 3.66e-01 6.31e+04 226s
...
269 5.05601584e+06 5.05601597e+06 1.88e-05 1.16e-05 2.53e-09 3501s
270 5.05601584e+06 5.05601597e+06 1.88e-05 1.16e-05 2.53e-09 3514s
271 5.05601584e+06 5.05601610e+06 1.88e-05 2.51e-05 2.59e-09 3526s
272 5.05601584e+06 5.05601609e+06 1.88e-05 2.35e-05 2.72e-09 3539s
273 5.05601584e+06 5.05601605e+06 1.88e-05 1.99e-05 2.78e-09 3552s
274 5.05601585e+06 5.05601588e+06 1.42e-05 1.58e-06 1.93e-09 3565s
275 5.05601585e+06 5.05601587e+06 1.38e-05 9.08e-07 1.88e-09 3578s
276 5.05601585e+06 5.05601586e+06 7.97e-06 3.49e-07 1.08e-09 3591s
277 5.05601585e+06 5.05601587e+06 7.34e-06 5.54e-07 9.89e-10 3605s
278 5.05601586e+06 5.05601586e+06 2.57e-07 1.73e-08 3.51e-11 3618s
279 5.05601586e+06 5.05601586e+06 3.96e-09 2.45e-10 9.54e-13 3631s
Barrier solved model in 279 iterations and 3631.48 seconds (4423.13 work units)
Optimal objective 5.05601586e+06
Root crossover log...
1252594 variables added to crossover basis 3635s
251106 DPushes remaining with DInf 1.7417488e-06 3639s
11931 DPushes remaining with DInf 0.0000000e+00 3790s
9115 DPushes remaining with DInf 0.0000000e+00 3799s
7603 DPushes remaining with DInf 0.0000000e+00 3804s
6566 DPushes remaining with DInf 0.0000000e+00 3809s
5737 DPushes remaining with DInf 0.0000000e+00 3826s
4849 DPushes remaining with DInf 0.0000000e+00 3833s
4132 DPushes remaining with DInf 0.0000000e+00 3841s
3516 DPushes remaining with DInf 0.0000000e+00 3853s
2910 DPushes remaining with DInf 0.0000000e+00 3872s
2307 DPushes remaining with DInf 0.0000000e+00 3896s
1704 DPushes remaining with DInf 0.0000000e+00 3917s
1101 DPushes remaining with DInf 0.0000000e+00 3940s
498 DPushes remaining with DInf 0.0000000e+00 3963s
0 DPushes remaining with DInf 0.0000000e+00 3987s
16276 PPushes remaining with PInf 0.0000000e+00 3987s
15289 PPushes remaining with PInf 0.0000000e+00 3997s
14378 PPushes remaining with PInf 0.0000000e+00 4001s
13770 PPushes remaining with PInf 0.0000000e+00 4006s
13167 PPushes remaining with PInf 0.0000000e+00 4011s
12559 PPushes remaining with PInf 0.0000000e+00 4016s
11215 PPushes remaining with PInf 0.0000000e+00 4023s
10607 PPushes remaining with PInf 0.0000000e+00 4027s
9836 PPushes remaining with PInf 0.0000000e+00 4031s
8475 PPushes remaining with PInf 0.0000000e+00 4037s
7857 PPushes remaining with PInf 0.0000000e+00 4041s
6510 PPushes remaining with PInf 0.0000000e+00 4047s
5640 PPushes remaining with PInf 0.0000000e+00 4050s
3707 PPushes remaining with PInf 0.0000000e+00 4055s
405 PPushes remaining with PInf 0.0000000e+00 4060s
0 PPushes remaining with PInf 0.0000000e+00 4062s
Push phase complete: Pinf 0.0000000e+00, Dinf 4.2525637e+00 4063s
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
96383 5.0560159e+06 0.000000e+00 4.252564e+00 4064s
96385 5.0560159e+06 0.000000e+00 0.000000e+00 4066s
96385 5.0560159e+06 0.000000e+00 0.000000e+00 4068s
[here, Gurobi was stuck for quite a while]
Concurrent spin time: 3980.12s (can be avoided by choosing Method=3)
Solved with barrier
Root relaxation: objective 5.056016e+06, 96385 iterations, 8016.25 seconds (5184.61 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
* 0 0 0 5056015.8599 5056015.86 0.00% - 8048s
Explored 1 nodes (96385 simplex iterations) in 8049.39 seconds (5228.09 work units)
Thread count was 46 (of 48 available processors)
Solution count 2: 5.05602e+06 5.05602e+06
Optimal solution found (tolerance 1.00e-04)
Best objective 5.056015859939e+06, best bound 5.056015859939e+06, gap 0.0000%I am painfully aware that my RHS ranges are not great, however, I still do not understand what triggers this behavior. While I wrote this comment, it actually continued after waiting 4000 seconds.
Do you know if I can expedite this process because it does not seem very efficient to idle around there?
Thanks so much in advance (for your always fantastic help!)
Best,Jacob
0 -
Hi Jacob,
Is there a particular reason why you set the parameters? BarHomogeneous will make Gurobi use the homogeneous Barrier algorithm which is expected to be way slower than the standard Barrier algorithm. Presolve=2 and ScaleFlag=2 might or might not have a positive impact. Reducing the BarConvTol parameter will also make the Barrier algorithm require more iterations.
You should update to the latest Gurobi version 10.
You could try avoiding Crossover completely for MIPs by setting Method=2 Crossover=0 and NodeMethod=2. Note that these setting make only sense if the B&B search requires only very few nodes. In your case, you need only the root node, thus this might be a suitable setting for your model.
The big waiting time after Crossover and before the B&B tree output is often caused by degenerate simplex moves. Setting DegenMoves=0 might help.
Best regards,
Jaromił0 -
Hi Jaromił,
thanks for your quick reply! The parameter settings grew historically, but I am always looking for a better configuration, so your help is always much appreciated!
I used BarHomogeneous before to find out if my problem is infeasible or unbounded, but I don't need it anymore. Since the problem is large with nasty numerics, I wanted to use Presolve and ScaleFlag to tame that a bit. And I have observed before that a tighter BarConvTol leads to shorter crossovers.
I have disabled the Crossover before, but if I remember correctly, it leads to "ugly" variable values. I am using some variable values as parameters in a subsequent optimization. Then, pre-Crossover variable values lead to bad numerics in the next optimization, doesn't it?
I will try the DegenMoves = 0, thanks!
Thanks for recommending the update! I will see if we can do that in our project.
Many thanks and best regards
Jacob0 -
Hi Jacob,
The parameter settings grew historically, but I am always looking for a better configuration, so your help is always much appreciated!
Understandable, but I can imagine that your model also grew and improved. Thus, maybe now is a good point to start with a fresh set of parameters.
I have disabled the Crossover before, but if I remember correctly, it leads to "ugly" variable values. I am using some variable values as parameters in a subsequent optimization. Then, pre-Crossover variable values lead to bad numerics in the next optimization, doesn't it?
This cannot be said in general. However, if the model is numerically difficult, then yes, Crossover often takes care of "un-clean" Barrier solutions. And yes, it is often best to use Crossover, when you want to re-use the solution point in a subsequent optimization. Still, it might be worth a try.
Best regards,
Jaromił0 -
Hi Jaromił,
thanks again for your continued help! I will try that!
Best wishes and many thanks
Jacob0
Please sign in to leave a comment.
Comments
19 comments