Skip to main content

Solver gets stuck in "Root Simplex" after finding optimal solution

Answered

Comments

19 comments

  • Jaromił Najman
    Gurobi Staff Gurobi Staff

    Hi Axel,

    Could you share the LP file as described in Posting to the Community Forum. This way, the Community can try to reproduce this issue.

    Best regards,
    Jaromił

    0
  • Axel Bruck
    Gurobi-versary
    Conversationalist
    First Question

    Hi Jaromil,

     

    Thanks for the answer. Unfortunately, I cannot because it is more than 1000 lines of code and depending on input files.

    Also, for some certain input scenarios it works. Can it be due to the large matrix coefficient range? I am currently trying to resolve that issue. 

     

    Thanks

     

    Axel

    0
  • Jaromił Najman
    Gurobi Staff Gurobi Staff

    Hi Axel,

    The coefficient range can definitely play a role as it is extremely large. You could have a look at our Guidelines for Numerical Issues for tips on how to scale your model properly.

    You could also try tracking the memory usage just to make sure that you don't run into an out-of-memory condition.

    Best regards,
    Jaromił

    0
  • Axel Bruck
    Gurobi-versary
    Conversationalist
    First Question

    I am working on this right now thanks.

    It is definitely not a memory issue. Do you have any idea what could cause this. I find it weird as according to the gurobi documentation the fact that Primal Inf and Dual Inf is zero means that the model is successfully solved. So actually it would only miss the final output but it does not show up...

    0
  • Jaromił Najman
    Gurobi Staff Gurobi Staff

    Hi Axel,

    It is very hard to tell from the log only. It seems very strange though. Numerical issues can cause almost any kind of behavior so I would not be surprised if a better scaling actually solves this issue.

    What happens if you set the DegenMoves parameter to 0? This parameter turns off integrality improvement of the root relaxation solution. Maybe this is the issue here.

    Best regards,
    Jaromił

    0
  • Axel Bruck
    Gurobi-versary
    Conversationalist
    First Question

    Thanks Jaromil,

    Yes I can imagine that it is hardly to tell just from the log file ;)

    It is not the numeric issue. I resolved them (it was a huge big M number). 

    I will try the suggestion with the DegenMoves and let you know. Thanks!

    0
  • Axel Bruck
    Gurobi-versary
    Conversationalist
    First Question

    Another thing that is quite weird:

    This is another example but why does it do iteration 965072 twice? It should be done after the first time as far as I understand.

    Same happened in the example above...

    0
  • Jaromił Najman
    Gurobi Staff Gurobi Staff

    Hi Axel,

    If this does not help, you could consider using the write() function to generate an MPS file and share one problematic instance.

    Since you are using Pyomo, you could also have a look at How do I write an MPS model file from a third-party API?

    Best regards,
    Jaromił

    0
  • Axel Bruck
    Gurobi-versary
    Conversationalist
    First Question

    I ran out of memory when trying the DegenMove == 0.

    Maybe in the end this is the problem but I do not understand why... and also why I do not get a memory error.

    Before I had two quadratic constraints that I linearized and I hoped that this would solve the memory issues

     

    I will try writing a MPS file

    0
  • Jaromił Najman
    Gurobi Staff Gurobi Staff

    The double printing of the last line in the simplex log may be some syncing issue so I would not bother much.

    I ran out of memory when trying the DegenMove == 0.

    Maybe in the end this is the problem but I do not understand why... and also why I do not get a memory error.

    Depending on the OS settings, you don't run into an error but your OS may start memory swapping. This then results in extremely slow progress such that you can get the feeling of "getting stuck". Could you run the model without DegenMoves=0 and track your memory via the task manager?

    With over 7 million nonzeros in the presolved problem, your model definitely one of the larger ones so an out-of-memory condition definitely makes sense. You could try setting the number of Threads to 1 to limit memory usage. You can find more info in How do I avoid an out-of-memory condition?

    Best regards,
    Jaromił

    0
  • Axel Bruck
    Gurobi-versary
    Conversationalist
    First Question

    Thanks Jaromil,

    I am always tracking my memory simultaneously. It is at around 85-95% most of the time until it hits the root simplex log where I have the feeling of getting stuck. Then it slowly drops. I also left it overnight and nothing happens. No memory error, nothing.

    I tried all the out-of-memory suggestions but yeah as I said I have the feeling it is something else...

    This is the memory after arriving at the stuck point and waiting for a couple fo minutes. Python still seems to do something in parallel according to the CPU usage. But then I do not understand what since the solver log seems to be done.

    0
  • Jaromił Najman
    Gurobi Staff Gurobi Staff

    Hi Axel,

    These might be Gurobi's parallel Threads running, that's why you have more than 1 Python process with large CPU and memory demands.

    Does this behavior only occur for one model of your data set? Are you able to reproduce this behavior when running the same model written to an MPS file using the gurobi command line tool?

    In case, that you cannot share the model file due to data security, please let me know and I will open a Support Ticket for you to share the model file.

    Best regards,
    Jaromił

    0
  • Axel Bruck
    Gurobi-versary
    Conversationalist
    First Question

    Hi Jaromil,

    I couldn't try the MPS version yet, sorry. 

    The problem occurs for some inputs and for some not. But it is always the same inputs where it occurs. So yes it is reproducible. I have the feeling for inputs that are slightly more challenging (by far not infeasible) it starts doing this. I also tried it on two different machiens (16 vs 8 GB RAM) and both showed the exact same behaviour and the exact same optimal objective. For me this means that the problem is solvable.

    Example of not functioning --> getting stuck here

    Example of functioning:

     

    0
  • Axel Bruck
    Gurobi-versary
    Conversationalist
    First Question

    Regarding the code, this is the part that makes the issues: It is a heat pump that can run either in cooling or heating mode but not at the same time both. So I need binary variable(s). If I say there is no cooling demand (so the heat pump works in heating only and I do not need binary variables I do not have this problem). All lower case parameter are decision variables for each time step, except capacity with is not indexed. p is the electric power consumption, h is the heating output, c is the cooling output, b_h is a binary variable that is 1 during heating and 0 during cooling, capacity is the maximal heating capacity available. M is a big value (bigger than capacity could possibly be) and COP is the efficiency of the heat pump. 

    Before the model had quadratic constraints which I did not see and therefore I often ran into memory issues. However, it worked and I never observed the behaviour described. Did I do something wrong converting the MIQP to a MILP model?

    MILP:
    p = h/COPh + c/COPc
    h <= capacity
    c <= capacity * (COPc/(COCc+1))
    h <= M * b_h + 0 * (1 - b_h)
    c <= 0 * b_h + M * (1 - b_h)

    MIQP:
    p = h * b_h/COPh + c * b_c/COPc
    h <= capacity
    c <= capacity * (COPc/(COCc+1))
    b_h + b_c <= 1
    0
  • Jacob Mannhardt
    Gurobi-versary
    Conversationalist
    First Question

    Hey Jaromił, 

    to pick up the thread here, I observe exactly the same behavior: The root simplex gets stuck after the crossover after a barrier solve.

     

    Set parameter BarHomogeneous to value 1
    Set parameter Presolve to value 2
    Set parameter Threads to value 46
    Set parameter BarConvTol to value 1e-10
    Set parameter ScaleFlag to value 2
    Gurobi Optimizer version 9.5.2 build v9.5.2rc0 (win64)
    Thread count: 24 physical cores, 48 logical processors, using up to 46 threads
    Optimize a model with 14034346 rows, 11277165 columns and 37822849 nonzeros
    Model fingerprint: 0x398518b9
    Variable types: 11263485 continuous, 13680 integer (13680 binary)
    Coefficient statistics:
      Matrix range     [1e-04, 4e+02]
      Objective range  [1e+00, 1e+00]
      Bounds range     [1e+00, 1e+00]
      RHS range        [5e-05, 2e+05]
    Presolve removed 6661620 rows and 5913204 columns (presolve time = 5s) ...
    Presolve removed 12283329 rows and 9175563 columns (presolve time = 11s) ...
    Presolve removed 12530368 rows and 9544990 columns (presolve time = 15s) ...
    Presolve removed 12530556 rows and 9553811 columns (presolve time = 20s) ...
    Presolve removed 12530556 rows and 9605612 columns
    Presolve time: 23.95s
    Presolved: 1503790 rows, 1671553 columns, 6014614 nonzeros
    Variable types: 1671553 continuous, 0 integer (0 binary)

    Deterministic concurrent LP optimizer: primal simplex, dual simplex, and barrier
    Showing barrier log only...

    Root relaxation presolve removed 0 rows and 0 columns (presolve time = 5s) ...
    Root relaxation presolved: 1503790 rows, 1671553 columns, 6014614 nonzeros

    Root barrier log...

    Elapsed ordering time = 15s
    Elapsed ordering time = 20s
    Ordering time: 67.88s
    Elapsed ordering time = 68s
    Elapsed ordering time = 70s
    Elapsed ordering time = 78s
    Elapsed ordering time = 81s
    Elapsed ordering time = 86s
    Elapsed ordering time = 90s
    Elapsed ordering time = 95s
    Elapsed ordering time = 100s
    Elapsed ordering time = 105s
    Elapsed ordering time = 110s
    Ordering time: 118.48s

    Barrier statistics:
     Dense cols : 1869
     AA' NZ     : 4.411e+07
     Factor NZ  : 9.331e+08 (roughly 9.0 GB of memory)
     Factor Ops : 2.999e+12 (roughly 5 seconds per iteration)
     Threads    : 44

                      Objective                Residual
    Iter       Primal          Dual         Primal    Dual     Compl     Time
       0  -1.76457355e+12 -4.81475228e+13  2.05e+10 5.74e+01  3.78e+09   173s
       1  -1.32230738e+12 -4.83695600e+13  1.73e+10 1.31e+03  3.16e+09   178s
       2  -8.72868704e+11 -4.84380510e+13  1.32e+10 7.43e+02  2.37e+09   183s
       3  -5.30988895e+11 -4.94625874e+13  8.72e+09 2.62e+02  1.61e+09   188s
       4  -5.80100146e+10 -3.69274261e+12  8.26e+08 2.02e+01  1.79e+07   194s
       5  -1.66100902e+10 -1.19694881e+12  2.02e+08 7.54e+00  2.34e+06   200s
       6  -1.00545343e+10 -5.13877819e+11  1.22e+08 2.91e+00  9.48e+05   206s
       7  -6.31994078e+09 -2.63781108e+11  7.49e+07 1.47e+00  4.37e+05   211s
       8  -3.58323317e+09 -1.46278501e+11  4.30e+07 7.70e-01  2.02e+05   215s
       9  -2.22651740e+09 -8.49240385e+10  2.89e+07 6.03e-01  1.12e+05   220s
      10  -1.28718951e+09 -5.36329257e+10  1.87e+07 3.66e-01  6.31e+04   226s
    ...
     269   5.05601584e+06  5.05601597e+06  1.88e-05 1.16e-05  2.53e-09  3501s
     270   5.05601584e+06  5.05601597e+06  1.88e-05 1.16e-05  2.53e-09  3514s
     271   5.05601584e+06  5.05601610e+06  1.88e-05 2.51e-05  2.59e-09  3526s
     272   5.05601584e+06  5.05601609e+06  1.88e-05 2.35e-05  2.72e-09  3539s
     273   5.05601584e+06  5.05601605e+06  1.88e-05 1.99e-05  2.78e-09  3552s
     274   5.05601585e+06  5.05601588e+06  1.42e-05 1.58e-06  1.93e-09  3565s
     275   5.05601585e+06  5.05601587e+06  1.38e-05 9.08e-07  1.88e-09  3578s
     276   5.05601585e+06  5.05601586e+06  7.97e-06 3.49e-07  1.08e-09  3591s
     277   5.05601585e+06  5.05601587e+06  7.34e-06 5.54e-07  9.89e-10  3605s
     278   5.05601586e+06  5.05601586e+06  2.57e-07 1.73e-08  3.51e-11  3618s
     279   5.05601586e+06  5.05601586e+06  3.96e-09 2.45e-10  9.54e-13  3631s

    Barrier solved model in 279 iterations and 3631.48 seconds (4423.13 work units)
    Optimal objective 5.05601586e+06


    Root crossover log...

     1252594 variables added to crossover basis                     3635s

      251106 DPushes remaining with DInf 1.7417488e-06              3639s
       11931 DPushes remaining with DInf 0.0000000e+00              3790s
        9115 DPushes remaining with DInf 0.0000000e+00              3799s
        7603 DPushes remaining with DInf 0.0000000e+00              3804s
        6566 DPushes remaining with DInf 0.0000000e+00              3809s
        5737 DPushes remaining with DInf 0.0000000e+00              3826s
        4849 DPushes remaining with DInf 0.0000000e+00              3833s
        4132 DPushes remaining with DInf 0.0000000e+00              3841s
        3516 DPushes remaining with DInf 0.0000000e+00              3853s
        2910 DPushes remaining with DInf 0.0000000e+00              3872s
        2307 DPushes remaining with DInf 0.0000000e+00              3896s
        1704 DPushes remaining with DInf 0.0000000e+00              3917s
        1101 DPushes remaining with DInf 0.0000000e+00              3940s
         498 DPushes remaining with DInf 0.0000000e+00              3963s
           0 DPushes remaining with DInf 0.0000000e+00              3987s

       16276 PPushes remaining with PInf 0.0000000e+00              3987s
       15289 PPushes remaining with PInf 0.0000000e+00              3997s
       14378 PPushes remaining with PInf 0.0000000e+00              4001s
       13770 PPushes remaining with PInf 0.0000000e+00              4006s
       13167 PPushes remaining with PInf 0.0000000e+00              4011s
       12559 PPushes remaining with PInf 0.0000000e+00              4016s
       11215 PPushes remaining with PInf 0.0000000e+00              4023s
       10607 PPushes remaining with PInf 0.0000000e+00              4027s
        9836 PPushes remaining with PInf 0.0000000e+00              4031s
        8475 PPushes remaining with PInf 0.0000000e+00              4037s
        7857 PPushes remaining with PInf 0.0000000e+00              4041s
        6510 PPushes remaining with PInf 0.0000000e+00              4047s
        5640 PPushes remaining with PInf 0.0000000e+00              4050s
        3707 PPushes remaining with PInf 0.0000000e+00              4055s
         405 PPushes remaining with PInf 0.0000000e+00              4060s
           0 PPushes remaining with PInf 0.0000000e+00              4062s

      Push phase complete: Pinf 0.0000000e+00, Dinf 4.2525637e+00   4063s


    Root simplex log...

    Iteration    Objective       Primal Inf.    Dual Inf.      Time
       96383    5.0560159e+06   0.000000e+00   4.252564e+00   4064s
       96385    5.0560159e+06   0.000000e+00   0.000000e+00   4066s
     96385    5.0560159e+06   0.000000e+00   0.000000e+00   4068s
    [here, Gurobi was stuck for quite a while]
    Concurrent spin time: 3980.12s (can be avoided by choosing Method=3)

    Solved with barrier

    Root relaxation: objective 5.056016e+06, 96385 iterations, 8016.25 seconds (5184.61 work units)

        Nodes    |    Current Node    |     Objective Bounds      |     Work
     Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

    *    0     0               0    5056015.8599 5056015.86  0.00%     - 8048s

    Explored 1 nodes (96385 simplex iterations) in 8049.39 seconds (5228.09 work units)
    Thread count was 46 (of 48 available processors)

    Solution count 2: 5.05602e+06 5.05602e+06

    Optimal solution found (tolerance 1.00e-04)
    Best objective 5.056015859939e+06, best bound 5.056015859939e+06, gap 0.0000%

    I am painfully aware that my RHS ranges are not great, however, I still do not understand what triggers this behavior. While I wrote this comment, it actually continued after waiting 4000 seconds.

    Do you know if I can expedite this process because it does not seem very efficient to idle around there?

    Thanks so much in advance (for your always fantastic help!)
    Best,

    Jacob

    0
  • Jaromił Najman
    Gurobi Staff Gurobi Staff

    Hi Jacob,

    Is there a particular reason why you set the parameters? BarHomogeneous will make Gurobi use the homogeneous Barrier algorithm which is expected to be way slower than the standard Barrier algorithm. Presolve=2 and ScaleFlag=2 might or might not have a positive impact. Reducing the BarConvTol parameter will also make the Barrier algorithm require more iterations.

    You should update to the latest Gurobi version 10.

    You could try avoiding Crossover completely for MIPs by setting Method=2 Crossover=0 and NodeMethod=2. Note that these setting make only sense if the B&B search requires only very few nodes. In your case, you need only the root node, thus this might be a suitable setting for your model.

    The big waiting time after Crossover and before the B&B tree output is often caused by degenerate simplex moves. Setting DegenMoves=0 might help.

    Best regards, 
    Jaromił

    0
  • Jacob Mannhardt
    Gurobi-versary
    Conversationalist
    First Question

    Hi Jaromił,

    thanks for your quick reply! The parameter settings grew historically, but I am always looking for a better configuration, so your help is always much appreciated! 

    I used BarHomogeneous before to find out if my problem is infeasible or unbounded, but I don't need it anymore. Since the problem is large with nasty numerics, I wanted to use Presolve and ScaleFlag to tame that a bit. And I have observed before that a tighter BarConvTol leads to shorter crossovers. 

    I have disabled the Crossover before, but if I remember correctly, it leads to "ugly" variable values. I am using some variable values as parameters in a subsequent optimization. Then, pre-Crossover variable values lead to bad numerics in the next optimization, doesn't it?

    I will try the DegenMoves = 0, thanks!

    Thanks for recommending the update! I will see if we can do that in our project.

    Many thanks and best regards
    Jacob

    0
  • Jaromił Najman
    Gurobi Staff Gurobi Staff

    Hi Jacob,

    The parameter settings grew historically, but I am always looking for a better configuration, so your help is always much appreciated! 

    Understandable, but I can imagine that your model also grew and improved. Thus, maybe now is a good point to start with a fresh set of parameters.

    I have disabled the Crossover before, but if I remember correctly, it leads to "ugly" variable values. I am using some variable values as parameters in a subsequent optimization. Then, pre-Crossover variable values lead to bad numerics in the next optimization, doesn't it?

    This cannot be said in general. However, if the model is numerically difficult, then yes, Crossover often takes care of "un-clean" Barrier solutions. And yes, it is often best to use Crossover, when you want to re-use the solution point in a subsequent optimization. Still, it might be worth a try.

    Best regards, 
    Jaromił

     

    0
  • Jacob Mannhardt
    Gurobi-versary
    Conversationalist
    First Question

    Hi Jaromił,

    thanks again for your continued help! I will try that!

    Best wishes and many thanks
    Jacob

    0

Please sign in to leave a comment.