Astronomically low node throughput; major degeneracy?
AnsweredHi,
I'm trying to diagnose a large MIP model that appears to hang. Unlike previous models that have seemed to hang because they get stuck trying cuts at the root node, this model seems to select a variable to branch on, but makes minimal progress after that.
Nodes  Current Node  Objective Bounds  Work
Expl Unexpl  Obj Depth IntInf  Incumbent BestBd Gap  It/Node Time
0 0 100.00000 0 3559  100.00000   500s
0 0 100.00000 0 3559  100.00000   504s
0 2 100.00000 0 3559  100.00000   510s
^C
Interrupt request received
1 3 100.00000 1 4051  100.00000  130810 24830s
In almost 7 hours, it explored only one node! It's strange that one node took 130k simplex iterations, but it's also doing about 7.5 simplex iterations/sec, which seems very slow as well. This is with version 9.1.1, but version 9.1.2 seems to exhibit the same behavior.
I also tried NodeMethod=2, and it is faster (one node in ~1500s, 556950 it/node), but still much higher than I was expecting. I have other similarly sized models that don't have this problem, so my theory is there's something strange about the model, perhaps lots of degeneracy, that's causing this. If so, is there anything I can do to debug it?
Here's the full log file from a slightly shorter run with the same model that illustrates the same issue. If I don't run with CrossoverBasis=1, crossover+fixing up the solution with simplex takes hours. If I don't run with DegenMove=0, it also spends hours trying degenerate simplex moves.
> gurobi_cl CrossoverBasis=1 DegenMoves=0 Symmetry=0 Cuts=0 Heuristics=0 hang.mps
Using license file /home/pat2/gurobi.lic
Set parameter CrossoverBasis to value 1
Set parameter DegenMoves to value 0
Set parameter Heuristics to value 0
Set parameter Symmetry to value 0
Set parameter Cuts to value 0
Set parameter TokenServer to value gurobi.lic.rice.edu
Set parameter LogFile to value gurobi.log
Gurobi Optimizer version 9.1.2 build v9.1.2rc0 (linux64)
Copyright (c) 2021, Gurobi Optimization, LLC
Read MPS format model from file hang.mps
Reading time = 11.02 seconds
(null): 648960 rows, 9440257 columns, 28327680 nonzeros
Thread count: 48 physical cores, 96 logical processors, using up to 32 threads
Optimize a model with 648960 rows, 9440257 columns and 28327680 nonzeros
Model fingerprint: 0x4d9e4235
Model has 638976 SOS constraints
Variable types: 9440257 continuous, 0 integer (0 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [1e+00, 1e+00]
Bounds range [0e+00, 0e+00]
RHS range [8e01, 1e+02]
Presolve removed 768 rows and 0 columns (presolve time = 5s) ...
Presolve removed 354310 rows and 0 columns (presolve time = 11s) ...
Presolve removed 361772 rows and 7214 columns (presolve time = 15s) ...
Presolve removed 361772 rows and 6016064 columns (presolve time = 33s) ...
Presolve removed 361772 rows and 6016064 columns (presolve time = 48s) ...
Presolve removed 361772 rows and 6016064 columns (presolve time = 55s) ...
Presolve removed 361772 rows and 6016064 columns (presolve time = 55s) ...
Presolve removed 595607 rows and 6249899 columns (presolve time = 60s) ...
Presolve removed 595607 rows and 6249899 columns (presolve time = 68s) ...
Presolve removed 595607 rows and 6249899 columns (presolve time = 72s) ...
Presolve removed 595607 rows and 6249899 columns (presolve time = 87s) ...
Presolve removed 595607 rows and 6436544 columns (presolve time = 319s) ...
Presolve removed 782252 rows and 6436544 columns (presolve time = 320s) ...
Presolve removed 794695 rows and 6436544 columns (presolve time = 326s) ...
Presolve removed 794695 rows and 6436544 columns (presolve time = 330s) ...
Presolve removed 794695 rows and 6436544 columns (presolve time = 336s) ...
Presolve removed 794695 rows and 6436544 columns (presolve time = 340s) ...
Presolve removed 794695 rows and 6436544 columns (presolve time = 356s) ...
Presolve added 2818900 rows and 0 columns
Presolve removed 0 rows and 3056784 columns
Presolve time: 355.68s
Presolved: 3467860 rows, 6383473 columns, 19380729 nonzeros
Variable types: 3225105 continuous, 3158368 integer (3158368 binary)
Deterministic concurrent LP optimizer: primal simplex, dual simplex, and barrier
Showing barrier log only...
Root barrier log...
Ordering time: 4.01s
Barrier statistics:
Dense cols : 1
AA' NZ : 1.686e+07
Factor NZ : 2.160e+08 (roughly 3.4 GBytes of memory)
Factor Ops : 2.438e+11 (roughly 1 second per iteration)
Threads : 30
Objective Residual
Iter Primal Dual Primal Dual Compl Time
0 3.07832202e+02 3.57723871e+06 1.83e+04 3.78e16 1.98e+00 383s
1 3.02336043e+02 1.02163359e+05 5.24e12 2.66e14 1.45e02 385s
2 1.00000745e+02 3.51195214e+01 5.11e13 2.31e14 1.91e05 387s
3 1.00013461e+02 9.98648727e+01 2.38e13 1.29e13 2.10e08 389s
4 1.00000013e+02 9.99998649e+01 2.77e13 3.13e18 2.10e11 391s
5 1.00000000e+02 9.99999999e+01 2.40e12 3.01e18 2.10e14 393s
Barrier solved model in 5 iterations and 393.12 seconds
Optimal objective 1.00000000e+02
Root crossover log...
250119 DPushes remaining with DInf 0.0000000e+00 468s
0 DPushes remaining with DInf 0.0000000e+00 469s
2198 PPushes remaining with PInf 0.0000000e+00 469s
0 PPushes remaining with PInf 0.0000000e+00 469s
Push phase complete: Pinf 0.0000000e+00, Dinf 0.0000000e+00 469s
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
56187 1.0000000e+02 0.000000e+00 0.000000e+00 470s
56187 1.0000000e+02 0.000000e+00 0.000000e+00 472s
Concurrent spin time: 45.14s (can be avoided by choosing Method=3)
Solved with barrier
Root relaxation: objective 1.000000e+02, 56187 iterations, 158.17 seconds
Nodes  Current Node  Objective Bounds  Work
Expl Unexpl  Obj Depth IntInf  Incumbent BestBd Gap  It/Node Time
0 0 100.00000 0 3559  100.00000   525s
0 0 100.00000 0 3559  100.00000   530s
0 2 100.00000 0 3559  100.00000   536s
^C
Interrupt request received
Explored 1 nodes (56187 simplex iterations) in 6709.62 seconds
Thread count was 32 (of 96 available processors)
Solution count 0
Solve interrupted
Best objective , best bound 1.000000000000e+02, gap 

Something actually seems very wrong. If I turn heuristics to 1, it hangs in the state shown in the second log for over 48 hours. It seems it has picked a variable to branch on but hasn't finished either child node.
0 
Hi Philip,
Thank you for your request, I just created a ticket out of it. We'll handle it from there.
Best regards,
Elisabeth
0
Please sign in to leave a comment.
Comments
2 comments