Distributed MIP logging & ramp-up question
I was just solving a MIP instance on a more powerful remote machine by starting grb_rs there and configuring my application to use it as a single distributed MIP node. I noticed in the log that the number of open nodes drops to about a quarter at the end of the ramp-up. Why is that? And why perform ramp-up at all with a single node?
Best regards, happy holidays & enjoy the weekend
Simon
Gurobi Optimizer version 9.0.0 build v9.0.0rc2 (win64)
Optimize a model with 24564 rows, 79968 columns and 138336 nonzeros
Model fingerprint: 0xe2b0a38d
Model has 10800 general constraints
Variable types: 0 continuous, 79968 integer (79968 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [1e+00, 3e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+00]
Starting distributed worker jobs...
Started distributed worker on server.somedomain.com:61000
Distributed MIP job count: 1
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | ParUtil Time
Distributing 1 mipstarts among 1 workers
H 0 568.0000000 - - 1s
0 0 568.00000 2.00000 100% 99% 5s
0 0 400.18854 0 2331 568.00000 400.18854 29.5% 99% 12s
...
9485 8172 426.98623 21 2656 512.00000 409.76195 20.0% 99% 3597s
9628 8232 430.56002 22 2786 512.00000 409.76195 20.0% 99% 3634s
9735 8312 428.42618 23 2454 512.00000 409.76195 20.0% 99% 3671s
9851 8409 430.31670 24 2474 512.00000 409.76195 20.0% 99% 3708s
Ramp-up phase complete - continuing with instance 0 (best bd 409.762)
9986 2027 406.99872 0 3266 512.00000 409.76195 20.0% 99% 3771s
10018 2057 410.63395 7 2874 512.00000 410.63885 19.8% 99% 3790s
10050 2089 411.53699 8 2781 512.00000 410.64341 19.8% 99% 3821s
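For context, the setup is roughly the following (a sketch, not my exact code; the worker address mirrors the log above, and `apply_params` is just an illustrative helper around gurobipy's standard `setParam` pattern):

```python
# Sketch of a single-worker distributed MIP setup, assuming a grb_rs worker
# running on server.somedomain.com:61000 as shown in the log above.
# WorkerPool and DistributedMIPJobs are Gurobi's distributed-MIP parameters;
# apply_params is only an illustrative helper.

DISTRIBUTED_PARAMS = {
    "WorkerPool": "server.somedomain.com:61000",  # address of the grb_rs worker
    "DistributedMIPJobs": 1,                      # use a single distributed worker
}

def apply_params(model, params=DISTRIBUTED_PARAMS):
    """Set each parameter on a gurobipy Model before calling optimize()."""
    for name, value in params.items():
        model.setParam(name, value)
```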
-
Hey Gurobi team,
Any insights into this would be helpful. We continue to have problems with ramp-up for our distributed MIP runs. If there's a way to skip this ramp-up phase (even with more than one machine), it would be very much appreciated.
Simon, have you noticed anything different since upgrading to v9.1 or higher? Our distributed MIP performance has been truly awful.
-
No, unfortunately I haven't been able to do more experiments. I have only used 9.1 without DistributedMIP.
-
Hi Simon, hi Ryan,
I apologize for the delay.
It seems that your instance is not very well suited for distributed MIP optimization. I suggest you try tuning your parameters for the "normal" mode instead of running this on a larger cluster of machines.
There is no setting to disable the ramp-up phase.
I hope that helps.
Cheers,
Matthias
-
Hi Matthias
Hmm... I think you misread my original post? In my example I ran Gurobi on a *single* node, not on a cluster.
Best regards,
Simon
-
Hi Simon,
A while back, Gurobi support let us know about a hidden parameter, GURO_PAR_RAMPUPNODES.
You can set the number of nodes after which it leaves the ramp-up phase. Anecdotally, it appears to complete (# workers * GURO_PAR_RAMPUPNODES) nodes in the branch-and-bound tree before leaving ramp-up. Maybe you can set that to a low number with 1 worker so it exits ramp-up sooner and you don't lose as many open nodes? Just a thought - maybe it will help you out.
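If it helps, setting it could look something like this (hypothetical: GURO_PAR_RAMPUPNODES is undocumented, so its name and semantics here are only what support told us, and the value 10 is an arbitrary "low number"):

```python
# Hypothetical sketch: shorten ramp-up via the undocumented parameter
# GURO_PAR_RAMPUPNODES (name/behavior as reported by Gurobi support in this
# thread, not from official docs). Pass these to Model.setParam as usual.

RAMPUP_PARAMS = {
    "DistributedMIPJobs": 1,        # single distributed worker
    "GURO_PAR_RAMPUPNODES": 10,     # arbitrary low node count per worker
}

# With 1 worker, the anecdotal exit point would be
# (# workers) * GURO_PAR_RAMPUPNODES = 1 * 10 = 10 ramp-up nodes.
RAMPUP_EXIT_NODES = 1 * RAMPUP_PARAMS["GURO_PAR_RAMPUPNODES"]
```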
I do find it interesting that you lose so many nodes after the hop-out with 1 worker. I'm curious if Gurobi can provide a more comprehensive explanation for why this is happening.
Cheers,
Ryan
-
Hi!
I really misunderstood the post - so you don't want to use "distributed" MIP since you are running only on one node. Gurobi might as well disallow a DistributedMIPJobs setting of 1. Its main use is to explicitly enable ramp-up and maybe some other techniques that distinguish this from a normal optimization. I don't understand why you would want to use the DistributedMIP mode on a single machine without ramp-up - why don't you just run a normal optimization?
The idea of ramp-up is to avoid idle times on other machines until enough (branch-and-bound) nodes are available to distribute. So all machines initially start a racing phase with different settings or seeds until enough nodes have been generated or some other limit is reached to start the actual distributed part.
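To illustrate the racing idea (a rough sketch only, not Gurobi's actual implementation): each racing instance solves the same model with a different seed or parameter set, and the winner supplies the tree for the distributed phase.

```python
# Sketch of the ramp-up "racing" idea: one parameter set per worker,
# identical except for the random seed. Purely illustrative -- this is
# not how Gurobi's internal ramp-up is implemented.

def racing_configs(num_workers, base_params=None):
    """Return one parameter dict per racing worker, varying only Seed."""
    base = dict(base_params or {})
    return [{**base, "Seed": seed} for seed in range(num_workers)]
```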
Furthermore, I don't understand why it's a bad thing to have fewer open nodes. Usually, you want to have as few open nodes as possible.
Cheers,
Matthias
-
Thanks for the explanation. This matches my understanding. With my original post I wanted to gain a better understanding of the distributed MIP algorithms in Gurobi. The parts I don't quite understand are: a) whom the single node is racing against, and b) whose open nodes are removed?
It's great to have fewer open nodes, but since there's only a single node racing, who produced the nodes that get removed after ramp-up? If they were produced by the single node, aren't they relevant to the search? It's almost as if there were four workers racing!
I run it this way because it works well with our academic license, and it makes it very easy to develop on my small machine while relying on a beefy server for the heavy lifting. Disabling ramp-up in this scenario seems sensible.
Cheers & have a nice weekend
Simon