figuring out resource requirements

I'm having trouble getting a MIP to solve even the root node. I'm running on a university cluster, and when I submit a job I need to request a specific amount of memory, number of nodes, and number of tasks per node. Currently I request 1 node, 1 task per node, and 96 GB of memory, but the solver doesn't find a single feasible solution at the root node within 150 hours (the scheduler killed the job once that time had elapsed). How can I figure out what my bottleneck is? That is, would requesting more memory, more nodes, or more tasks per node help? The maximum time I can request under university rules is 168 hours.

Here is the Gurobi output:

Changed value of parameter method to 4

Prev: -1 Min: -1 Max: 5 Default: -1

Changed value of parameter mipgap to 0.1

Prev: 0.0001 Min: 0.0 Max: 1e+100 Default: 0.0001

Changed value of parameter timelimit to 9.99999999999e+11

Prev: 1e+100 Min: 0.0 Max: 1e+100 Default: 1e+100

Optimize a model with 1746714 rows, 7510471 columns and 25685499 nonzeros

Variable types: 5494157 continuous, 2016314 integer (0 binary)

Coefficient statistics:

Matrix range [1e+00, 1e+00]

Objective range [1e-02, 1e+00]

Bounds range [0e+00, 0e+00]

RHS range [2e-08, 2e+01]

Presolve removed 1379600 rows and 193993 columns (presolve time = 5s) ...

Presolve removed 1379600 rows and 194963 columns (presolve time = 12s) ...

Presolve removed 1379600 rows and 194963 columns (presolve time = 15s) ...

Presolve removed 1379600 rows and 194963 columns (presolve time = 20s) ...

Presolve removed 1384062 rows and 391291 columns (presolve time = 32s) ...

Presolve removed 1384062 rows and 391291 columns

Presolve time: 31.72s

Presolved: 362652 rows, 7119180 columns, 23175688 nonzeros

Variable types: 5170352 continuous, 1948828 integer (0 binary)

Deterministic concurrent LP optimizer: primal simplex, dual simplex, and barrier

Showing barrier log only...

Presolve removed 4 rows and 0 columns (presolve time = 9s) ...

Presolve removed 4 rows and 0 columns (presolve time = 16s) ...

Presolve removed 4 rows and 0 columns (presolve time = 25s) ...

Presolve removed 62350 rows and 0 columns (presolve time = 42s) ...

Presolve removed 62350 rows and 0 columns (presolve time = 46s) ...

Presolve removed 62350 rows and 0 columns (presolve time = 50s) ...

Presolve removed 62350 rows and 0 columns

Presolved: 300302 rows, 7119180 columns, 21067634 nonzeros

Root barrier log...

Elapsed ordering time = 5s

Elapsed ordering time = 10s

Elapsed ordering time = 15s

Elapsed ordering time = 20s

Elapsed ordering time = 25s

Elapsed ordering time = 30s

Elapsed ordering time = 35s

Elapsed ordering time = 40s

Elapsed ordering time = 45s

Elapsed ordering time = 50s

Elapsed ordering time = 55s

Elapsed ordering time = 60s

Elapsed ordering time = 65s

Elapsed ordering time = 70s

Elapsed ordering time = 75s

Elapsed ordering time = 80s

Elapsed ordering time = 85s

Elapsed ordering time = 90s

Elapsed ordering time = 95s

Elapsed ordering time = 100s

Elapsed ordering time = 105s

Elapsed ordering time = 110s

Elapsed ordering time = 115s

Elapsed ordering time = 120s

Elapsed ordering time = 125s

Elapsed ordering time = 130s

Elapsed ordering time = 135s

Elapsed ordering time = 140s

Elapsed ordering time = 145s

Elapsed ordering time = 150s

Elapsed ordering time = 155s

Elapsed ordering time = 160s

Elapsed ordering time = 165s

Elapsed ordering time = 170s

Elapsed ordering time = 175s

Elapsed ordering time = 180s

Elapsed ordering time = 185s

Elapsed ordering time = 190s

Elapsed ordering time = 195s

Elapsed ordering time = 200s

Elapsed ordering time = 205s

Elapsed ordering time = 210s

Elapsed ordering time = 215s

Elapsed ordering time = 220s

Elapsed ordering time = 225s

Elapsed ordering time = 230s

Elapsed ordering time = 235s

Elapsed ordering time = 240s

Elapsed ordering time = 245s

Elapsed ordering time = 250s

Elapsed ordering time = 255s

Elapsed ordering time = 260s

Elapsed ordering time = 265s

Elapsed ordering time = 270s

Elapsed ordering time = 275s

Elapsed ordering time = 280s

Elapsed ordering time = 285s

Elapsed ordering time = 290s

Elapsed ordering time = 295s

Elapsed ordering time = 300s

Elapsed ordering time = 305s

Elapsed ordering time = 310s

Elapsed ordering time = 315s

Elapsed ordering time = 320s

Elapsed ordering time = 325s

Elapsed ordering time = 330s

Elapsed ordering time = 335s

Elapsed ordering time = 340s

Elapsed ordering time = 345s

Elapsed ordering time = 350s

Elapsed ordering time = 355s

Elapsed ordering time = 360s

Elapsed ordering time = 365s

Elapsed ordering time = 370s

Elapsed ordering time = 375s

Elapsed ordering time = 380s

Elapsed ordering time = 385s

Elapsed ordering time = 390s

Elapsed ordering time = 395s

Elapsed ordering time = 400s

Elapsed ordering time = 405s

Elapsed ordering time = 410s

Elapsed ordering time = 415s

Elapsed ordering time = 420s

Elapsed ordering time = 425s

Elapsed ordering time = 430s

Elapsed ordering time = 435s

Elapsed ordering time = 440s

Elapsed ordering time = 445s

Elapsed ordering time = 450s

Elapsed ordering time = 455s

Elapsed ordering time = 460s

Elapsed ordering time = 465s

Elapsed ordering time = 470s

Elapsed ordering time = 475s

Elapsed ordering time = 480s

Elapsed ordering time = 485s

Elapsed ordering time = 490s

Elapsed ordering time = 495s

Elapsed ordering time = 500s

Elapsed ordering time = 505s

Elapsed ordering time = 510s

Elapsed ordering time = 515s

Elapsed ordering time = 520s

Elapsed ordering time = 525s

Elapsed ordering time = 530s

Elapsed ordering time = 535s

Elapsed ordering time = 540s

Elapsed ordering time = 545s

Elapsed ordering time = 550s

Elapsed ordering time = 555s

Elapsed ordering time = 560s

Elapsed ordering time = 565s

Elapsed ordering time = 570s

Elapsed ordering time = 575s

Elapsed ordering time = 580s

Ordering time: 582.59s

Barrier statistics:

AA' NZ : 1.606e+07

Factor NZ : 2.338e+08 (roughly 5.0 GBytes of memory)

Factor Ops : 4.625e+11 (roughly 6 seconds per iteration)

Threads : 21

Objective Residual

Iter Primal Dual Primal Dual Compl Time

0 2.14104074e+06 -4.81895447e+02 5.09e+02 0.00e+00 8.97e-01 801s

1 7.11378286e+05 -2.74727434e+02 1.17e+02 1.64e-01 2.71e-01 879s

2 8.92325467e+04 -8.37988613e+01 1.07e+01 5.21e-02 2.94e-02 961s

3 1.01690978e+04 2.23879878e+02 9.13e-01 1.41e-02 2.97e-03 1052s

4 6.87606154e+03 3.70929185e+02 5.77e-01 1.04e-02 1.93e-03 1135s

5 5.47958402e+03 4.66425951e+02 4.40e-01 9.00e-03 1.50e-03 1209s

6 3.92370482e+03 6.01129221e+02 3.01e-01 6.55e-03 1.01e-03 1290s

7 3.21463794e+03 6.50872480e+02 2.34e-01 6.06e-03 8.03e-04 1363s

8 2.81367772e+03 7.09854823e+02 1.97e-01 5.43e-03 6.74e-04 1443s

9 2.50785585e+03 7.88832411e+02 1.67e-01 4.40e-03 5.66e-04 1535s

10 2.30515291e+03 8.18416820e+02 1.45e-01 4.03e-03 4.94e-04 1619s

11 2.18872973e+03 8.41786747e+02 1.33e-01 3.66e-03 4.52e-04 1692s

12 1.98698996e+03 8.53225317e+02 1.11e-01 3.43e-03 3.80e-04 1765s

13 1.70591951e+03 8.73261434e+02 8.16e-02 3.14e-03 2.83e-04 1844s

14 1.50524627e+03 9.13061585e+02 5.90e-02 2.65e-03 2.06e-04 1934s

15 1.43077793e+03 9.38829753e+02 4.92e-02 2.22e-03 1.73e-04 2017s

16 1.34078220e+03 9.55502013e+02 3.81e-02 1.94e-03 1.36e-04 2090s

17 1.30443141e+03 9.64365372e+02 3.37e-02 1.80e-03 1.21e-04 2163s

18 1.26188097e+03 9.71776399e+02 2.84e-02 1.69e-03 1.03e-04 2243s

19 1.24318173e+03 9.81006477e+02 2.60e-02 1.56e-03 9.41e-05 2315s

20 1.23136682e+03 9.87226936e+02 2.42e-02 1.47e-03 8.80e-05 2397s

21 1.19107842e+03 9.96032252e+02 1.92e-02 1.34e-03 7.07e-05 2487s

22 1.18468316e+03 9.98817684e+02 1.84e-02 1.30e-03 6.77e-05 2567s

23 1.17675659e+03 1.00284307e+03 1.73e-02 1.25e-03 6.40e-05 2641s

24 1.16697507e+03 1.00584840e+03 1.60e-02 1.22e-03 5.94e-05 2715s

25 1.16393862e+03 1.01000412e+03 1.56e-02 1.17e-03 5.78e-05 2794s

26 1.15679740e+03 1.01237785e+03 1.46e-02 1.13e-03 5.42e-05 2881s

27 1.15309755e+03 1.01398380e+03 1.40e-02 1.08e-03 5.21e-05 2962s

28 1.15051453e+03 1.01588494e+03 1.36e-02 1.06e-03 5.08e-05 3036s

29 1.14698752e+03 1.01796793e+03 1.31e-02 1.03e-03 4.89e-05 3109s

30 1.13865080e+03 1.02460250e+03 1.20e-02 9.35e-04 4.44e-05 3189s

31 1.13588473e+03 1.02726898e+03 1.14e-02 8.71e-04 4.23e-05 3264s

32 1.11413570e+03 1.03094468e+03 8.28e-03 7.95e-04 3.15e-05 3346s

33 1.10863243e+03 1.03585426e+03 7.26e-03 7.16e-04 2.78e-05 3439s

34 1.10711859e+03 1.03644003e+03 7.01e-03 7.06e-04 2.69e-05 3517s

35 1.10617944e+03 1.03751652e+03 6.84e-03 6.88e-04 2.63e-05 3592s

36 1.10438535e+03 1.03850885e+03 6.51e-03 6.71e-04 2.51e-05 3667s

37 1.10194933e+03 1.03962376e+03 6.06e-03 6.60e-04 2.36e-05 3749s

38 1.09602995e+03 1.04622090e+03 5.07e-03 5.33e-04 1.95e-05 3838s

39 1.08833816e+03 1.04950192e+03 3.76e-03 4.65e-04 1.48e-05 3920s

40 1.08636339e+03 1.05236409e+03 3.23e-03 4.08e-04 1.29e-05 3994s

41 1.08482185e+03 1.05525674e+03 2.85e-03 3.46e-04 1.13e-05 4069s

42 1.08046365e+03 1.05648756e+03 2.13e-03 3.18e-04 8.77e-06 4150s

43 1.07909499e+03 1.05819151e+03 1.76e-03 2.82e-04 7.41e-06 4223s

44 1.07733449e+03 1.06099040e+03 1.33e-03 2.17e-04 5.71e-06 4305s

45 1.07441917e+03 1.06305913e+03 8.04e-04 1.68e-04 3.67e-06 4395s

46 1.07329827e+03 1.06439729e+03 5.76e-04 1.35e-04 2.74e-06 4477s

47 1.07283534e+03 1.06513192e+03 4.83e-04 1.16e-04 2.34e-06 4551s

48 1.07239447e+03 1.06560822e+03 3.99e-04 1.04e-04 1.99e-06 4625s

49 1.07170791e+03 1.06656638e+03 2.94e-04 7.84e-05 1.49e-06 4708s

50 1.07079037e+03 1.06774017e+03 1.70e-04 4.86e-05 8.72e-07 4798s

51 1.07055198e+03 1.06812932e+03 1.34e-04 3.66e-05 6.89e-07 4878s

52 1.07042692e+03 1.06823270e+03 1.18e-04 3.36e-05 6.17e-07 4951s

53 1.07021427e+03 1.06839200e+03 8.88e-05 2.94e-05 4.90e-07 5026s

54 1.06997158e+03 1.06872209e+03 5.86e-05 1.96e-05 3.30e-07 5108s

55 1.06986978e+03 1.06886642e+03 4.79e-05 1.54e-05 2.67e-07 5182s

56 1.06974904e+03 1.06892977e+03 3.60e-05 1.36e-05 2.10e-07 5262s

57 1.06961974e+03 1.06902069e+03 2.39e-05 1.09e-05 1.48e-07 5352s

58 1.06954436e+03 1.06903387e+03 1.66e-05 1.05e-05 1.17e-07 5432s

59 1.06953670e+03 1.06907585e+03 1.57e-05 9.15e-06 1.07e-07 5509s

60 1.06950703e+03 1.06910945e+03 1.27e-05 8.14e-06 9.06e-08 5586s

61 1.06948231e+03 1.06915488e+03 1.03e-05 6.72e-06 7.42e-08 5675s

62 1.06945894e+03 1.06921007e+03 8.14e-06 4.96e-06 5.71e-08 5781s

63 1.06944723e+03 1.06923197e+03 7.07e-06 4.28e-06 4.95e-08 5867s

64 1.06942867e+03 1.06924407e+03 5.46e-06 3.88e-06 4.10e-08 5946s

65 1.06942097e+03 1.06925561e+03 4.67e-06 3.51e-06 3.62e-08 6022s

66 1.06941312e+03 1.06926961e+03 3.89e-06 3.06e-06 3.10e-08 6097s

67 1.06940701e+03 1.06928080e+03 3.32e-06 2.74e-06 2.70e-08 6183s

68 1.06939554e+03 1.06929785e+03 2.53e-06 2.14e-06 2.08e-08 6281s

69 1.06939075e+03 1.06931139e+03 2.10e-06 1.70e-06 1.70e-08 6367s

70 1.06938553e+03 1.06932304e+03 1.71e-06 1.31e-06 1.35e-08 6443s

71 1.06938049e+03 1.06933206e+03 1.28e-06 1.01e-06 1.03e-08 6521s

72 1.06937552e+03 1.06934113e+03 9.38e-07 7.05e-07 7.42e-09 6608s

73 1.06936820e+03 1.06935027e+03 3.78e-07 3.99e-07 3.60e-09 6706s

74 1.06936675e+03 1.06935511e+03 2.67e-07 2.40e-07 2.38e-09 6792s

75 1.06936378e+03 1.06935871e+03 7.49e-08 1.18e-07 9.38e-10 6870s

76 1.06936276e+03 1.06936177e+03 2.60e-08 1.45e-08 2.09e-10 6947s

77 1.06936233e+03 1.06936202e+03 5.22e-09 6.27e-09 5.95e-11 7024s

78 1.06936228e+03 1.06936216e+03 3.63e-09 1.41e-09 2.32e-11 7110s

79 1.06936224e+03 1.06936218e+03 3.80e-09 7.23e-10 1.09e-11 7202s

80 1.06936222e+03 1.06936218e+03 2.32e-09 6.93e-10 7.81e-12 7288s

81 1.06936221e+03 1.06936220e+03 7.19e-10 1.04e-10 1.09e-12 7365s

Barrier solved model in 81 iterations and 7365.88 seconds

Optimal objective 1.06936221e+03

Root crossover log...

283099 DPushes remaining with DInf 0.0000000e+00 7391s

66660 DPushes remaining with DInf 1.5065884e-05 7423s

12685 DPushes remaining with DInf 0.0000000e+00 7436s

4892 DPushes remaining with DInf 0.0000000e+00 7445s

1543 DPushes remaining with DInf 0.0000000e+00 7452s

180 DPushes remaining with DInf 0.0000000e+00 7458s

7 DPushes remaining with DInf 0.0000000e+00 7462s

0 DPushes remaining with DInf 0.0000000e+00 7466s

9566 PPushes remaining with PInf 5.2399968e-06 7468s

8314 PPushes remaining with PInf 5.2399968e-06 7472s

8096 PPushes remaining with PInf 0.0000000e+00 7475s

5711 PPushes remaining with PInf 0.0000000e+00 7481s

4031 PPushes remaining with PInf 0.0000000e+00 7487s

2317 PPushes remaining with PInf 0.0000000e+00 7493s

464 PPushes remaining with PInf 0.0000000e+00 7502s

0 PPushes remaining with PInf 0.0000000e+00 7506s

Push phase complete: Pinf 0.0000000e+00, Dinf 5.1343183e-12 7508s

Root simplex log...

Iteration Objective Primal Inf. Dual Inf. Time

292423 1.0693622e+03 0.000000e+00 0.000000e+00 7530s

292423 1.0693622e+03 0.000000e+00 0.000000e+00 7569s

Concurrent spin time: 3510.74s (can be avoided by choosing Method=3)

Solved with barrier

Root relaxation: objective 1.069362e+03, 292423 iterations, 11026.43 seconds

Total elapsed time = 11088.89s

Total elapsed time = 19114.32s

Total elapsed time = 19702.05s

Total elapsed time = 21256.24s

Total elapsed time = 22426.80s

Nodes | Current Node | Objective Bounds | Work

Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time

0 0 1069.36220 0 3894 - 1069.36220 - - 23376s

0 0 1069.48074 0 4429 - 1069.48074 - - 69517s

0 0 1069.48074 0 4421 - 1069.48074 - - 70458s

0 0 1069.93352 0 4477 - 1069.93352 - - 183238s

0 0 1070.06944 0 4512 - 1070.06944 - - 208352s

0 0 1070.06966 0 4455 - 1070.06966 - - 210142s

0 0 1070.06967 0 4460 - 1070.06967 - - 210265s

0 0 1070.74120 0 4670 - 1070.74120 - - 322142s

0 0 1070.83038 0 4632 - 1070.83038 - - 336979s

0 0 1070.83141 0 4595 - 1070.83141 - - 341137s

0 0 1070.83143 0 4602 - 1070.83143 - - 341724s

0 0 1071.49069 0 4796 - 1071.49069 - - 521069s

The scheduler reported the following when the job timed out:

State: TIMEOUT (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 12-11:38:25
CPU Efficiency: 99.88% of 12-12:00:08 core-walltime
Job Wall-clock time: 6-06:00:04
Memory Utilized: 84.66 GB
Memory Efficiency: 90.30% of 93.75 GB

(I'm not sure why it reports 2 cores per node, since my script requests 1.)
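One detail that may be relevant: the barrier log reports "Threads : 21", while the scheduler summary shows about two cores' worth of CPU time for the whole run (12.5 core-days over 6.25 days of wall-clock time). So the solver may have been running many more threads than it had cores. A minimal sketch for checking how many CPUs the job can actually use, so that Gurobi's Threads parameter can be capped to match, is below. It assumes a Linux node under Slurm. SLURM_CPUS_PER_TASK is only set when the job script requests --cpus-per-task, and the function name is hypothetical.

```python
import os


def usable_cpu_count() -> int:
    """Return how many CPUs this process may run on (at least 1).

    Assumptions: under Slurm, SLURM_CPUS_PER_TASK is present only when the
    job requested --cpus-per-task; on Linux, os.sched_getaffinity reflects
    the CPU set the scheduler pinned the job to.
    """
    slurm = os.environ.get("SLURM_CPUS_PER_TASK")
    if slurm and slurm.isdigit():
        return max(1, int(slurm))
    if hasattr(os, "sched_getaffinity"):
        return max(1, len(os.sched_getaffinity(0)))
    return max(1, os.cpu_count() or 1)


if __name__ == "__main__":
    n = usable_cpu_count()
    print(f"usable CPUs: {n}")
    # With gurobipy, the solver could then be capped before optimizing:
    #   model.Params.Threads = n
```

Matching Threads to the allocation avoids time-slicing many solver threads on a couple of cores. Whether that was actually slowing this run down is a guess, not something the log proves.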

Official comment

Simranjit Kaur

Gurobi Staff

June 23, 2025 03:22 Edited

This post is more than three years old. Some information may not be up to date. For current information, please check the Gurobi Documentation or Knowledge Base. If you need more help, please create a new post in the community forum.

Silke Horn

October 17, 2019 12:53

There are a few things I notice from your log:

First of all, the error message from the scheduler says TIMEOUT. So I guess more time would help. :-)

Why does it time out? Is there a hard limit on the duration of such a run on your cluster?

Moreover, it looks as though your model is very difficult.

According to your log, you have 2016314 integer and 0 binary variables, and your bounds range is [0e+00, 0e+00], which suggests there are no finite nonzero bounds on any variable. Is it possible that your (integer) variables are unbounded? Unbounded integer variables make a model extremely hard to solve (in particular, if there is a huge number of them). If this is the case, you should either add bounds (as tight as possible) or relax the integrality.
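A quick way to check for this is to scan the variables for general integers whose bounds are effectively infinite. The sketch below relies only on the VType, LB, UB and VarName attributes that gurobipy's Var objects expose, so with a real model it could be called as unbounded_integer_vars(model.getVars()). Here it runs on stand-in objects. The function name and the 1e9 cutoff are arbitrary assumptions.

```python
from types import SimpleNamespace

GRB_INFINITY = 1e100  # mirrors gurobipy's GRB.INFINITY, the default "no bound" value


def unbounded_integer_vars(variables, big=1e9):
    """Names of general-integer ('I') variables with a bound at or beyond +/-big."""
    flagged = []
    for v in variables:
        if v.VType == "I" and (v.UB >= big or v.LB <= -big):
            flagged.append(v.VarName)
    return flagged


# Stand-in objects in place of model.getVars(); purely illustrative.
demo = [
    SimpleNamespace(VarName="x", VType="I", LB=0.0, UB=GRB_INFINITY),
    SimpleNamespace(VarName="y", VType="I", LB=0.0, UB=50.0),
    SimpleNamespace(VarName="z", VType="C", LB=0.0, UB=GRB_INFINITY),
]
print(unbounded_integer_vars(demo))  # ['x']
```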

In addition to that, it could be possible to speed up the solver by setting the right parameters. To start, I think you could save some time by choosing Method=2 instead of 4.

Silke

Jennifer Gossels

October 17, 2019 22:37

Hi Silke,

Thanks very much for the response. Yes, the university imposes a time limit of 168 hours. I accidentally set this experiment to 150 hours, so there's a chance those extra 18 hours will make a big difference (I'm running it again with the longer time limit), but I'm not optimistic.

By setting bounds on the variables, you mean that I should tell the solver if I know the final values should be within some range?

You suggest method=2 instead of method=3? As I was writing the message above I noticed that the log said the concurrent spin time can be avoided by choosing method=3...

Thanks again,

Jennifer

October 18, 2019 05:32 Edited

Hi Jennifer,

Yes, by bounds I meant exactly what you describe: find an upper bound on the integer variables, and a lower bound too if 0 is not already the right one. Otherwise, the bounds will be assumed to be +/- 2 billion (2,000,000,000), and I think in almost all practical applications this is not reasonable.

On the other hand, if an integer variable really does need a very large range, you should reconsider whether it needs to be integer at all, or whether you can remove the integrality condition and round the result at the end. E.g., if such a variable models an amount of money (say in cents) and needs a very large range (because your problem involves millions of bucks), then you can make it continuous and round to the nearest cent (or dollar, or multiple of 100 dollars) without affecting the real-world solution quality. Does that make sense?
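The rounding step in the money example might look like the sketch below. It assumes the relaxed variable holds an amount whose rounding unit is passed in as a string. Decimal is used so that a unit like 0.01 is exact rather than a binary floating-point approximation. The function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_unit(value: float, unit: str) -> Decimal:
    """Round a continuous solution value to the nearest multiple of `unit`."""
    step = Decimal(unit)
    count = (Decimal(str(value)) / step).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return count * step


amount = 1234567.891  # e.g. a relaxed "money" variable's solution value
print(round_to_unit(amount, "0.01"))  # nearest cent
print(round_to_unit(amount, "100"))   # nearest multiple of 100
```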

Setting good bounds or relaxing the integrality should make a much bigger difference to the running time than any parameter settings. (As an experiment, you could try making all your variables continuous and see whether this helps.)
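For that experiment, gurobipy's Model.relax() returns a copy of the model with integrality removed. If you would rather change the model in place, a sketch along these lines works on the same Var attributes. It is shown on stand-in objects, and the function name is hypothetical.

```python
from types import SimpleNamespace


def relax_integrality(variables) -> int:
    """Make every integer or binary variable continuous; return how many changed.

    With gurobipy, such attribute changes stay pending until model.update()
    or the next model.optimize() call processes them.
    """
    changed = 0
    for v in variables:
        if v.VType in ("I", "B"):
            v.VType = "C"
            changed += 1
    return changed


# Stand-in objects in place of model.getVars(); purely illustrative.
demo = [
    SimpleNamespace(VarName="a", VType="I"),
    SimpleNamespace(VarName="b", VType="B"),
    SimpleNamespace(VarName="c", VType="C"),
]
print(relax_integrality(demo))  # 2
```

Comparing the relaxed model's solve time and objective with the MIP run shows how much of the difficulty comes from integrality alone.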

As for the method, 3 selects non-deterministic concurrent (i.e., multiple algorithms in parallel) and 2 selects barrier. Setting it to 3 should provide a speedup, but since we already know from your first run that barrier wins, you might as well set it to use barrier only. (This could provide another, albeit probably small, speedup, since barrier then does not have to share resources with the other algorithms.)
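From gurobipy the change is simply model.Params.Method = 2. If you prefer to keep the settings out of the code, the sketch below writes them as "Name value" lines, one per line, with "#" comments. My understanding is that this is the layout Gurobi uses for .prm parameter files and for a gurobi.env file in the working directory, but treat that format as an assumption and check the documentation. The Threads value and the function names are hypothetical.

```python
from pathlib import Path


def format_params(params: dict) -> str:
    """Render parameters as 'Name value' lines under a '#' comment header."""
    lines = ["# solver settings"]
    lines += [f"{name} {value}" for name, value in params.items()]
    return "\n".join(lines) + "\n"


def write_param_file(path: str, params: dict) -> None:
    """Write the rendered parameters to `path` (e.g. a .prm file)."""
    Path(path).write_text(format_params(params))


# Method=2 is barrier only; MIPGap=0.1 is the value from the original run.
# Threads=2 is an assumption matching the 2 cores the scheduler reported.
settings = {"Method": 2, "MIPGap": 0.1, "Threads": 2}
print(format_params(settings), end="")
```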

October 18, 2019 09:03

Thank you so much for the very helpful explanation! Am I correct that requesting more nodes or tasks per node will not help? Do you think more memory will help? The log says Factor NZ takes roughly 5.0 GB of memory, but the error message from the cluster says I used 84.66 GB. I'm not sure if there is an easy way to convert the Factor NZ memory amounts to total amount of memory needed for the whole computation?
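The barrier statistics in the log are estimates, and the log itself can be used to sanity-check them. Below is a small sketch (function name hypothetical) that averages the wall-clock seconds per barrier iteration from the first and last rows of the root barrier log above. It gives about 81 s per iteration, compared with the "roughly 6 seconds per iteration" estimate. The gap alone does not explain itself. Having fewer usable cores than the 21 threads the log reports is one possible explanation, not a confirmed one.

```python
def seconds_per_iteration(log_rows):
    """Average seconds per barrier iteration between the first and last rows.

    Each row is assumed to look like a Gurobi barrier log line:
    'iter primal_obj dual_obj primal_res dual_res compl <time>s'.
    """
    first, last = log_rows[0].split(), log_rows[-1].split()
    it0, t0 = int(first[0]), float(first[-1].rstrip("s"))
    it1, t1 = int(last[0]), float(last[-1].rstrip("s"))
    return (t1 - t0) / (it1 - it0)


rows = [
    "0 2.14104074e+06 -4.81895447e+02 5.09e+02 0.00e+00 8.97e-01 801s",
    "81 1.06936221e+03 1.06936220e+03 7.19e-10 1.04e-10 1.09e-12 7365s",
]
print(round(seconds_per_iteration(rows), 1))  # 81.0
```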

October 18, 2019 12:10

I think that more nodes or more tasks might make sense if you wanted to do concurrent optimization, but since the solver gets stuck in the root, I don't expect this to help.

As for the memory, Gurobi does not log (or monitor) the total memory usage. So it is hard to say when these 84.66 GB were used, but I would guess this happened during the root relaxation (because of the barrier). Afterward, memory usage typically goes down significantly and then starts growing again. Can you get onto the machine and see (e.g. using top) how much memory Gurobi is using? If so, you should take note of these numbers and compare them for different times (e.g. towards the end of the root relaxation and afterward).
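Logging memory from inside the job can complement watching top. Below is a minimal sketch, assuming a Linux node and that the model is solved in-process through gurobipy, so the solver's memory counts toward the Python process. The function names are hypothetical. Calling these at different stages (for example from a callback, or before and after the root relaxation) gives numbers to compare.

```python
import resource
import sys
from typing import Optional


def peak_rss_gb() -> float:
    """Peak resident memory of this process so far, in GB (Unix only).

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    bytes_used = peak if sys.platform == "darwin" else peak * 1024
    return bytes_used / 1e9


def current_rss_gb() -> Optional[float]:
    """Current resident memory from /proc/self/status (Linux); None if unavailable."""
    try:
        with open("/proc/self/status") as fh:
            for line in fh:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) * 1024 / 1e9  # field is in kB
    except OSError:
        return None
    return None


print(f"current: {current_rss_gb()} GB, peak: {peak_rss_gb():.3f} GB")
```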
