Performance of Gurobi on Different Machines
Hi!
I wrote a program for my master's thesis that calls Gurobi several times as a subroutine. This seems to be the bottleneck of my code, since 95% of the running time is spent inside Gurobi.
I first used my personal computer, which has a 2.7 GHz dual-core Intel Core i5 processor and 8 GB of 1867 MHz DDR3 memory.
Now I am allowed to use one of the university's workstations, which has one Intel(R) Xeon(R) E5-2690 v2 CPU @ 3.00 GHz (10 cores) and 12 GB of memory.
I expected a huge boost in performance, but there seems to be almost none. I tried changing the parameters GRB_INT_PAR_THREADS and GRB_INT_PAR_METHOD, but the running times on my personal machine and the workstation still barely differ.
Is there any way I can fix this?
Thank you in advance!
Hi Tobias,
Have you checked out the article "Does using more threads make Gurobi faster?"? Increasing the number of threads does not necessarily speed up the optimization algorithm.
How many Gurobi jobs are you solving in each subroutine? Are they independent of each other? And how long does each job take to solve?
If you are solving many independent Gurobi jobs in each subroutine, increasing the number of threads and running those jobs in parallel would likely provide a noticeable speedup (the longer each individual job takes, the greater the likely speedup). However, if the Gurobi jobs in each subroutine depend on each other, and/or their optimal solutions are found near the root, increasing the number of threads is not likely to help much.
I suggest looking into the log file to identify the bottleneck in each Gurobi job. For example, if the root relaxation takes a long time to solve, changing the Method parameter to 2 (barrier) and using more threads can help. As another example, if the model takes a long time to find an incumbent solution, or the best bound changes very slowly, increasing the number of threads to explore more nodes in parallel can help.
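For reference, here is a minimal sketch of setting these parameters through the Gurobi C API. The log-file name, the model file "model.lp", and the thread count of 10 (matching your workstation's cores) are placeholders for your own setup:

#include "gurobi_c.h"

/* Minimal sketch: set Method and Threads on the environment, then solve.
   "model.lp" is a placeholder for your own model file. */
int solve_lp(void)
{
    GRBenv   *env   = NULL;
    GRBmodel *model = NULL;
    int       error;

    error = GRBloadenv(&env, "gurobi.log");
    if (error) return error;

    /* Method = 2 selects barrier; Threads caps the cores one solve may use. */
    error = GRBsetintparam(env, GRB_INT_PAR_METHOD, 2);
    if (!error) error = GRBsetintparam(env, GRB_INT_PAR_THREADS, 10);

    if (!error) error = GRBreadmodel(env, "model.lp", &model);
    if (!error) error = GRBoptimize(model);

    if (model) GRBfreemodel(model);
    GRBfreeenv(env);
    return error;
}

Parameters set on the environment are inherited by models created from it, so they only need to be set once before the solves.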
Best regards,
Maliheh
Hi Maliheh,
Yes, I checked the article before asking the question, and I am still somewhat confused.
I only solve one LP per iteration, but in total I solve thousands of LPs that are all independent of each other.
Here is the log file; can you help me figure out what exactly the problem is?
It seems Gurobi used primal simplex in this example. Often it uses barrier to solve the problem, so I already tried changing the Method parameter to 2, but this didn't change anything at all.
Thanks for the answer,
Tobias
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (linux64)
Thread count: 20 physical cores, 20 logical processors, using up to 20 threads
Optimize a model with 47040 rows, 234248 columns and 421456 nonzeros
Model fingerprint: 0x960d6838
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [1e+00, 1e+00]
Bounds range [0e+00, 0e+00]
RHS range [2e+01, 1e+04]
Concurrent LP optimizer: primal simplex, dual simplex, and barrier
Showing barrier log only...
Presolve removed 0 rows and 47040 columns
Presolve time: 0.21s
Presolved: 47040 rows, 187208 columns, 374416 nonzeros
Ordering time: 0.22s
Barrier statistics:
AA' NZ : 9.360e+04
Factor NZ : 1.281e+06 (roughly 100 MBytes of memory)
Factor Ops : 8.199e+07 (less than 1 second per iteration)
Threads : 18
Objective Residual
Iter Primal Dual Primal Dual Compl Time
0 2.05891230e+09 0.00000000e+00 4.28e+03 0.00e+00 1.40e+04 1s
1 7.99199355e+08 9.57896973e+05 3.45e+03 0.00e+00 3.87e+03 1s
2 3.76984709e+08 3.06791526e+06 1.58e+03 0.00e+00 1.81e+03 1s
3 2.17507994e+08 5.56777488e+06 8.85e+02 0.00e+00 1.03e+03 1s
4 1.69117220e+08 7.35356168e+06 6.71e+02 0.00e+00 7.95e+02 1s
5 1.66107116e+08 8.10683166e+06 6.58e+02 0.00e+00 7.80e+02 1s
6 1.42775281e+08 9.11885810e+06 5.56e+02 0.00e+00 6.64e+02 1s
7 1.30527905e+08 1.00436052e+07 5.05e+02 0.00e+00 6.04e+02 1s
8 1.22682719e+08 1.08259037e+07 4.70e+02 0.00e+00 5.64e+02 1s
9 1.17732427e+08 1.21074031e+07 4.48e+02 0.00e+00 5.40e+02 1s
10 1.01230816e+08 1.31266937e+07 3.79e+02 0.00e+00 4.56e+02 1s
11 8.84816115e+07 1.37958396e+07 3.24e+02 0.00e+00 3.90e+02 1s
12 8.08822943e+07 1.41970742e+07 2.91e+02 0.00e+00 3.51e+02 1s
13 7.80735443e+07 1.46924292e+07 2.78e+02 0.00e+00 3.36e+02 1s
14 7.54425567e+07 1.51181733e+07 2.67e+02 0.00e+00 3.22e+02 1s
15 6.81732340e+07 1.54442178e+07 2.34e+02 0.00e+00 2.83e+02 1s
16 6.52567699e+07 1.59344337e+07 2.21e+02 0.00e+00 2.68e+02 1s
17 5.63537328e+07 1.63419649e+07 1.81e+02 0.00e+00 2.20e+02 1s
18 5.45568618e+07 1.66435557e+07 1.73e+02 0.00e+00 2.10e+02 1s
19 5.18277826e+07 1.70695697e+07 1.60e+02 0.00e+00 1.95e+02 1s
20 4.90282837e+07 1.75061425e+07 1.46e+02 0.00e+00 1.79e+02 1s
21 4.67290813e+07 1.78044770e+07 1.35e+02 0.00e+00 1.66e+02 2s
22 4.26935434e+07 1.80588622e+07 1.16e+02 0.00e+00 1.43e+02 2s
23 4.09976121e+07 1.82619013e+07 1.08e+02 0.00e+00 1.34e+02 2s
24 4.04580421e+07 1.83067660e+07 1.06e+02 0.00e+00 1.30e+02 2s
25 3.80796646e+07 1.85633324e+07 9.44e+01 0.00e+00 1.16e+02 2s
26 3.76709901e+07 1.86846947e+07 9.24e+01 0.00e+00 1.14e+02 2s
27 3.51954559e+07 1.89242514e+07 8.03e+01 0.00e+00 9.89e+01 2s
28 3.41656973e+07 1.89954450e+07 7.52e+01 0.00e+00 9.28e+01 2s
29 3.28912756e+07 1.92260278e+07 6.89e+01 0.00e+00 8.49e+01 2s
30 3.20983253e+07 1.94213779e+07 6.45e+01 0.00e+00 7.99e+01 2s
31 3.14211729e+07 1.95116652e+07 6.11e+01 0.00e+00 7.57e+01 2s
32 2.96667304e+07 1.96751105e+07 5.20e+01 0.00e+00 6.45e+01 2s
33 2.90271887e+07 1.98540889e+07 4.86e+01 0.00e+00 6.03e+01 2s
34 2.84261421e+07 2.00042665e+07 4.54e+01 0.00e+00 5.62e+01 2s
35 2.79724035e+07 2.01113819e+07 4.29e+01 0.00e+00 5.31e+01 2s
36 2.74371507e+07 2.01864588e+07 3.97e+01 0.00e+00 4.94e+01 2s
37 2.67412113e+07 2.02413849e+07 3.58e+01 0.00e+00 4.47e+01 2s
38 2.63019678e+07 2.03496742e+07 3.32e+01 0.00e+00 4.15e+01 2s
39 2.59709759e+07 2.04474068e+07 3.14e+01 0.00e+00 3.92e+01 2s
40 2.52349514e+07 2.04955855e+07 2.72e+01 0.00e+00 3.39e+01 2s
41 2.47606481e+07 2.05694251e+07 2.44e+01 0.00e+00 3.05e+01 2s
42 2.45219430e+07 2.05922626e+07 2.30e+01 0.00e+00 2.88e+01 2s
43 2.41791973e+07 2.07380977e+07 2.10e+01 0.00e+00 2.61e+01 2s
44 2.36722657e+07 2.08341283e+07 1.76e+01 0.00e+00 2.21e+01 3s
45 2.35410657e+07 2.09570443e+07 1.67e+01 0.00e+00 2.10e+01 3s
46 2.32812390e+07 2.09790761e+07 1.50e+01 0.00e+00 1.89e+01 3s
47 2.30427296e+07 2.10444627e+07 1.33e+01 0.00e+00 1.69e+01 3s
48 2.28762555e+07 2.11398052e+07 1.21e+01 0.00e+00 1.54e+01 3s
49 2.27474204e+07 2.11969228e+07 1.12e+01 0.00e+00 1.43e+01 3s
50 2.25195896e+07 2.12536300e+07 9.39e+00 0.00e+00 1.21e+01 3s
51 2.23303152e+07 2.13490227e+07 7.97e+00 0.00e+00 1.03e+01 3s
52 2.22251215e+07 2.14193508e+07 7.13e+00 0.00e+00 9.21e+00 3s
53 2.21864602e+07 2.14632161e+07 6.81e+00 0.00e+00 8.79e+00 3s
54 2.20864814e+07 2.15007518e+07 5.92e+00 0.00e+00 7.62e+00 3s
55 2.20660439e+07 2.15185550e+07 5.69e+00 0.00e+00 7.34e+00 3s
56 2.20492259e+07 2.15459904e+07 5.52e+00 0.00e+00 7.15e+00 3s
57 2.19645299e+07 2.16082241e+07 4.66e+00 0.00e+00 6.02e+00 3s
58 2.19334221e+07 2.16250948e+07 4.28e+00 0.00e+00 5.53e+00 3s
59 2.18982616e+07 2.16497986e+07 3.78e+00 0.00e+00 4.88e+00 3s
60 2.18868179e+07 2.16447062e+07 3.60e+00 0.00e+00 4.67e+00 3s
61 2.18673452e+07 2.16733854e+07 3.31e+00 0.00e+00 4.28e+00 3s
62 2.18322595e+07 2.16931197e+07 2.63e+00 0.00e+00 3.40e+00 3s
63 2.18303926e+07 2.17042422e+07 2.59e+00 0.00e+00 3.34e+00 3s
64 2.18224530e+07 2.17201714e+07 2.39e+00 0.00e+00 3.09e+00 3s
65 2.18206818e+07 2.17272214e+07 2.35e+00 0.00e+00 3.04e+00 4s
66 2.18196399e+07 2.17339658e+07 2.32e+00 0.00e+00 3.01e+00 4s
67 2.18188674e+07 2.17397969e+07 2.26e+00 0.00e+00 2.93e+00 4s
68 2.18126091e+07 2.17348009e+07 2.08e+00 0.00e+00 2.71e+00 4s
69 2.18126592e+07 2.17435435e+07 2.04e+00 0.00e+00 2.66e+00 4s
70 2.18059067e+07 2.17568759e+07 1.83e+00 0.00e+00 2.38e+00 4s
71 2.18017488e+07 2.17587278e+07 1.62e+00 0.00e+00 2.11e+00 4s
72 2.18014928e+07 2.17688525e+07 1.61e+00 0.00e+00 2.09e+00 4s
73 2.18013265e+07 2.18000346e+07 1.56e+00 0.00e+00 2.02e+00 4s
74 2.18030801e+07 2.18148719e+07 1.32e+00 0.00e+00 1.71e+00 4s
75 2.18040928e+07 2.17997729e+07 1.27e+00 0.00e+00 1.66e+00 4s
76 2.18054092e+07 2.18168314e+07 1.22e+00 0.00e+00 1.59e+00 4s
77 2.18064954e+07 2.18043784e+07 1.18e+00 0.00e+00 1.55e+00 4s
78 2.18089734e+07 2.18331897e+07 1.12e+00 0.00e+00 1.46e+00 4s
79 2.18107303e+07 2.18475845e+07 1.07e+00 0.00e+00 1.39e+00 4s
80 2.18139975e+07 2.18565554e+07 9.91e-01 0.00e+00 1.29e+00 4s
81 2.18147929e+07 2.18652514e+07 9.80e-01 0.00e+00 1.28e+00 4s
82 2.18182834e+07 2.18741457e+07 9.34e-01 0.00e+00 1.22e+00 4s
83 2.18188508e+07 2.18847518e+07 9.24e-01 0.00e+00 1.21e+00 4s
84 2.18230492e+07 2.18859168e+07 8.56e-01 0.00e+00 1.12e+00 4s
85 2.18269300e+07 2.19091529e+07 7.99e-01 0.00e+00 1.05e+00 4s
86 2.18298682e+07 2.19137177e+07 7.83e-01 0.00e+00 1.04e+00 4s
87 2.18360091e+07 2.19251598e+07 7.26e-01 0.00e+00 9.59e-01 4s
88 2.18375323e+07 2.19197556e+07 7.12e-01 0.00e+00 9.44e-01 5s
89 2.18442774e+07 2.19289311e+07 6.66e-01 0.00e+00 8.91e-01 5s
90 2.18456880e+07 2.19322744e+07 6.59e-01 0.00e+00 8.82e-01 5s
91 2.18486071e+07 2.19389188e+07 6.36e-01 0.00e+00 8.55e-01 5s
92 2.18536361e+07 2.19549627e+07 6.09e-01 7.33e-15 8.23e-01 5s
93 2.18546736e+07 2.19558852e+07 6.02e-01 1.22e-14 8.15e-01 5s
94 2.18575814e+07 2.19613671e+07 5.84e-01 1.69e-14 7.90e-01 5s
95 2.18635622e+07 2.19643155e+07 5.49e-01 1.93e-14 7.42e-01 5s
Barrier performed 95 iterations in 4.86 seconds
Barrier solve interrupted - model solved by another algorithm
Solved with primal simplex
Solved in 78302 iterations and 4.87 seconds
Optimal objective 2.304717600e+07
Hi Tobias,
One point before getting into your question:
- It seems that you are using Gurobi 9.1.1; we always recommend using the latest version of Gurobi, which is 9.1.2 right now.
The log file shows the default behaviour, where primal simplex, dual simplex, and barrier run simultaneously until one of them reaches the optimal solution. Typically, barrier outperforms the other two methods on very large models; for your model, however, primal simplex is the winner.
Simplex methods are sequential, and having more cores does not help them. Therefore, it makes more sense to set the parameter Method = 0 to use only primal simplex and to parallelize over the independent LPs. Note that you need to implement this yourself in your application. If you are using Python, you can use multiprocessing with Gurobi to run one LP per core. On your machine with 2 cores, you can run 2 LPs simultaneously; on your university's machine, you can run 10 LPs simultaneously. In the ideal scenario, this should give a 5x speedup.
Best regards,
Maliheh
Hi Maliheh,
Thank you for the info; I understand the problem now. Unfortunately, I use C and not Python, so I guess I have to give parallelizing a go.
I have only one follow-up question: depending on the problem size I create, barrier is the fastest method (this is also the case when I average the running times over all methods). If I set Method = 0, is there any speedup I could get without parallelizing the code?
Best,
Tobias
Hi Tobias,
Simplex algorithms are sequential methods and are not amenable to parallelization, so increasing the number of cores has no impact on them.
Since the barrier algorithm outperforms the simplex methods on your models on average, you can consider setting Method = 2 to use only barrier; parallelization does help the barrier algorithm. However, if you are looking for a noticeable speedup, parallelizing over independent LPs is likely the more promising avenue to explore.
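Since you are working in C, here is a minimal illustrative sketch of parallelizing over independent LPs with OpenMP and the Gurobi C API. The LP count and the per-LP file names are placeholders, and error handling is kept minimal. The key points are that each parallel solve gets its own Gurobi environment and is restricted to a single thread, so the LPs do not compete for cores:

#include <stdio.h>
#include <omp.h>
#include "gurobi_c.h"

/* Sketch: solve NUM_LPS independent LPs in parallel, one Gurobi
   environment per solve, one thread per solve. */
#define NUM_LPS 1000

int main(void)
{
    int i;

    /* On the 10-core workstation, run up to 10 LPs at a time. */
    #pragma omp parallel for num_threads(10)
    for (i = 0; i < NUM_LPS; i++) {
        GRBenv   *env   = NULL;
        GRBmodel *model = NULL;
        char      fname[64];

        /* Placeholder: each LP is assumed to live in its own file. */
        snprintf(fname, sizeof(fname), "lp_%d.lp", i);

        if (GRBloadenv(&env, NULL)) continue;

        /* One thread per solve, primal simplex only. */
        GRBsetintparam(env, GRB_INT_PAR_THREADS, 1);
        GRBsetintparam(env, GRB_INT_PAR_METHOD, 0);

        if (GRBreadmodel(env, fname, &model) == 0)
            GRBoptimize(model);

        if (model) GRBfreemodel(model);
        GRBfreeenv(env);
    }
    return 0;
}

Compile with OpenMP enabled (e.g., gcc -fopenmp) and link against the Gurobi C library. On the 10-core workstation this runs up to 10 LPs concurrently, which is where the speedup over your dual-core machine should come from. A pthreads-based version of the same idea works equally well.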
Best regards,
Maliheh