Gurobi uses only one thread
I am running Gurobi 10.03 using the Java API (openjdk 21) to solve a MIP model with 57662 variables and 76414 constraints on a machine with Debian 6.1 and 16 CPUs.
As you can see from the attached screenshot of htop, Gurobi is *always* using only one thread, even if it has entered the inner nodes of the Branch and Cut tree.
I have been watching htop and have never seen more than 2 CPUs in use (in fact, whenever 2 CPUs were above 10%, the system was just moving the computation from one to the other).
When Gurobi is inside the B&C tree, shouldn't it use each available core to solve a different MIP node? From htop I can see that Gurobi has many threads open, but only one is doing heavy computations.
Htop screenshot: https://drive.google.com/file/d/1tUK_l4Y5kRfd3ACfjVYr9shbht2SJp9x/view?usp=share_link
Gurobi log file: https://drive.google.com/file/d/1rJzxTt6A-WwoXOnAck16xq57eIbYXoNl/view?usp=share_link
Any help?
Hi Lorenzo,
Could you please share the model you are solving and experiencing the behavior?
Note that uploading files in the Community Forum is not possible but we discuss an alternative in Posting to the Community Forum.
Best regards,
Jaromił
Here is the link to the rlp file: https://drive.google.com/file/d/1XNN23CnuxU2tA7S5DG3nu5uNDIuM_gDT/view?usp=sharing
The file requires authorization to be downloaded: when you first try to download it, I will receive an access request, and I will grant it to anyone with a @gurobi.com domain (or similar).
Hi Lorenzo,
Is it possible that you are running the computation in a virtual machine?
Given the line
CPU model: Intel Xeon Processor (Skylake), instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 2 physical cores, 16 logical processors, using up to 16 threads
it is possible that the VM's configuration is wrong. You should probably have 8 physical cores with 2 logical processors each.
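As a quick way to spot this kind of mismatch, here is a minimal sketch that parses the "Thread count" line quoted above and flags a suspicious ratio. It uses only the Java standard library and does not call Gurobi. The class name `ThreadCountCheck` and the rule "more than 2 logical processors per physical core looks wrong" are assumptions made for illustration, based on the expectation of 2 logical processors per core mentioned above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Gurobi API.
final class ThreadCountCheck {
    // Matches the "Thread count" line in the form shown in the log excerpt.
    private static final Pattern LINE = Pattern.compile(
        "Thread count: (\\d+) physical cores, (\\d+) logical processors, "
        + "using up to (\\d+) threads");

    /** Returns {physical, logical, threads}, or null if the line does not match. */
    static int[] parse(String logLine) {
        Matcher m = LINE.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    /** Assumption: more than 2 logical processors per physical core is suspicious. */
    static boolean looksMisconfigured(int physical, int logical) {
        return physical > 0 && logical > 2 * physical;
    }

    public static void main(String[] args) {
        String line = "Thread count: 2 physical cores, 16 logical processors, "
            + "using up to 16 threads";
        int[] counts = parse(line);
        // What the JVM itself reports, for comparison with the Gurobi log line.
        System.out.println("JVM logical processors: "
            + Runtime.getRuntime().availableProcessors());
        System.out.println("Suspicious topology: "
            + looksMisconfigured(counts[0], counts[1]));
    }
}
```

For the line from this log, the check reports a suspicious topology, while 8 physical cores with 16 logical processors would pass.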
Best regards,
Jaromił
Hi Jaromił,
yes, I am running in a virtual machine and your explanation totally makes sense (sorry, I should have noticed it myself).
That is absolutely not the expected configuration of the virtual machine.
I will get in touch with the virtual machine administrator, check the configuration and get back to you once I get more information.
Thank you a lot.
Good to hear that we found the possible cause.
To give more information on Gurobi's thread usage:
In some parts of the algorithm, Gurobi may restrict itself to what the log calls "physical cores". These are the parts where we found that using hyperthreading may hurt performance. One possible reason is, e.g., multiple threads accessing memory on the same CPU.
Best regards,
Jaromił
Hello Jaromil,
the configuration of the virtual machine has been fixed and now I can see all 16 available logical cores reaching 100% usage during the exploration of the B&B tree.
However, testing the same mathematical model on a much larger instance (323274 rows, 198782 columns and 10951842 nonzeros), Gurobi seems to hang in the root node for a very long time (7000+ seconds) producing no output at all. Is this something that may happen or, maybe, is it the symptom of something that is still wrong in the environment where Gurobi runs? (I know it's a hard question!)
Here is the relevant extract of Gurobi's log:
[2023-10-25 06:50:14] [INFO ] Nodes | Current Node | Objective Bounds | Work
[2023-10-25 06:50:14] [INFO ] Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
[2023-10-25 06:50:14] [INFO ]
[2023-10-25 06:50:14] [INFO ] 0 0 1053594.02 0 18522 - 1053594.02 - - 794s
[2023-10-25 06:53:15] [INFO ] 0 0 1052769.80 0 18455 - 1052769.80 - - 975s
[2023-10-25 06:54:53] [INFO ] 0 0 1052729.54 0 18497 - 1052729.54 - - 1073s
[2023-10-25 06:55:11] [INFO ] 0 0 1052726.75 0 18463 - 1052726.75 - - 1092s
[2023-10-25 09:03:21] [INFO ] 0 0 332934.987 0 12167 - 332934.987 - - 8781s
[2023-10-25 09:05:22] [INFO ] 0 0 332803.288 0 11945 - 332803.288 - - 8902s
[2023-10-25 09:05:51] [INFO ] 0 0 332798.537 0 11828 - 332798.537 - - 8931s
[2023-10-25 09:20:46] [INFO ] 0 0 318833.491 0 12425 - 318833.491 - - 9826s
[2023-10-25 09:22:35] [INFO ] 0 0 318768.663 0 12560 - 318768.663 - - 9935s
[2023-10-25 09:23:44] [INFO ] 0 0 318735.504 0 12180 - 318735.504 - - 10004s
[2023-10-25 09:23:59] [INFO ] 0 0 318734.419 0 12165 - 318734.419 - - 10019s
Whole log file (max time limit is set to 3 hours): https://drive.google.com/file/d/1B3XKquRP5CW_-QuQ6MO5yHP4R2N_o__y/view?usp=sharing
Model file: https://drive.google.com/file/d/1JRvb6emP3Sqr9_i0vJTEPTcS8HiJr9l0/view?usp=sharing
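To locate silent stretches like this one without scrolling through the whole log, a small sketch along these lines may help. It assumes every log line starts with a `[yyyy-MM-dd HH:mm:ss]` prefix, as in the extract above (that prefix comes from the application's logger, not from Gurobi itself). The class name `LogStallFinder` is hypothetical, and only the Java standard library is used.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the longest pause between consecutive log lines.
final class LogStallFinder {
    // Assumes the "[2023-10-25 06:55:11]" prefix used in the extract above.
    private static final Pattern STAMP = Pattern.compile(
        "^\\[(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\]");
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    /** Longest gap in seconds between consecutive timestamped lines (0 if fewer than two). */
    static long longestGapSeconds(List<String> lines) {
        LocalDateTime previous = null;
        long longest = 0;
        for (String line : lines) {
            Matcher m = STAMP.matcher(line);
            if (!m.find()) {
                continue; // skip lines without a timestamp prefix
            }
            LocalDateTime current = LocalDateTime.parse(m.group(1), FMT);
            if (previous != null) {
                long gap = Duration.between(previous, current).getSeconds();
                longest = Math.max(longest, gap);
            }
            previous = current;
        }
        return longest;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "[2023-10-25 06:54:53] [INFO ] first sample line",
            "[2023-10-25 06:55:11] [INFO ] second sample line",
            "[2023-10-25 09:03:21] [INFO ] third sample line");
        System.out.println("Longest gap (s): " + longestGapSeconds(sample));
    }
}
```

Applied to the extract above, the longest pause is the one between the 06:55:11 and 09:03:21 lines, which matches the jump from 1092s to 8781s in the Time column.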
Hi Lorenzo,
Good to hear that the VM issues have been resolved.
"Is this something that may happen or, maybe, is it the symptom of something that is still wrong in the environment where Gurobi runs?"
This may happen, especially with big models. What might be happening is that one heuristic (or several) takes a very long time to converge or to hit some internal termination criterion.
I will have a look anyway just to confirm that we do not have a bug in place.
Best regards,
Jaromił
If I am not wrong, Gurobi should spend around 5% of the total runtime inside heuristics by default, but I guess that this limit is checked only at the start/end of each heuristic and not during its execution; hence, a particularly long heuristic may take up a lot of the total runtime? Is there something that can be done to mitigate this issue?
Hi Lorenzo,
I had a look at the model. During the root node solution process we generate a model that is very hard to solve via Simplex. The very long gap in the log output during the root node is where Simplex is solving a relaxation LP. After some experiments, it looks like the following parameters might work well for your model:
- Method=2
- Crossover=0
- NodeMethod=2
The above combination of parameters uses Barrier to solve the root node and also makes sure that Barrier is used to solve every node LP. If, however, the solution time of the node LPs gets better after this one "incident", I think it is best to just stick to the default parameters. This would need some experimentation on your side. Independent of the setting, the LP relaxations of your model are hard to solve, so it is expected that progress is rather slow. Note that the Barrier setting may not work well if Barrier runs into "Sub-optimal" solutions. In this case it is best to keep the default settings (or maybe wait for the new release coming at the end of this year).
You could also try experimenting with NoRelHeurTime to try to find a feasible solution before entering the B&B. For example NoRelHeurTime=600 might be a good start.
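One way to keep these settings out of the application code while experimenting is a parameter file. The sketch below only builds the text of such a file with the Java standard library; it does not call Gurobi. It assumes the usual one "Name value" pair per line layout of Gurobi .prm files, and the class name `BarrierParamFile` is hypothetical. In the Java API, the same settings should be reachable directly via `model.set(GRB.IntParam.Method, 2)`, `model.set(GRB.IntParam.Crossover, 0)`, `model.set(GRB.IntParam.NodeMethod, 2)` and `model.set(GRB.DoubleParam.NoRelHeurTime, 600.0)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renders the suggested settings as parameter-file text.
final class BarrierParamFile {
    /** The settings suggested in this thread; NoRelHeurTime is optional. */
    static Map<String, String> suggestedParams(boolean withNoRelHeuristic) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Method", "2");      // Barrier for the root relaxation
        params.put("Crossover", "0");   // skip crossover after Barrier
        params.put("NodeMethod", "2");  // Barrier for every node LP
        if (withNoRelHeuristic) {
            // Starting value suggested above, in seconds.
            params.put("NoRelHeurTime", "600");
        }
        return params;
    }

    /** One "Name value" pair per line (assumed .prm layout). */
    static String render(Map<String, String> params) {
        StringBuilder text = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            text.append(entry.getKey()).append(' ')
                .append(entry.getValue()).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Print the file content; redirect it to e.g. barrier.prm to try it out.
        System.out.print(render(suggestedParams(true)));
    }
}
```

Keeping the settings in a file makes it easy to compare a run with these parameters against a run with the defaults, which is the experiment suggested above.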
Maybe a reformulation would also help. In this case I would recommend having a look at our webinar about converting weak to strong MIP formulations.
Best regards,
Jaromił
Thank you a lot for the useful information!