mpi4py and gurobi
Answered
Hi dear community,
I have a similar problem and hope that this is the right place to find an answer :)
I am running an MPI-parallel program (mpi4py) under Slurm. I set the 'Threads' parameter in Gurobi to 1 for all optimization tasks that I wish to solve in parallel.
I run the job on a high-performance cluster with multiple nodes. Each node has 20 CPUs. I start 10 processes on one node, and since the 'Threads' parameter is 1 for all Gurobi calls, the processes shouldn't interfere with each other on the CPUs in the parallel sections of the program.
Do you think this is a Gurobi-related problem, or rather an MPI- or hardware-related problem? Do you have experience with Gurobi in MPI-parallel programs?
-
Hi Anna,
What's the problem you are referring to? Did you intend to paste an image that got lost somehow?
To answer your question: running 10 single-threaded, resource-intensive processes on a machine with 20 CPUs should be perfectly fine. You might still see some performance degradation compared to running the same processes on 10 separate machines: the operating system has to juggle the different threads, there can be memory bandwidth bottlenecks, the CPU may throttle to manage its temperature, or something else may interfere.
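One way to rule out the scheduler confining several processes to the same core is to print, from each process, the set of CPUs it is allowed to run on. The sketch below assumes Linux, where `os.sched_getaffinity` is available, and falls back to `os.cpu_count()` elsewhere; the helper name `allowed_cpus` is made up for illustration.

```python
import os


def allowed_cpus():
    """Return the sorted list of CPU ids this process may run on.

    Uses os.sched_getaffinity on Linux. On platforms without it, this
    assumes every CPU reported by os.cpu_count() is usable.
    """
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))


if __name__ == "__main__":
    cpus = allowed_cpus()
    # Run this in every MPI rank: if all ranks report the same single CPU id,
    # they compete for one core regardless of Gurobi's Threads setting.
    print(f"pid {os.getpid()}: {len(cpus)} usable CPU(s): {cpus}")
```

If every rank reports the full set of cores on the node, pinning is probably not the cause and the Gurobi logs are the next place to look.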
Cheers,
Matthias
-
Hi Matthias,
thank you for your quick reply!
Sorry, I initially posted this issue under another comment and then noticed that the topics actually differ, so I pasted the text into a new thread - don't mind the first sentence, please :)
OK, I am somewhat relieved to hear that such performance issues can happen. In my case, however, the single-process, single-threaded Gurobi solve takes about 50 s, while the multi-process, single-threaded Gurobi solve takes around 1000 s. Would you still say that's normal?
Thanks a lot,
Best
Anna
-
Hi Anna,
No, such a difference is certainly not to be expected; something is going wrong. You should verify that those jobs are really running in parallel and not sequentially, and compare the Gurobi log files from the single-process and multi-process runs.
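To check whether the solves really overlap in time, each process can record wall-clock start and end times around its solve, and the intervals can be compared afterwards. A minimal sketch, where `max_concurrency` is a hypothetical helper and the timestamps are invented for illustration (in an mpi4py program they could be collected on one rank, for example with `comm.gather`):

```python
def max_concurrency(intervals):
    """Return the largest number of (start, end) intervals active at once."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a solve begins
        events.append((end, -1))    # a solve ends
    # Sort by time; at equal times, ends (-1) are handled before starts (+1)
    events.sort(key=lambda e: (e[0], e[1]))
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Illustrative timestamps in seconds (not measured data)
sequential = [(0, 50), (50, 100), (100, 150)]
parallel = [(0, 50), (1, 52), (2, 49)]

print(max_concurrency(sequential))  # 1: the solves ran one after another
print(max_concurrency(parallel))    # 3: the solves overlapped
```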
By the way: you can edit your posts ;-)
Cheers,
Matthias
-
Dear Matthias,
thank you very much for your reply! I think I have now found the parameters that seem to cause the issue. A few new questions came up, and I would be very glad if you could help me once more :)
The job monitoring revealed that the number of flops performed by the four processes was very low. Also, only one process completed its work package in the expected time (expected from the runtimes I know from processing the work packages sequentially). The other three processes took far too long.
This is what I did:
First, I only set Threads=5 in the solver call and made sure that the number of processes times five equals the number of cores on the compute node. Unfortunately, this didn’t solve the problem. Then I found out about the parameter ConcurrentMIP, because the log reported something like “solving dual simplex, barrier,… in concurrent mode” (or something similar).
My first question is: is it possible that Gurobi applies ConcurrentMIP although the current problem instance is not a MIP but an LP, and I did not specify ConcurrentMIP?
So I set ConcurrentMIP=1 and suddenly, it worked. My specs now are: ConcurrentMIP=1 and Threads=5. I will try ConcurrentMIP=5 and Threads=5, too.
My main questions are:
Is there any link in the Gurobi code between the two parameters ConcurrentMIP and Threads, something like: “ConcurrentMIP must be less than or equal to Threads, no matter what the user specifies”?
Is there a mechanism that coordinates the CPU assignment made by the batch-processing software I use to start the scripts on the compute nodes with the CPU assignment made by Gurobi? In other words, can the two overlap?
What is the expected behavior when setting Threads=1 and ConcurrentMIP=20?
Thanks a lot for your help, I really appreciate it!
Kind regards
Anna
-
Hi Anna!
Please excuse my very late response!
The concurrent optimization will happen automatically when solving LPs. Depending on the number of available cores on your machine, this will run primal and dual simplex on one thread each and use the remaining threads for the barrier method.
The same thing happens for the root node in MIPs. After the root node, the open nodes are distributed among the available cores to speed up the solving process.
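If each MPI process should stay within a fixed thread budget when solving an LP, one option is to set `Threads` and pick a single LP algorithm through the `Method` parameter instead of leaving it on automatic. The sketch below is an assumption about how one might wire this up, not something taken from this thread: `apply_params` is a hypothetical helper that calls `setParam(name, value)` on whatever model object it receives (gurobipy's `Model` provides this method), and the stand-in class exists only so the snippet runs without a Gurobi installation.

```python
# Illustrative values: Threads=1 limits the solve to one thread, and
# Method=1 selects dual simplex instead of the automatic choice (-1).
SINGLE_THREAD_LP = {"Threads": 1, "Method": 1}


def apply_params(model, params):
    """Set each parameter on a model exposing setParam(name, value)."""
    for name, value in params.items():
        model.setParam(name, value)
    return model


class RecordingModel:
    """Stand-in for a solver model; it only records the parameters it is given."""

    def __init__(self):
        self.params = {}

    def setParam(self, name, value):
        self.params[name] = value


demo = apply_params(RecordingModel(), SINGLE_THREAD_LP)
print(demo.params)  # {'Threads': 1, 'Method': 1}
```

With gurobipy installed, the same helper could be called on a real model before `optimize()`; whether dual simplex is the best single algorithm depends on the problem.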
ConcurrentMIP will launch several largely independent MIP optimization processes on the same machine on different threads. For each, Gurobi will use as many threads as are still available. For example, on a 16-core machine with ConcurrentMIP set to 8, each MIP job will run on 2 threads. With ConcurrentMIP set to 2, they will each run on 8 threads. So, in general, there is no need to set the Threads parameter manually, as Gurobi will not overload the available cores with too many threads. You also cannot set more ConcurrentMIP jobs than the number of available threads/cores on the machine.
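The arithmetic in the example above can be written out as a small sketch. `threads_per_instance` is a hypothetical helper, not a Gurobi function, and using integer division for core counts that do not divide evenly is an assumption.

```python
def threads_per_instance(cores, concurrent_mip):
    """Threads each concurrent MIP instance gets when cores are split evenly."""
    if concurrent_mip < 1:
        raise ValueError("concurrent_mip must be at least 1")
    if concurrent_mip > cores:
        # Mirrors the statement that there cannot be more instances than cores
        raise ValueError("more concurrent instances than available cores")
    # Assumption: leftover cores are ignored when the split is not even
    return cores // concurrent_mip


print(threads_per_instance(16, 8))  # 2
print(threads_per_instance(16, 2))  # 8
```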
The log will tell you how Gurobi uses the available resources:
Concurrent MIP optimizer: 4 concurrent instances (1 thread per instance)
All this can be jeopardized by scheduling multiple Gurobi jobs in parallel on the same machine, which would likely lead to overcrowding the compute node.
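A simple sanity check before launching is to compare the total number of solver threads across all processes with the cores on the node. The sketch below uses hypothetical helper names; `SLURM_CPUS_PER_TASK` is an environment variable that Slurm sets when `--cpus-per-task` is requested, and the default of 1 when it is missing is an assumption.

```python
import os


def total_threads(processes, threads_per_process):
    """Total solver threads if every process uses its full thread setting."""
    return processes * threads_per_process


def is_oversubscribed(processes, threads_per_process, cores):
    """True if the processes together want more threads than there are cores."""
    return total_threads(processes, threads_per_process) > cores


def slurm_cpus_per_task(default=1):
    """Read SLURM_CPUS_PER_TASK, falling back to `default` if it is not set."""
    return int(os.environ.get("SLURM_CPUS_PER_TASK", default))


# Configurations mentioned in this thread, on a 20-core node
print(is_oversubscribed(10, 1, 20))  # False: 10 processes x 1 thread
print(is_oversubscribed(4, 5, 20))   # False: 4 processes x 5 threads
# Illustrative worst case: each of 4 processes uses all 20 cores
print(is_oversubscribed(4, 20, 20))  # True
```

The value returned by `slurm_cpus_per_task()` could then be passed as the `Threads` setting so that the solver and the batch system agree on the budget.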
I hope that answers your question.
Cheers,
Matthias