Optimization time: local machine vs computer cluster
Hello,
I'm using Gurobi to solve an MIQP, both on my local machine and on a computing cluster.
My local machine has an Intel Core i7-3632QM @ 2.20GHz processor, and when I run my program on it, Gurobi outputs the following information:
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
which is in accordance with my local machine's processor.
The computing cluster I am using has 32 Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz processors. I am using SLURM as a resource manager, and I choose the number of CPUs to use through the option "--cpus-per-task". However, no matter how many CPUs I request, Gurobi always outputs the following when my program runs on the cluster:
Gurobi Optimizer version 9.1.2 build v9.1.2rc0 (linux64)
Thread count: 32 physical cores, 32 logical processors, using up to 32 threads
As expected, when my program runs on the cluster, its optimization time decreases the more CPUs I allow it to use. What puzzles me is that I need to set --cpus-per-task to at least 4 to get the same performance I have on my local machine. For example, if I set --cpus-per-task=1, the optimization takes much longer on the cluster.
My questions are: is there a reason why I need 4 CPUs on the cluster to achieve the same performance I get using one CPU on my local machine? Am I misunderstanding how the parallel optimization features of Gurobi are expected to work?
Hi Pedro,
You have to tell both Slurm and Gurobi how many threads you want to use. Slurm needs to know how many threads a job will use so that it can schedule resources correctly for all jobs in the queue, but this request does not change any physical information on the server machine itself. Gurobi still sees 32 cores and will use them. This can conflict with other jobs on that machine, so performance is no longer reliable. You can tell Gurobi how many threads to use by setting the Threads parameter.
Note the difference between CPUs, cores, and threads. Your local machine has 1 CPU with 4 cores and 8 threads. The server seems to be a 2-socket machine: a single Xeon Gold 5218 has 16 physical cores and 32 threads, so there are probably 2 Xeons installed, with 32 physical cores in total. That would allow up to 64 threads, but Hyperthreading may be disabled on your system (which makes sense in HPC clusters). Additionally, Gurobi by default uses "only" up to 32 threads.
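To see the gap between what the machine reports and what a job was actually granted, a small diagnostic like the one below can be run inside the Slurm job. It is only a sketch: the function name cpu_view is made up for illustration, and whether Slurm restricts the process's CPU affinity at all depends on how the cluster is configured, so the second number may or may not match --cpus-per-task.

```python
import os


def cpu_view():
    """Return (CPUs reported by the OS, CPUs this process is allowed to run on)."""
    reported = os.cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):
        # Linux only: the set of CPUs the scheduler may place this process on.
        allowed = len(os.sched_getaffinity(0))
    else:
        # No affinity API on this platform; fall back to the reported count.
        allowed = reported
    return reported, allowed


if __name__ == "__main__":
    reported, allowed = cpu_view()
    print(f"CPUs reported by the OS:   {reported}")
    print(f"CPUs this process may use: {allowed}")
    # Slurm sets this variable when --cpus-per-task is given.
    print(f"SLURM_CPUS_PER_TASK:       {os.environ.get('SLURM_CPUS_PER_TASK', 'not set')}")
```

If the first number stays at 32 regardless of the Slurm request, that is consistent with the log line Gurobi prints on the cluster.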
Keeping this in mind, you could repeat your performance benchmark with the Threads parameter set. This might lead to more intuitive results.
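As a rough sketch of that suggestion, assuming the program is written with gurobipy: the helper names below (slurm_thread_count, apply_thread_limit) are hypothetical, and the code simply reads the SLURM_CPUS_PER_TASK environment variable that Slurm sets when --cpus-per-task is given, then passes that value to Gurobi's Threads parameter.

```python
import os


def slurm_thread_count(default=1):
    """Return the CPU count Slurm granted to this task, or `default` outside Slurm."""
    value = os.environ.get("SLURM_CPUS_PER_TASK", "")
    try:
        n = int(value)
    except ValueError:
        # Variable missing or not a number (e.g. running on a local machine).
        return default
    return n if n > 0 else default


def apply_thread_limit(model, default=1):
    """Set Gurobi's Threads parameter on `model` to match the Slurm allocation."""
    n = slurm_thread_count(default)
    model.setParam("Threads", n)
    return n


# Usage with gurobipy (not imported here so the helpers run without a license):
#   import gurobipy as gp
#   model = gp.Model()
#   apply_thread_limit(model)
#   model.optimize()
```

For a like-for-like comparison, the same Threads value could also be set on the local machine, since there Gurobi otherwise uses up to 8 threads.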
Best regards,
Mario