Capacity management with Gurobi Compute Server – Gurobi Help Center

Background

Gurobi Compute Server is a software component that helps you solve multiple optimization models concurrently on one or more nodes (machines or containers). These models are submitted by your application(s) running on dedicated resources managed by the user.

Mathematical optimization is computationally intensive, so it’s important to ensure each model has sufficient resources available to perform well. Here, “resources” mainly refers to the number of CPU cores. You can learn more about the relationship between performance and CPU cores in the article “How many cores does my model need?”

Without proper capacity management, all submitted models would run concurrently on the Compute Server. By default, each model also attempts to use as many threads as the machine has cores. The total number of running threads would then significantly exceed the number of cores, which hurts performance.

For that reason, Compute Server lets you control how many models can run concurrently. There are two approaches: (a) based on the number of jobs, and (b) based on the number of threads.

Job-based capacity

When all the models you solve are relatively similar, it’s usually safe to pick a number of threads per model once and apply it to all your models. Dividing the number of cores on your Compute Server by that number of threads per model gives the number of models that can run concurrently. Configure your environment as follows:

Use the JobLimit setting on the Compute Server side to control the number of models that will run concurrently.
Use the Threads parameter on your Gurobi model/environment to control the number of threads to be used for each model.

For example, you could have an 8-core Compute Server where you set JobLimit=4 and then submit models with Threads=2.

Note that if you forget to set the Threads parameter, each model would request 8 threads. With JobLimit=4, you would end up with 32 threads on your 8-core machine, which should be avoided.
On the other hand, if you set Threads=2 but leave JobLimit at its default (2 jobs), you will never use more than 4 cores in total, which is also not ideal: your machine has unused capacity, and you’re not using your license to its full potential.
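The arithmetic behind this example can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `job_limit_for` and `total_threads` are hypothetical and not part of any Gurobi API.

```python
def job_limit_for(cores: int, threads_per_model: int) -> int:
    """Number of models that fit on the node without exceeding its cores."""
    if cores < 1 or threads_per_model < 1:
        raise ValueError("cores and threads_per_model must be positive")
    # Always allow at least one job, even if a model wants every core.
    return max(1, cores // threads_per_model)


def total_threads(job_limit: int, threads_per_model: int) -> int:
    """Threads in use when the node runs job_limit models at the same time."""
    return job_limit * threads_per_model


cores = 8

# Intended setup: Threads=2 per model, so JobLimit=4 fills the machine exactly.
balanced = total_threads(job_limit_for(cores, 2), 2)

# Threads left unset: each model requests all 8 cores while JobLimit is still 4.
oversubscribed = total_threads(4, cores)

# JobLimit left at its default of 2 while every model uses Threads=2.
underused = total_threads(2, 2)

print(balanced, oversubscribed, underused)
```

The three values correspond to the balanced configuration, the oversubscribed case (Threads not set), and the underused case (JobLimit left at its default).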

Thread-based capacity

When there are significant differences between the models you solve (e.g. because of the dimensions of your input data, or because you share a Compute Server between multiple use cases), picking a single value for the Threads parameter may not be desirable. If you use different values, you have to calculate your JobLimit based on the maximum value of Threads across your models to avoid having more threads than cores, which leaves cores idle whenever smaller models are running. Capacity management based on jobs alone is therefore not ideal for this scenario.

Fortunately, Gurobi 12 introduced two new settings for this purpose.

Use the Node_ThreadLimit setting on the Compute Server to control the total number of threads you want to allow on your Compute Server. Usually, this value would equal the number of cores you have available.
Use the ThreadLimit parameter on your Gurobi environment object when initializing Gurobi, to define the maximum number of threads you will use for your models within that environment.

For example, on your 8-core machine, you could set Node_ThreadLimit=8. You could then submit several jobs:

Job A with ThreadLimit=4 starts immediately.
Job B with ThreadLimit=8 has to wait until job A completes, since only 4 threads of capacity remain.
Job C with ThreadLimit=2 starts while job A is still active, since 4 threads of capacity are still available. Note that job B is bypassed in the queue.
Job D with ThreadLimit=2 starts while jobs A and C are still active (assuming you increased JobLimit above 2).
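The admission behavior in this example can be reproduced with a small simulation. This is a simplified model written for illustration, not Gurobi's actual scheduler: the `Job` class and the `admit` function are hypothetical names, priorities are ignored, and a Node_ThreadLimit of 0 (unlimited) is not handled.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    thread_limit: int


def admit(queue, node_thread_limit, job_limit):
    """Return (started, waiting) job names for one pass over the queue.

    A job starts only if both a job slot and enough thread capacity are free.
    Jobs that do not fit are skipped, so later, smaller jobs can still start.
    """
    used = 0
    slots = job_limit
    started, waiting = [], []
    for job in queue:
        if slots > 0 and used + job.thread_limit <= node_thread_limit:
            started.append(job.name)
            used += job.thread_limit
            slots -= 1
        else:
            waiting.append(job.name)
    return started, waiting


queue = [Job("A", 4), Job("B", 8), Job("C", 2), Job("D", 2)]

# JobLimit raised to 3: A, C and D run together, B waits for free threads.
print(admit(queue, node_thread_limit=8, job_limit=3))

# JobLimit left at its default of 2: D now also waits, for a free job slot.
print(admit(queue, node_thread_limit=8, job_limit=2))
```

The second call shows why JobLimit has to be raised when you rely on thread-based capacity: with only two job slots, job D waits even though 2 threads are still free.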

Settings, defaults and their interaction

The default values for the settings mentioned in this article are as follows.

Setting	Scope	Default	Meaning
`Threads`	Client	0	Automatic; usually the number of cores
`ThreadLimit`	Client	0	Use `Threads` value
`JobLimit`	Server	2	Max 2 concurrent jobs
`Node_ThreadLimit`	Server	0	Unlimited

High- and Low-Priority queueing

As soon as you have at least two Compute Server nodes, you can make your capacity management more flexible by using groups. Jobs submitted with the CSGROUP parameter name one or more groups, each with a priority, which lets you split the load based on users, projects, applications, or priorities. For example, you could have a “high-priority” and a “low-priority” Compute Server node. High-priority jobs are submitted with CSGROUP=highpriority:0,lowpriority:10, which points them at the high-priority node first and the low-priority node second. When the high-priority queue is full, these jobs overflow onto the low-priority node, where they are queued with increased priority. Low-priority jobs are submitted with CSGROUP=lowpriority:0, so they can only run on the low-priority node, and only when no high-priority jobs are waiting in the queue of the low-priority group.
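If your application assembles the CSGROUP value programmatically, a small helper keeps the group:priority format consistent. The sketch below only builds the string; the function name `csgroup_value` is hypothetical, and the group names are the ones used in the example above. The resulting string would then be set as the CSGROUP parameter on the client side.

```python
def csgroup_value(priorities: dict[str, int]) -> str:
    """Format group priorities as 'group1:priority1,group2:priority2'."""
    for group in priorities:
        # Group names containing the separators would produce an ambiguous value.
        if ":" in group or "," in group:
            raise ValueError(f"invalid group name: {group!r}")
    return ",".join(f"{group}:{priority}" for group, priority in priorities.items())


# High-priority jobs may use both groups, with a higher priority on the second.
high = csgroup_value({"highpriority": 0, "lowpriority": 10})

# Low-priority jobs are restricted to the low-priority group.
low = csgroup_value({"lowpriority": 0})

print(high)
print(low)
```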

Notes

Jobs will only be accepted when both the job and thread capacity are available. So, if you only want to use thread-based capacity, make sure to set JobLimit to a sufficiently large number.
The number of threads used for optimizing a model is the minimum of Threads and ThreadLimit. So, if you submit a model with ThreadLimit=2 and Threads=4, you still only get 2 threads. However, if you do not specify either of these two parameters, the model will request and use a number of threads equal to the number of cores. If Node_ThreadLimit is set to the number of cores, this means that only one model can run at any point in time.
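The rule in the last note can be written as a short function. This is a sketch of the behavior as described in this article, under two assumptions: a value of 0 means "not set" for both parameters, and an unset Threads value resolves to the number of cores (the defaults table says "usually"). The function name `effective_threads` is hypothetical.

```python
def effective_threads(threads: int, thread_limit: int, cores: int) -> int:
    """Threads a solve would use, treating 0 as 'not set' for both parameters."""
    # Assumption: with Threads unset, the solver requests one thread per core.
    requested = threads if threads > 0 else cores
    if thread_limit > 0:
        return min(requested, thread_limit)
    # ThreadLimit unset: the Threads value (or its automatic default) applies.
    return requested


cores = 8
print(effective_threads(4, 2, cores))  # ThreadLimit=2, Threads=4
print(effective_threads(0, 0, cores))  # neither parameter set
```

On an 8-core node with Node_ThreadLimit=8, the second case requests all 8 threads, which is why a single such job occupies the whole node.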
