When solving models with Gurobi on modern multi-core machines, it can be tempting to use all available CPU cores by setting Threads= -1. However, this can lead to higher memory usage and worse performance, especially on large servers or multi-socket systems. While thread count is the most common driver of memory usage, other solver settings and model characteristics can also play an important role.
This article explains why this happens and provides best practices for managing memory and thread counts effectively.
What Is the Threads Parameter?
This parameter controls how many threads Gurobi uses for parallel algorithms. By default (Threads= 0), Gurobi automatically uses all available virtual processors, with a soft cap of 32 threads since higher counts often don’t help and can even hurt performance due to contention or memory limits. While using all cores is usually best, reducing threads can improve performance when sharing a machine, when memory is tight, or when a single uncontested thread explores the search tree more efficiently for certain MIP models. Advanced users can override the cap or use Threads= -1 to use all virtual processors.
More details about the parameter can be found in our Threads parameter documentation.
Why Using All Cores Can Increase Memory Usage
For Mixed-Integer Programs: Gurobi's parallel branch-and-bound algorithm creates one full copy of the model per worker thread. This design enables multiple threads to explore different parts of the search tree independently but has an important implication: memory usage grows with the number of threads.
For example, on a system with 144 logical cores, Threads= -1 causes Gurobi to launch 144 threads and maintain 144 model copies. This can exhaust all available memory and may even slow down the solve due to increased synchronization overhead.
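The scaling described above can be turned into a quick back-of-the-envelope check. The sketch below assumes that every worker's model copy costs roughly the same amount of memory (`copy_gb`, a figure you would have to measure for your own model). It ignores branch-and-bound tree storage, cuts, and other overhead, so it gives a lower bound rather than a prediction. The function names and the example figures are hypothetical.

```python
import math


def model_copy_memory_gb(threads: int, copy_gb: float) -> float:
    """Lower-bound estimate: one model copy per worker thread."""
    return threads * copy_gb


def max_threads_for_budget(budget_gb: float, copy_gb: float, reserve_gb: float = 0.0) -> int:
    """Largest thread count whose model copies fit in budget_gb after keeping reserve_gb free."""
    if copy_gb <= 0:
        raise ValueError("copy_gb must be positive")
    usable = budget_gb - reserve_gb
    if usable <= 0:
        return 0  # not even one copy fits
    return math.floor(usable / copy_gb)


# Hypothetical figures: a MIP whose copy needs about 0.5 GB on a 144-core server.
print(model_copy_memory_gb(144, 0.5))       # 72.0
print(max_threads_for_budget(64, 0.5, 16))  # 96
```

Real usage will be higher than this estimate, which is one more reason to start from a moderate thread count instead of the machine's full core count.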
For Continuous Models (LP, QP, SOCP): Memory behavior depends on the chosen algorithm:
- Barrier algorithm: Memory consumption increases as the number of threads grows, but not because of full model copies. Instead, the increase comes from parallelizing matrix operations (reordering, factorization). Memory scaling may be less severe than for MIP because it depends on the barrier factored matrix, which can be smaller than or comparable to the original model depending on the formulation.
- Simplex methods: These use less memory than barrier and do not get faster with additional threads, so there is no reason to use high thread counts.
More Threads Do Not Always Mean Faster Solves
In practice, performance gains frequently level off at far fewer than 32 threads. For many models, additional threads provide little benefit, and can even increase solve time due to synchronization overhead, while memory usage continues to grow with each added thread.
For most models, we recommend testing moderate thread counts rather than using all available cores. In particular, we recommend testing Threads= 1, 2, 4, 8, 16, and 32. Many customers find that fewer threads than the default (or fewer than their machine allows) deliver better performance and lower memory consumption. The optimal number of threads is highly model-dependent, and these values are simply suggested starting points for experimentation.
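A sweep over the suggested thread counts can be scripted. The sketch below is one possible way to do it with gurobipy. The model path, the 600-second time limit, and the helper names are assumptions for illustration. It only compares wall-clock runtime, so memory should be watched separately. MIP runtimes can also vary from run to run, so repeating each setting (for example with different Seed values) gives a fairer comparison.

```python
DEFAULT_COUNTS = (1, 2, 4, 8, 16, 32)
OPTIMAL = 2  # numeric value of GRB.OPTIMAL


def sweep_threads(path, counts=DEFAULT_COUNTS, time_limit=600.0):
    """Solve the same model once per thread count; return {threads: (status, runtime)}."""
    import gurobipy as gp  # imported here so fastest_optimal() works without Gurobi installed

    results = {}
    for threads in counts:
        # Fresh environment and model for every run so runs do not share state.
        with gp.Env() as env, gp.read(path, env=env) as m:
            m.Params.Threads = threads
            m.Params.TimeLimit = time_limit
            m.optimize()
            results[threads] = (m.Status, m.Runtime)
    return results


def fastest_optimal(results):
    """Thread count with the shortest runtime among runs that reached optimality."""
    solved = {t: rt for t, (status, rt) in results.items() if status == OPTIMAL}
    return min(solved, key=solved.get) if solved else None


# Possible usage ("model.mps" is a hypothetical file name):
# results = sweep_threads("model.mps")
# print(fastest_optimal(results))
```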
NUMA Considerations
On systems with non-uniform memory access (NUMA), which commonly includes modern multi-socket servers, memory access latency depends on where the memory is located relative to the executing core. When Gurobi runs with very high thread counts, threads may span multiple NUMA nodes, increasing memory access latency and synchronization costs. Cache locality is also reduced, which can further impact performance.
As a result, using all available cores on systems with NUMA characteristics can lead to slower solves and higher memory usage than using a smaller number of threads that remain within a single NUMA node. In practice, a major performance bottleneck for the Gurobi solver in these environments is the cost of sharing memory across NUMA nodes.
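On Linux, one way to keep the solve on a single NUMA node is to size Threads to that node's CPU count. The sketch below reads the node's CPU list from sysfs (`/sys/devices/system/node/node<N>/cpulist`) and can optionally pin the process to those CPUs with `os.sched_setaffinity`. It assumes that Gurobi's worker threads inherit the process's CPU affinity, as threads created after the call normally do on Linux, so pinning should happen before the Gurobi environment is created. Pinning CPUs does not by itself bind memory; the `numactl --cpunodebind=0 --membind=0` command binds both. The helper names are hypothetical.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8,10-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_node_cpus(node=0):
    """CPU ids that belong to one NUMA node (Linux sysfs only)."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text())


def threads_for_one_node(node=0, cap=32):
    """Thread count that stays inside one NUMA node, capped like Gurobi's default (32)."""
    return min(len(numa_node_cpus(node)), cap)


def pin_to_node(node=0):
    """Restrict this process, and threads it creates afterwards, to one node's CPUs."""
    os.sched_setaffinity(0, numa_node_cpus(node))


# Possible usage on Linux:
# pin_to_node(0)                               # before creating the Gurobi environment
# m.Params.Threads = threads_for_one_node(0)
```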
Graceful Termination and Reducing Memory Usage by Limiting Threads
When solving large models, especially on multi-core machines, it is important to consider not only how to reduce memory usage but also how the solver behaves when memory becomes scarce. By default, if Gurobi exceeds available system memory, the operating system may terminate the process abruptly (for example via an out-of-memory killer). This can result in a hard crash with no opportunity to recover partial results, adjust parameters, or continue the solve in a safer configuration.
To avoid this, Gurobi provides the SoftMemLimit parameter, which enables graceful termination under memory pressure. When a soft memory limit is reached, Gurobi stops the optimization cleanly and returns status MEM_LIMIT. The process remains alive, incumbent solutions and bounds are preserved, and the application can respond programmatically (for example, by reducing the thread count and resuming optimization as shown in the example below). This behavior is not enabled by default, but it is strongly recommended for large or production workloads where robustness and recoverability are important.
A common and effective recovery action after a soft memory limit is reached is to reduce the number of solver threads. Because each worker thread maintains its own copy of the model, lowering Threads immediately reduces memory consumption. Starting with Gurobi version 11.0, you can even reduce the thread count after a memory-related interruption and resume optimization. The solve then continues from where it left off with fewer model copies.
Example Code
```python
import gurobipy as gp
from gurobipy import GRB

m = gp.read("model.mps")  # hypothetical model file

# Start with a moderate number of threads
m.Params.Threads = 8
# Stop gracefully if memory use exceeds 4 GB
m.Params.SoftMemLimit = 4
m.optimize()

# If the soft memory limit was hit, resume with fewer threads
if m.Status == GRB.MEM_LIMIT:
    m.Params.Threads = 1
    m.optimize()
```

This approach allows you to balance parallelism and memory usage dynamically.
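The example above drops straight to a single thread. A gentler variation keeps halving Threads after each MEM_LIMIT stop, so the solve retains as much parallelism as memory allows. The sketch below assumes halving is a reasonable step size. The function name and schedule are illustrative choices, not a Gurobi API. The status code is passed in only so the loop can be exercised without Gurobi.

```python
def solve_with_backoff(m, start_threads=8, soft_mem_gb=4.0, mem_limit_status=None):
    """Optimize, halving Threads after every MEM_LIMIT stop until one thread is left.

    Returns the final optimization status.
    """
    if mem_limit_status is None:
        from gurobipy import GRB  # deferred so a stand-in model can be used in tests
        mem_limit_status = GRB.MEM_LIMIT

    m.Params.SoftMemLimit = soft_mem_gb
    threads = max(1, start_threads)
    while True:
        m.Params.Threads = threads
        # After a MEM_LIMIT stop, calling optimize() again continues the solve
        # (Gurobi 11.0 and later, as described above).
        m.optimize()
        if m.Status != mem_limit_status or threads == 1:
            return m.Status
        threads = max(1, threads // 2)
```

If the status is still MEM_LIMIT with one thread, the remaining options are the ones covered below, such as node files or a machine with more RAM.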
Gurobi also offers a hard MemLimit parameter, which enforces a strict upper bound on memory usage. When this limit is exceeded, optimization is terminated immediately. While this can be useful in tightly controlled or shared environments, it provides less flexibility than SoftMemLimit and is generally not recommended for adaptive workflows. In most cases, setting SoftMemLimit (often in combination with a moderate initial Threads value) provides the best balance between performance, stability, and the ability to recover gracefully from memory-intensive situations.
Additional Memory Management Options
In addition to controlling the number of threads, there are several other situations and settings that can influence Gurobi’s memory usage.
A quick note on memory and licensing:
RAM is relatively inexpensive on modern machines, and Gurobi does not license or restrict memory usage. Unlike CPU cores, which may be limited by your license, using more RAM is always allowed. If your models are memory-intensive, increasing available RAM is often the simplest and most effective solution.
When Gurobi Uses the Most Memory
There are two common scenarios where memory requirements can increase significantly:
- During the solution of the root LP relaxation, when multiple algorithms (such as simplex and barrier) run concurrently on multiple cores
- When the branch-and-bound tree becomes very large and a substantial number of nodes must be stored and managed
In the latter case, memory usage typically grows steadily as the tree expands, rather than appearing as a sudden spike.
Using Node Files for Large MIP Trees
For MIP models that generate a large number of nodes, Gurobi can offload part of the branch-and-bound tree to disk using node files. These files are specifically optimized for solver use and are more efficient than relying on the operating system’s virtual memory.
To enable node files, set NodeFileStart (e.g., NodeFileStart= 0.5). Once the memory used to store branch-and-bound nodes exceeds this value (in GB), Gurobi compresses nodes and writes them to disk.
Node files are only helpful when the model generates many nodes that require significant memory. NodeFileStart has no effect for continuous models or for MIP models that solve at or near the root node.
You can also specify a custom directory for node files using NodeFileDir. Gurobi automatically manages subdirectories and cleans them up after optimization completes.
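In gurobipy, both settings are ordinary model parameters. The helper below is a hypothetical wrapper. The 0.5 GB threshold mirrors the example value above, and the directory in the usage comment is a placeholder for a local disk with enough free space. Creating the directory up front is a precaution, not something Gurobi is documented here to require.

```python
import os


def enable_node_files(m, start_gb=0.5, node_dir=None):
    """Write compressed B&B nodes to disk once node storage exceeds start_gb (GB)."""
    m.Params.NodeFileStart = start_gb
    if node_dir is not None:
        os.makedirs(node_dir, exist_ok=True)  # make sure the target directory exists
        m.Params.NodeFileDir = node_dir


# Possible usage ("/scratch/gurobi_nodes" is a placeholder path):
# enable_node_files(m, 0.5, "/scratch/gurobi_nodes")
# m.optimize()
```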
Note: This feature should only be used when sufficient memory is not available. Accessing data in RAM is always faster than reading from files.
Important Note on Gurobi Cloud
On Gurobi Cloud machines, available disk space is limited and independent of machine type. As a result, features that rely on disk usage, such as node files, are not recommended. For memory-intensive models, it is better to use machines with more RAM rather than relying on disk-based techniques. Please consult the Gurobi Instant Cloud Reference Manual for current Gurobi Cloud machine specifications.
Algorithm and Method Selection
For continuous models (LP, QP, SOCP), algorithm choice can have a significant impact on memory usage:
- Simplex methods (Method=0 or Method=1) generally require less memory than the barrier or concurrent methods.
- Barrier typically uses more memory, and its memory consumption increases as the number of threads grows.
- If memory is exhausted during an LP solve, switching to a simplex method (e.g., Method=1) and reducing the thread count can often alleviate the issue.
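The switch described in the last point amounts to two parameter changes. The sketch below uses Gurobi's Method values (0 primal simplex, 1 dual simplex, 2 barrier, 3 concurrent). For a MIP, Method selects the algorithm for the root relaxation, so the same change also avoids running several algorithms concurrently at the root. The function name and the single-thread default are assumptions for illustration.

```python
PRIMAL_SIMPLEX, DUAL_SIMPLEX, BARRIER, CONCURRENT = 0, 1, 2, 3  # Gurobi Method values


def low_memory_lp_settings(m, method=DUAL_SIMPLEX, threads=1):
    """Use a simplex method and few threads after an LP solve runs out of memory."""
    if method not in (PRIMAL_SIMPLEX, DUAL_SIMPLEX):
        raise ValueError("use Method=0 (primal simplex) or Method=1 (dual simplex)")
    m.Params.Method = method
    m.Params.Threads = threads  # simplex does not benefit from extra threads


# Possible usage:
# low_memory_lp_settings(m)
# m.optimize()
```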
Other Practical Suggestions
- Use PreSparsify to reduce the number of nonzeros in the presolved model.
- Consider Gurobi Compute Server or Gurobi Cloud to run models on machines with more available memory.
- Whenever possible, ensure that the model can be solved entirely in physical memory (RAM), as this provides the best performance and stability.
- Avoid disabling the presolve routines. If you have set Presolve=0, try increasing its value to 1 or 2.
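The two presolve-related suggestions above can be applied together. The sketch below re-enables presolve only if it was switched off, then turns on the sparsify reduction. Presolve=1 is the conservative level and 2 the aggressive one. PreSparsify=1 forces the reduction on, while the default -1 lets Gurobi decide. Whether sparsification actually saves memory depends on the model, and the function name is hypothetical.

```python
def memory_friendly_presolve(m):
    """Keep presolve enabled and ask it to sparsify the presolved model."""
    if m.Params.Presolve == 0:  # presolve was switched off
        m.Params.Presolve = 1   # 1 = conservative, 2 = aggressive
    m.Params.PreSparsify = 1    # force the sparsify reduction on (default -1 = automatic)


# Possible usage:
# memory_friendly_presolve(m)
# m.optimize()
```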
Understanding Memory Reports
Different tools may report memory usage differently. Operating system utilities often include shared or cached memory, while Gurobi reports memory used by solver components. As a result, small discrepancies between these measurements are normal.
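To see the operating-system side of such a comparison, a Python process can ask for its own peak resident memory after a solve. The sketch below works only on Unix, because it uses the standard `resource` module. It measures the whole process, including the Python interpreter and any other loaded libraries, so some difference from the memory Gurobi reports for its own components is expected. The function name is hypothetical.

```python
import resource
import sys


def peak_rss_gb():
    """Peak resident set size of this process as the OS reports it, in GB (10**9 bytes)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    bytes_used = peak if sys.platform == "darwin" else peak * 1024
    return bytes_used / 1e9


# Possible usage after a solve:
# m.optimize()
# print(f"OS-reported peak memory: {peak_rss_gb():.2f} GB")
```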
If you encounter memory-related issues, we strongly recommend first testing lower thread counts before investigating reporting differences in detail. Note also that some solver operations are themselves memory-intensive. If there is not enough memory to complete one of them, the solver may stop the optimization and behave as though the soft memory limit had been reached.