When you're evaluating Gurobi, there are several things you could measure to quantify performance. There is no single correct answer to the question in the title of this article. Instead, we provide several suggestions for what and how to measure. Which approach you take depends entirely on your use case.

Measure time to reach given termination criterion

The most important group of measures relates to the time it takes to find your solution. The timespan typically starts when you call model.optimize() and ends when Gurobi completes that function call. Gurobi reports the runtime in the log output and in the Runtime model attribute. You will get a very similar result when you store the current date/time just before and after the model.optimize() call and calculate the difference. While this measure sounds simple, the actual time spent depends on your criteria for when to stop. We summarize common criteria below; you may read more about all parameters related to termination criteria in our documentation.
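
The two ways of measuring described above can be compared side by side. The following is a minimal sketch, assuming `model` is an already constructed gurobipy model; the helper name `timed_optimize` is hypothetical and not part of the Gurobi API.

```python
import time


def timed_optimize(model):
    """Run model.optimize() and return (wall_clock_seconds, reported_runtime).

    `model` is assumed to be a gurobipy Model; the helper name is illustrative.
    """
    start = time.perf_counter()        # monotonic clock, suited to measuring intervals
    model.optimize()
    wall_clock = time.perf_counter() - start
    return wall_clock, model.Runtime   # Runtime: solve time reported by Gurobi
```

The two numbers should be close; a large difference would suggest that something other than the solve itself is included in the externally measured interval.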

Time to optimality

The beauty of mathematical optimization is that it provides a globally optimal solution to your problem. As such, in many cases "time to optimality" is the measure that matters. In most cases you would terminate the search procedure when the solution is close enough to optimality. The main concept for controlling this is the "MIP gap", which is described more extensively here, including reasons for allowing a small difference from the truly optimal solution. Set the MIPGap parameter, or leave it at its default value. Gurobi will search until it reaches the desired solution quality.
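
To make the stopping rule concrete, the sketch below computes a relative gap between the best bound and the best solution found, following the usual definition |ObjBound - ObjVal| / |ObjVal|. The function names are hypothetical, the handling of a zero objective value is an assumption made for this illustration, and the default target of 1e-4 is used only as an example value.

```python
import math


def relative_mip_gap(obj_bound, obj_val):
    """Relative gap between the best bound and the incumbent objective value."""
    if obj_val == 0.0:
        # Convention assumed here: zero gap if both are zero, infinite otherwise.
        return 0.0 if obj_bound == 0.0 else math.inf
    return abs(obj_bound - obj_val) / abs(obj_val)


def is_within_gap(obj_bound, obj_val, mip_gap=1e-4):
    """True when the search could stop for the given relative gap target."""
    return relative_mip_gap(obj_bound, obj_val) <= mip_gap
```

For example, a bound of 90 and an incumbent of 100 give a relative gap of 10%, far from a 1e-4 target, so the search would continue.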

Time to first solutions

In some cases, finding any feasible solution is really difficult and the goal is to just find one such solution. In this case, you may instruct Gurobi to stop once a single solution is found, using the SolutionLimit parameter. If you aim to find one feasible solution as quickly as possible, you may also want to look at specific parameters that push Gurobi in this direction, like MIPFocus and Heuristics. Keep in mind that spending more effort on finding a good solution quickly (the primal bound) is likely to slow down the proof of optimality. In other words, you may find a good-quality solution quickly, but the MIP gap might still suggest that a much better solution could exist.
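
A minimal sketch of such a configuration is shown below, assuming `model` is a gurobipy model. The helper name is hypothetical, and the specific values (MIPFocus=1, an optional Heuristics fraction) are illustrative starting points rather than recommendations.

```python
def configure_for_first_solution(model, heuristics=None):
    """Ask the solver to stop at the first feasible solution it finds.

    `model` is assumed to be a gurobipy Model; the helper name is illustrative.
    """
    model.setParam("SolutionLimit", 1)   # stop after one feasible solution
    model.setParam("MIPFocus", 1)        # emphasize finding feasible solutions
    if heuristics is not None:
        # Fraction of MIP effort spent in heuristics (between 0 and 1).
        model.setParam("Heuristics", heuristics)
    return model
```

Measuring the runtime of model.optimize() after this call then gives a "time to first solution" figure.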

Time to a given solution quality

In some cases, based on practical experience you have a good understanding of what a good solution looks like. If you are able to express the target quality in terms of the objective function of your model, you may specify the desired value using the BestObjStop parameter.

Custom criteria

In some cases, you want to define more fine-grained termination criteria. When the standard set of parameters offered by the Gurobi API is not sufficient, you may consider using a callback to customize the behaviour. Please find a simple example, as well as a more advanced scenario combining multiple criteria.

Measure quality after a given timespan

In many practical situations, you need an answer to your business problem within a given timespan. In those cases, you might be willing to trade time for solution quality. The typical technical approach for these situations is to define a TimeLimit. When Gurobi manages to find the optimal solution before the limit is reached, it will return immediately. Otherwise, Gurobi will stop once the given limit has been reached and return the best-known solution. For benchmarking, again there are different things to look at, depending on your business requirements.
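
The time-limited approach can be sketched as follows, assuming `model` is a gurobipy MIP model. The helper name and the layout of the returned dictionary are hypothetical; the solution-related attributes are only read when at least one solution was found.

```python
def solve_with_time_limit(model, seconds):
    """Solve with a time limit and collect the figures relevant for benchmarking.

    `model` is assumed to be a gurobipy MIP Model; the helper name is illustrative.
    """
    model.setParam("TimeLimit", seconds)
    model.optimize()
    result = {
        "status": model.Status,       # tells whether the limit or optimality ended the run
        "runtime": model.Runtime,
        "sol_count": model.SolCount,
    }
    if model.SolCount > 0:            # solution attributes need at least one solution
        result["obj_val"] = model.ObjVal
        result["obj_bound"] = model.ObjBound
        result["mip_gap"] = model.MIPGap
    return result
```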

MIP gap

Again the most commonly used measure is the "MIP gap". Within a limited time, it may not always be possible to find and prove an optimal solution. This will leave you with a non-zero MIP gap value after termination. You can retrieve this number through the MIPGap attribute (note the difference with the parameter, which defines the termination criterion). Note that this situation does not mean you end up with a sub-optimal solution; it could very well be that the best solution found is actually optimal. However, Gurobi has not been able to prove this yet - it is not sure whether any better solution could exist.

Solution quality

Another option is to not focus on proven optimality, but just look at the quality of the solution itself. While this might give less peace of mind than measuring the MIP gap, ultimately it's just the solution itself which will impact your real-world results. Access the solution quality through the ObjVal attribute.

Violations

Finally, for some numerically challenging models, some solvers may return solutions that have high quality in terms of the objective function, but violate one or more constraints or bounds. When you have such a problem at hand, you may want to look at those violations and compare them between solvers. Note that violations can often be influenced by changing the tolerances the solver uses, so it's important to use equal tolerances in all solvers (note that default values for those tolerances may differ!).
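
One way to compare violations on equal terms is to evaluate each solver's solution with the same solver-independent check. The sketch below does this for linear constraints; the function name and the data layout (coefficient dictionaries keyed by variable name) are assumptions made for this illustration.

```python
def max_violation(constraints, solution):
    """Largest violation of a set of linear constraints by a solution.

    constraints: iterable of (coeffs, sense, rhs), where coeffs is a dict
                 {variable_name: coefficient} and sense is "<=", ">=" or "=".
    solution:    dict {variable_name: value}.
    Returns 0.0 when every constraint is satisfied exactly.
    """
    worst = 0.0
    for coeffs, sense, rhs in constraints:
        lhs = sum(c * solution[name] for name, c in coeffs.items())
        if sense == "<=":
            violation = lhs - rhs
        elif sense == ">=":
            violation = rhs - lhs
        elif sense == "=":
            violation = abs(lhs - rhs)
        else:
            raise ValueError(f"unknown sense: {sense!r}")
        worst = max(worst, violation)
    return worst
```

The result can then be compared against a single tolerance of your choice for all solvers being benchmarked.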

Other factors

Stability/variability

For a first quick comparison, it may be tempting to do a quick run on one model with several solvers and compare them based on a single measurement. However, every mathematical optimization solver is subject to performance variability; some more than others. In a real-world situation it is often essential to get good results consistently - having great results in 5min on average, but sometimes seeing peak times of 60min could very well be a big issue. We therefore recommend performing multiple tests, using multiple model instances and seeds, and looking not only at average results but also outliers. Refer to the article linked above for more details.
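
A repeated-run experiment of this kind can be sketched as below. It assumes a hypothetical zero-argument function `build_model` that returns a fresh gurobipy model on each call; both helper names are illustrative. The summary reports the maximum alongside the averages so that outliers stay visible.

```python
import statistics


def runtimes_over_seeds(build_model, seeds):
    """Solve a freshly built model once per seed and collect reported runtimes.

    build_model: hypothetical zero-argument function returning a gurobipy Model.
    """
    runtimes = []
    for seed in seeds:
        model = build_model()
        model.setParam("Seed", seed)   # random seed used by the solver
        model.optimize()
        runtimes.append(model.Runtime)
    return runtimes


def summarize(runtimes):
    """Summary statistics that keep outliers visible next to the averages."""
    return {
        "mean": statistics.mean(runtimes),
        "median": statistics.median(runtimes),
        "max": max(runtimes),
        "stdev": statistics.stdev(runtimes) if len(runtimes) > 1 else 0.0,
    }
```

Repeating this over several model instances, not only several seeds, gives a more complete picture.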

Model construction time

The discussion above has focused on the time spent and results achieved while solving a model. However, this is usually just one (important) step between triggering an optimization run and seeing the final results. Before solving starts, a model must be constructed. This practically means calling the solver API to define individual variables, constraints and objectives.

Of course the model construction time depends on the efficiency of the solver itself, e.g. time needed to process an instruction like "add variable". However, there are two things to keep in mind when measuring the total time spent on model construction:

The model construction phase often contains other instructions as well, for example data retrieval (requesting input data from a database) and preprocessing (e.g. filtering, grouping, sorting, cleaning data and transforming the data into a format that can be efficiently accessed for model construction). While those instructions apply equally to all solvers being benchmarked, it is still important to properly understand where time is spent. For example, when model construction takes 25min and solve times range from 3 to 5 minutes, you may not care about the difference between solvers. However, imagine the 25min consists of 15min waiting for a database, 9min for data preprocessing and 1min for constructing the model. Then you might (a) decide to investigate the database performance and (b) realize that a 3-5min range is actually not that insignificant.
One solver may provide multiple ways of constructing your model. For example, with Gurobi you can not only add variables and constraints one by one, but also multiple at the same time (addVars and addConstrs), as well as through our matrix API. Make sure to choose the most efficient API for your problem and don't hesitate to ask our support for this.
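
To see where the time goes, each phase can be timed separately. The following is a minimal sketch using only the Python standard library; the `phase` helper and the phase names are hypothetical, and the work inside each block is a stand-in for real database, preprocessing, and solver API calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def phase(name, store):
    """Accumulate the wall-clock time spent inside the block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[name] = store.get(name, 0.0) + time.perf_counter() - start


timings = {}
with phase("data retrieval", timings):
    rows = [(i, i * 2) for i in range(1000)]       # stand-in for a database query
with phase("preprocessing", timings):
    data = {k: v for k, v in rows if v % 3 == 0}   # stand-in for filtering/cleaning
with phase("model construction", timings):
    pass                                           # solver API calls would go here

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

Only the last phase depends on the solver, so this breakdown shows how much of the total is actually attributable to it.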

Role of modeling frameworks

Many applications don't interact with a specific solver API directly, but through a modeling framework. These allow you to switch between solvers relatively easily, but come with some disadvantages too. In the context of benchmarking, it's good to be aware of the following aspects.

Model construction times can be inflated by the use of modeling frameworks. Most frameworks maintain their own representation of the mathematical model, which is then handed over to (and duplicated by) Gurobi. This leads to additional time spent and memory used.
Model solve times reported by the framework don't always match the ones reported by the solvers. Often, the handover mentioned above is part of the measured (or perceived) solve time. Similarly, postprocessing done by the framework may be included.
Modeling frameworks may set specific parameters that differ from Gurobi's default values. While this might help in some cases, in other cases this may negatively impact performance. Make sure to understand the behavior of the framework, for example by studying the Gurobi output.
Some modeling frameworks don't provide the exact same mathematical representation of your model to each solver being tested. In those situations, you might not be comparing apples-to-apples. Consider making an export of the model in a format supported by all solvers and running that through their command-line interfaces.
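
For the last point, a command-line run on an exported model file can be scripted. The sketch below only builds the argument list for Gurobi's command-line tool, gurobi_cl, which takes parameter settings as name=value pairs followed by the model file. The helper name, the file name model.mps, and the chosen parameters are assumptions for illustration.

```python
def gurobi_cl_command(model_file, **params):
    """Build the argument list for a gurobi_cl run on an exported model file."""
    args = ["gurobi_cl"]
    args += [f"{name}={value}" for name, value in params.items()]
    args.append(model_file)   # the model file comes last
    return args


# Example (not executed here): export from gurobipy with model.write("model.mps"),
# then run the same file through the command line:
#   import subprocess
#   subprocess.run(gurobi_cl_command("model.mps", TimeLimit=60), check=True)
```

Other solvers' command-line tools take different arguments, so a similar helper would be needed per solver, with matching limits and tolerances.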
