Varying Ordering Time for similar models
AnsweredHey everybody,
I have two models that are super similar, model A and model B. In fact, they only differ in one constraint. The matrix dimensions are similar as well:
model A:
Presolved: 7054320 rows, 4867272 columns, 23158552 nonzeros
model B:
Presolved: 6858329 rows, 4791654 columns, 25840769 nonzeros
However, the subsequent ordering time is significantly different:
model A:
Elapsed ordering time = 5s
Elapsed ordering time = 10s
...
Elapsed ordering time = 65s
Ordering time: 375.31s
model B:
Elapsed ordering time = 5s
Elapsed ordering time = 10s
...
Elapsed ordering time = 2120s
Elapsed ordering time = 2125s
Ordering time: 2221.77s
Now I wonder two things:
- What does the ordering do actually? I tried to find some information about it on the Logging page, but I couldn't find anything.
- What causes the difference in the ordering time? model B is in fact significantly more difficult to solve (solution time: 40000s vs. 10000s), but what characteristics of similarly large optimization models cause different ordering times?
I really appreciate any insights you can provide.
Best
Jacob
EDIT: I re-ran it, and now the ordering times switched (model B < model A), so is the ordering time in fact somewhat arbitrary?
-
Hi Jacob,
Even small changes to a model can change the optimization path significantly.
EDIT: I re-ran it, and now the ordering times switched (model B < model A), so is the ordering time in fact somewhat arbitrary?
Can you reproduce this behavior consistently, i.e., can you reproduce a run such that ordering time is model B < model A and vice versa? Did you change any parameter settings or the machine?
Did you try experimenting with the parameter BarOrder and set it to a fixed value for both models?
Is it possible for you to share the two models? Note that uploading files in the Community Forum is not possible but we discuss an alternative in Posting to the Community Forum.
Best regards,
Jaromił0 -
Hi Jaromił,
once again thanks for your fast and comprehensive help! As always a pleasure to talk to you!
Interestingly, it is mostly but not necessarily always model A < model B, however, different runs of the same model result in different ordering times (e.g., from ~4% of solution time to 26%).
Could you explain a bit (or hint me at an explanation) what the ordering does? If I understand correctly from the BarOrder, it is about filling sparse matrices, correct? I haven't found anything else during my Google search. Can I see or check which BarOrder parameter was used? Is one or the other method faster or slower?
Thanks a lot again for your help!
Best,
Jacob0 -
Hi Jacob,
once again thanks for your fast and comprehensive help! As always a pleasure to talk to you!
Thank you for the kind words.
Could you explain a bit (or hint me at an explanation) what the ordering does? If I understand correctly from the BarOrder, it is about filling sparse matrices, correct? I haven't found anything else during my Google search.
The goal of a barrier ordering algorithm is to permute the ordering of rows in order to reduce fill needed for Cholesky factorization. In other words, a good ordering can reduce the overall time spent in matrix computations. Gurobi uses Approximate Minimum Degree ordering or Nested Dissection ordering. There are many papers on both algorithms (AMD and ND) but even Wikipedia gives a quite acceptable description for both AMD and ND.
Can I see or check which BarOrder parameter was used? Is one or the other method faster or slower?
It is not possible to see which algorithm is used and can be very hard to say a priori which algorithm will be faster for a given model. Usually, Gurobi chooses a fitting algorithm automatically but depending on the model complexity, this choice can be improved. Thus, it definitely makes sense to experiment with the BarOrder parameter if ordering times are surprisingly long.
Interestingly, it is mostly but not necessarily always model A < model B, however, different runs of the same model result in different ordering times (e.g., from ~4% of solution time to 26%).
Do you mean that when you run model A on the same machine without changing any parameters and the ordering time differs? Is something running in the background? Can you consistently reproduce this non-determinism? Is the rest of the optimization path the same?
Best regards,
Jaromił0 -
Dear Jaromił,
Thanks for the explanations, I will look into that.
Turns out, the ordering times for the same model with the same parameters and the same settings vary significantly. I started four identical optimizations for the same model:
Model 1:
Presolve removed 9001776 rows and 7151373 columns (presolve time = 7s) ...
Presolve removed 9066936 rows and 7190373 columns (presolve time = 11s) ...
Presolve removed 12051531 rows and 10174968 columns (presolve time = 15s) ...
Presolve removed 12053031 rows and 10176468 columns (presolve time = 21s) ...
Presolve removed 12053031 rows and 10176468 columns (presolve time = 25s) ...
Presolve removed 12113163 rows and 10247279 columns (presolve time = 30s) ...
Presolve removed 12113949 rows and 10247633 columns
Presolve time: 34.39s
Presolved: 7288417 rows, 5227012 columns, 24079584 nonzeros
Elapsed ordering time = 5s
Elapsed ordering time = 10s
Elapsed ordering time = 15s
Elapsed ordering time = 72s
Elapsed ordering time = 75s
Elapsed ordering time = 80s
Elapsed ordering time = 85s
Elapsed ordering time = 90s
Elapsed ordering time = 95s
Elapsed ordering time = 100s
Ordering time: 502.78s
Barrier statistics:
Dense cols : 1057
AA' NZ : 1.435e+08
Factor NZ : 3.365e+09 (roughly 30.0 GB of memory)
Factor Ops : 2.413e+13 (roughly 30 seconds per iteration)
Threads : 64Model 2:
Presolve removed 8939746 rows and 7130603 columns (presolve time = 6s) ...
Presolve removed 9002806 rows and 7169603 columns (presolve time = 11s) ...
Presolve removed 11988901 rows and 10155698 columns (presolve time = 15s) ...
Presolve removed 11988901 rows and 10155698 columns (presolve time = 20s) ...
Presolve removed 11988901 rows and 10155698 columns (presolve time = 25s) ...
Presolve removed 12049343 rows and 10226800 columns (presolve time = 31s) ...
Presolve removed 12049933 rows and 10226866 columns
Presolve time: 33.06s
Presolved: 7117233 rows, 5130179 columns, 23542730 nonzeros
Elapsed ordering time = 5s
Elapsed ordering time = 10s
Elapsed ordering time = 15s
...
Elapsed ordering time = 2875s
Elapsed ordering time = 2880s
Elapsed ordering time = 2885s
Ordering time: 2891.83s
Barrier statistics:
Dense cols : 194
AA' NZ : 2.691e+09
Factor NZ : 6.589e+09 (roughly 60.0 GB of memory)
Factor Ops : 2.604e+13 (roughly 40 seconds per iteration)
Threads : 64Model 3:
Presolve removed 8960473 rows and 7137068 columns (presolve time = 6s) ...
Presolve removed 9024253 rows and 7176068 columns (presolve time = 11s) ...
Presolve removed 12010348 rows and 10162163 columns (presolve time = 16s) ...
Presolve removed 12010348 rows and 10162163 columns (presolve time = 21s) ...
Presolve removed 12010348 rows and 10162163 columns (presolve time = 25s) ...
Presolve removed 12070483 rows and 10233262 columns (presolve time = 31s) ...
Presolve removed 12071069 rows and 10233328 columns
Presolve time: 33.09s
Presolved: 7176737 rows, 5164037 columns, 23729079 nonzeros
Elapsed ordering time = 5s
Elapsed ordering time = 10s
Elapsed ordering time = 15s
...
Elapsed ordering time = 2415s
Elapsed ordering time = 2420s
Elapsed ordering time = 2425s
Elapsed ordering time = 2430s
Elapsed ordering time = 2435s
Ordering time: 2476.85s
Barrier statistics:
Dense cols : 1591
AA' NZ : 2.048e+09
Factor NZ : 5.339e+09 (roughly 50.0 GB of memory)
Factor Ops : 2.257e+13 (roughly 30 seconds per iteration)
Threads : 64Model 4:
Presolve removed 9012673 rows and 7153926 columns (presolve time = 6s) ...
Presolve removed 9078253 rows and 7192926 columns (presolve time = 11s) ...
Presolve removed 12064348 rows and 10179021 columns (presolve time = 16s) ...
Presolve removed 12064348 rows and 10179021 columns (presolve time = 21s) ...
Presolve removed 12064348 rows and 10179021 columns (presolve time = 26s) ...
Presolve removed 12124820 rows and 10250120 columns (presolve time = 30s) ...
Presolve removed 12125422 rows and 10250186 columns
Presolve time: 32.44s
Presolved: 7323984 rows, 5247979 columns, 24191215 nonzeros
Elapsed ordering time = 5s
Elapsed ordering time = 10s
Elapsed ordering time = 15s
...
Elapsed ordering time = 3170s
Elapsed ordering time = 3175s
Elapsed ordering time = 3180s
Ordering time: 3184.60s
Barrier statistics:
Dense cols : 29
AA' NZ : 2.902e+09
Factor NZ : 6.801e+09 (roughly 60.0 GB of memory)
Factor Ops : 2.587e+13 (roughly 30 seconds per iteration)
Threads : 64I run the models on a High Performance Cluster with Slurm Resource Allocation, but requested the same CPU and Memory. Because of Slurm, I cannot further control the execution of the code.
Do you have an idea what could explain this behavior? I will try the same again, once with BarOrder = 0, and once with BarOrder = 1.
EDIT: Now looking at it, it seems like the number of Dense Cols varies significantly. Can that explain the behavior?
Many thanks and best wishes,
Jacob
0 -
Do you have an idea what could explain this behavior?
Could you also show the first lines of the log files? In particular the finger print values. If the finger print values differ, then something is different.
I run the models on a High Performance Cluster with Slurm Resource Allocation, but requested the same CPU and Memory. Because of Slurm, I cannot further control the execution of the code.
Could you try reproducing the behavior on one particular machine, maybe your own, if it's possible?
0 -
You are right, the fingerprints are different, and now that you mention that, we use a time series aggregation scheme, so the models are not 100% identical. The aggregation works well enough, so that the solution is in a band of +/-2% but the models are not identical.
However, I made an observation regarding the different BarOrder values: When selecting BarOrder = 0, the ordering times are super short (~19s), and longer (440-520s) when selecting BarOrder = 1. However, the model with BarOrder = 1 solves faster afterward, as you have said before. In particular, the ordering times are stable, i.e., similar for all 3 runs.
None of the models reaches an ordering time of ~2500s as seen above. Why does it often take so much longer to order when using the automatic value for BarOrder (= -1) than selecting BarOrder = 0 or BarOrder = 1? Afterward, the runs with BarOrder = 1 solve faster than those with BarOrder = -1.
I think this might get a bit academic, I only observed this difference in the ordering, which I did not understand.
Many thanks
Jacob0 -
None of the models reaches an ordering time of ~2500s as seen above. Why does it often take so much longer to order when using the automatic value for BarOrder (= -1) than selecting BarOrder = 0 or BarOrder = 1?
Gurobi has to make a decision which ordering algorithm to choose based on the information it has about the model at the given time. Sometimes, a model is at the edge of a decision.
As a very simplied example, let's say that we will always use BarOrder = 1 if #variables < 1000. A model with 1001 variables would then always be ordered with BarOrder = 0 but there definitely are models with 1001 variables for which BarOrder 1 would be better. Of course, the decision making implemented in Gurobi is more complex but I think you will get the idea.
In your particular case, it looks like your models are at some edge in the decision making process such that some fall into category BarOrder 1 while others fall into category BarOrder 0, such that manual parameter adjustment is necessary.
Note that in general, just using both algorithms in parallel is not a great idea because then one could not utilize all threads for ordering and thus the performance would be slower overall.
If you are interested only in a primal feasible solution, i.e., you run your models without crossover, you might want to also experiment more with the Presolve, PreSparsify, and the Aggregate parameters. Also AggFill can sometimes help but is very specific and can lead to overtuning.
Best regards,
Jaromił1 -
Thanks for your hints and tips!
So if I understand correctly, selecting either 0 or 1 is fast, because Gurobi "knows what to do", whereas selecting the automatic selection takes so much additional time because Gurobi "is not really sure what to select". Is that the layman's version of what is happening?
For the time being I will then go with BarOrder = 1 and play with the other parameters.
Thanks a lot again for your time and insights!
Best
Jacob0 -
So if I understand correctly, selecting either 0 or 1 is fast, because Gurobi "knows what to do", whereas selecting the automatic selection takes so much additional time because Gurobi "is not really sure what to select". Is that the layman's version of what is happening?
Yes and no. When you set the parameter then Gurobi knows what to do and does not have to decide which ordering algorithm to choose. However, it is not the decision time that makes the difference but the final choice itself. You could try running Model2 with BarOrder=-1, 0, and 1 and see that BarOrder=-1 and 0 perform slowly and 1 will perform better as you said. This means that Gurobi automatically decided to use BarOrder=0 but the best decision would be to use BarOrder=1.
Best regards,
Jarek0
Please sign in to leave a comment.
Comments
9 comments