Significantly different results of scaled and unscaled models
AnsweredDear community,
we are solving a large energy system optimization with gurobi and started to experiment with model scaling. Interestingly, the objective function values (after unscaling) in the optimal solution differs by 4% (edited) between the scaled and unscaled model. Apparently the problem seems to be occur in the root relaxation, as the objectives of the root relaxation is already very different (see logs below).
We now try to understand the issue, and the problem seems to lie in the presolve step. So far we have tried setting the numeric focus parameter to 2, the differences remains. If we turn off presolve, the root relaxation objective between the scaled and unscaled model is near to equal, but for obvious reasons we run into memory issues early on in the branch and bound.
We'd be happy to get some ideas of how to deal with this issue as we expect that scaling our models will have performance benefits for similarly structured models.
The log for the unscaled model:
Set parameter TimeLimit to value 36000
Set parameter MIPGap to value 0.001
Set parameter NodefileStart to value 0.5
Set parameter LPWarmStart to value 0
Set parameter FeasibilityTol to value 1e-05
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (win64)
CPU model: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz, instruction set [SSE2|AVX|AVX2]
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 832242 rows, 797200 columns and 1823597 nonzeros
Model fingerprint: 0x8afa2ebb
Variable types: 744634 continuous, 52566 integer (35044 binary)
Coefficient statistics:
Matrix range [7e-06, 1e+06]
Objective range [1e+00, 1e+00]
Bounds range [1e-03, 9e+06]
RHS range [1e-03, 1e+03]
Presolve removed 744671 rows and 730982 columns
Presolve time: 2.81s
Presolved: 87571 rows, 66218 columns, 227471 nonzeros
Variable types: 45749 continuous, 20469 integer (9740 binary)
Deterministic concurrent LP optimizer: primal and dual simplex
Showing first log only...
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
26768 1.1589095e+11 0.000000e+00 6.753002e+09 5s
36988 9.9720831e+08 0.000000e+00 1.407614e+07 10s
44648 8.4841647e+08 0.000000e+00 1.028779e+07 15s
47358 8.4727003e+08 0.000000e+00 4.295131e+06 20s
49408 8.4726485e+08 0.000000e+00 3.016320e+06 25s
51708 8.4725972e+08 0.000000e+00 2.338121e+06 30s
Concurrent spin time: 0.01s
Solved with dual simplex
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
22779 8.4716378e+08 0.000000e+00 0.000000e+00 31s
Root relaxation: objective 8.471638e+08, 22779 iterations, 26.44 seconds (62.47 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 8.4716e+08 0 8107 - 8.4716e+08 - - 30s
H 0 0 1.253376e+09 8.4716e+08 32.4% - 31s
H 0 0 8.484429e+08 8.4716e+08 0.15% - 32s
H 0 0 8.484422e+08 8.4716e+08 0.15% - 32s
0 0 8.4724e+08 0 9548 8.4844e+08 8.4724e+08 0.14% - 35s
H 0 0 8.472497e+08 8.4724e+08 0.00% - 37s
Cutting planes:
Gomory: 1
MIR: 1716
Explored 1 nodes (36868 simplex iterations) in 37.77 seconds (80.07 work units)
Thread count was 8 (of 8 available processors)
Solution count 4: 8.4725e+08 8.48442e+08 8.48443e+08 1.25338e+09
Optimal solution found (tolerance 1.00e-03)
Best objective 8.472497407541e+08, best bound 8.472437484617e+08, gap 0.0007%
And for the scaled model (edited):
Set parameter TimeLimit to value 36000
Set parameter MIPGap to value 0.02
Set parameter NodefileStart to value 0.5
Set parameter LPWarmStart to value 0
Gurobi Optimizer version 10.0.3 build v10.0.3rc0 (win64)
CPU model: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 832242 rows, 797200 columns and 1823587 nonzeros
Model fingerprint: 0xd174a4c6
Variable types: 744634 continuous, 52566 integer (35044 binary)
Coefficient statistics:
Matrix range [1e-03, 1e+03]
Objective range [1e+02, 1e+02]
Bounds range [2e-04, 9e+04]
RHS range [1e-02, 1e+01]
Presolve removed 751390 rows and 734290 columns
Presolve time: 3.04s
Presolved: 80852 rows, 62910 columns, 207604 nonzeros
Variable types: 45744 continuous, 17166 integer (8873 binary)
Deterministic concurrent LP optimizer: primal and dual simplex
Showing first log only...
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
401 1.5460759e+11 3.189357e+03 5.656206e+09 5s
6879 1.4589546e+09 0.000000e+00 6.491352e+08 10s
12999 1.2766930e+09 0.000000e+00 4.751216e+08 15s
26923 9.7903579e+08 0.000000e+00 4.901709e+08 20s
37847 8.4716380e+08 0.000000e+00 0.000000e+00 23s
Concurrent spin time: 0.31s
Solved with primal simplex
37847 8.4716380e+08 0.000000e+00 0.000000e+00 23s
Root relaxation: objective 8.471638e+08, 37847 iterations, 18.85 seconds (44.11 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 8.4716e+08 0 8287 - 8.4716e+08 - - 23s
H 0 0 1.274275e+09 8.4716e+08 33.5% - 23s
H 0 0 8.815568e+08 8.4716e+08 3.90% - 31s
0 0 8.6163e+08 0 9402 8.8156e+08 8.6163e+08 2.26% - 37s
0 0 8.6327e+08 0 9386 8.8156e+08 8.6327e+08 2.07% - 43s
0 0 8.6330e+08 0 9234 8.8156e+08 8.6330e+08 2.07% - 45s
0 0 8.6330e+08 0 9236 8.8156e+08 8.6330e+08 2.07% - 45s
0 0 8.6338e+08 0 9262 8.8156e+08 8.6338e+08 2.06% - 50s
0 0 8.6340e+08 0 9238 8.8156e+08 8.6340e+08 2.06% - 53s
0 0 8.6341e+08 0 9239 8.8156e+08 8.6341e+08 2.06% - 53s
0 0 8.6343e+08 0 9248 8.8156e+08 8.6343e+08 2.06% - 57s
0 0 8.6346e+08 0 9255 8.8156e+08 8.6346e+08 2.05% - 59s
0 0 8.6346e+08 0 9258 8.8156e+08 8.6346e+08 2.05% - 59s
0 0 8.6346e+08 0 9252 8.8156e+08 8.6346e+08 2.05% - 61s
0 0 8.6347e+08 0 9254 8.8156e+08 8.6347e+08 2.05% - 63s
0 0 8.6347e+08 0 9254 8.8156e+08 8.6347e+08 2.05% - 63s
0 0 8.6348e+08 0 9252 8.8156e+08 8.6348e+08 2.05% - 65s
0 0 8.6348e+08 0 9252 8.8156e+08 8.6348e+08 2.05% - 67s
H 0 0 8.815541e+08 8.6348e+08 2.05% - 70s
H 0 2 8.815526e+08 8.6348e+08 2.05% - 71s
0 2 8.6348e+08 0 9247 8.8155e+08 8.6348e+08 2.05% - 71s
3 4 8.6349e+08 2 2545 8.8155e+08 8.6348e+08 2.05% 2083 77s
7 4 8.6351e+08 3 2541 8.8155e+08 8.6349e+08 2.05% 2924 81s
H 28 4 8.815502e+08 8.6357e+08 2.04% 3085 84s
31 4 8.6358e+08 9 2530 8.8155e+08 8.6358e+08 2.04% 3207 85s
H 56 4 8.815478e+08 8.6366e+08 2.03% 3305 88s
H 57 4 8.815371e+08 8.6368e+08 2.03% 3251 88s
68 3 cutoff 19 8.8154e+08 8.6371e+08 2.02% 3382 96s
74 4 cutoff 19 8.8154e+08 8.6374e+08 2.02% 3479 103s
79 6 cutoff 21 8.8154e+08 8.6376e+08 2.02% 3444 107s
H 88 5 8.815224e+08 8.6378e+08 2.01% 3407 108s
99 7 cutoff 26 8.8152e+08 8.6384e+08 2.01% 3381 110s
108 5 8.6390e+08 30 2493 8.8152e+08 8.6388e+08 2.00% 3295 116s
Cutting planes:
Gomory: 2
Implied bound: 2092
MIR: 8896
Flow cover: 803
RLT: 6
Relax-and-lift: 85
Explored 118 nodes (458744 simplex iterations) in 116.92 seconds (178.79 work units)
Thread count was 8 (of 8 available processors)
Solution count 8: 8.81522e+08 8.81537e+08 8.81548e+08 ... 1.27428e+09
Optimal solution found (tolerance 2.00e-02)
Best objective 8.815224335724e+08, best bound 8.638989956311e+08, gap 1.9992%
-
Hi Jan,
Your model may have some numerical issues due to a huge matrix coefficient range
Coefficient statistics: Matrix range [7e-06, 1e+06] Objective range [1e+00, 1e+00] Bounds range [1e-03, 9e+06] RHS range [1e-03, 1e+03]
This can affect both the running time and the solution quality and could also be the reason for inconsistent results. Ideally, we recommend a matrix range of no more than 6 orders of magnitude (yours has 12!).
Your scaling did not improve the statistics.We recommend reformulating or scaling your model to decrease the matrix range, see for example Advanced user scaling. Do you already know our Guidelines for Numerical Issues? This might be helpful.
Did you compare the solutions of both models?
What happens if you write the solution of the unscaled model, scale it, and then use it as a MIP start for the scaled model? Does this scaled solution violate constraints in the scaled model? By how much? Is there already a violation in the unscaled model?I also noticed that you relaxed the feasibility tolerance. What is the reason for this? I would not recommend relaxing the tolerances, especially if you have coefficients below the tolerances.
Cheers,
Marika0 -
Hi Marika,
sorry, took me a bit of time to answer. Thanks for your help so far! We do know the advanced user scaling guide and the guidelines for numerical issues.
We tried a couple of things to get to the bottom of it:
(1) As suggested, we set the feasibility tolerance to the default value. However, the differences remained.
(2) We turned off presolve in the unscaled model, this reduced the difference in the solutions basically to zero. I assume that the problem is that with large matrix ranges gurobi removes columns or rows that are very similar (but not the same). Could you verify this?
(3) We scaled the model to get a matrix range of 6 orders of magnitude. We then solved the unscaled model and used the solution as a MIP start for the scaled model - it produces a constraint violation (see log below, edited)
Optimize a model with 832242 rows, 797200 columns and 1823587 nonzeros
Model fingerprint: 0x0e1c4d78
Variable types: 744634 continuous, 52566 integer (35044 binary)
Coefficient statistics:
Matrix range [1e-03, 1e+03]
Objective range [1e+02, 1e+02]
Bounds range [2e-04, 9e+04]
RHS range [1e-02, 1e+01]
User MIP start did not produce a new incumbent solution
User MIP start violates constraint x1410435 by 55.538733925
Presolve removed 751390 rows and 734290 columns
Presolve time: 2.98s
Presolved: 80852 rows, 62910 columns, 207604 nonzeros
Variable types: 45744 continuous, 17166 integer (8873 binary)This raises two new questions for me:
- I assume, we can trust the solution of the scaled model 'more' than the solution of the unscaled model as it has less constraint violation. Is that correct?
- While the RR in the scaled model is easier to solve, the B&B takes significantly longer, I assume this is because gurobi is 'stricter' with constraint violations. Do you have any thoughts on this? Is it worth looking at reducing the matrix range even more? We have quite some binaries in the model, and I guess the matrix range should be in the range of 1 then? You can find the log of the scaled problem below (edited).
Gurobi 10.0.3 (win64) logging started Mon Oct 23 14:08:04 2023
Set parameter LogFile to value "userData\Scaling\20231023140205_scaled_withoutwarmstart\log.txt"
Set parameter TimeLimit to value 36000
Set parameter MIPGap to value 0.02
Set parameter NodefileStart to value 0.5
Set parameter LPWarmStart to value 0
Gurobi Optimizer version 10.0.3 build v10.0.3rc0 (win64)
CPU model: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 832242 rows, 797200 columns and 1823587 nonzeros
Model fingerprint: 0xd174a4c6
Variable types: 744634 continuous, 52566 integer (35044 binary)
Coefficient statistics:
Matrix range [1e-03, 1e+03]
Objective range [1e+02, 1e+02]
Bounds range [2e-04, 9e+04]
RHS range [1e-02, 1e+01]
Presolve removed 751390 rows and 734290 columns
Presolve time: 3.04s
Presolved: 80852 rows, 62910 columns, 207604 nonzeros
Variable types: 45744 continuous, 17166 integer (8873 binary)
Deterministic concurrent LP optimizer: primal and dual simplex
Showing first log only...
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
401 1.5460759e+11 3.189357e+03 5.656206e+09 5s
6879 1.4589546e+09 0.000000e+00 6.491352e+08 10s
12999 1.2766930e+09 0.000000e+00 4.751216e+08 15s
26923 9.7903579e+08 0.000000e+00 4.901709e+08 20s
37847 8.4716380e+08 0.000000e+00 0.000000e+00 23s
Concurrent spin time: 0.31s
Solved with primal simplex
37847 8.4716380e+08 0.000000e+00 0.000000e+00 23s
Root relaxation: objective 8.471638e+08, 37847 iterations, 18.85 seconds (44.11 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 8.4716e+08 0 8287 - 8.4716e+08 - - 23s
H 0 0 1.274275e+09 8.4716e+08 33.5% - 23s
H 0 0 8.815568e+08 8.4716e+08 3.90% - 31s
0 0 8.6163e+08 0 9402 8.8156e+08 8.6163e+08 2.26% - 37s
0 0 8.6327e+08 0 9386 8.8156e+08 8.6327e+08 2.07% - 43s
0 0 8.6330e+08 0 9234 8.8156e+08 8.6330e+08 2.07% - 45s
0 0 8.6330e+08 0 9236 8.8156e+08 8.6330e+08 2.07% - 45s
0 0 8.6338e+08 0 9262 8.8156e+08 8.6338e+08 2.06% - 50s
0 0 8.6340e+08 0 9238 8.8156e+08 8.6340e+08 2.06% - 53s
0 0 8.6341e+08 0 9239 8.8156e+08 8.6341e+08 2.06% - 53s
0 0 8.6343e+08 0 9248 8.8156e+08 8.6343e+08 2.06% - 57s
0 0 8.6346e+08 0 9255 8.8156e+08 8.6346e+08 2.05% - 59s
0 0 8.6346e+08 0 9258 8.8156e+08 8.6346e+08 2.05% - 59s
0 0 8.6346e+08 0 9252 8.8156e+08 8.6346e+08 2.05% - 61s
0 0 8.6347e+08 0 9254 8.8156e+08 8.6347e+08 2.05% - 63s
0 0 8.6347e+08 0 9254 8.8156e+08 8.6347e+08 2.05% - 63s
0 0 8.6348e+08 0 9252 8.8156e+08 8.6348e+08 2.05% - 65s
0 0 8.6348e+08 0 9252 8.8156e+08 8.6348e+08 2.05% - 67s
H 0 0 8.815541e+08 8.6348e+08 2.05% - 70s
H 0 2 8.815526e+08 8.6348e+08 2.05% - 71s
0 2 8.6348e+08 0 9247 8.8155e+08 8.6348e+08 2.05% - 71s
3 4 8.6349e+08 2 2545 8.8155e+08 8.6348e+08 2.05% 2083 77s
7 4 8.6351e+08 3 2541 8.8155e+08 8.6349e+08 2.05% 2924 81s
H 28 4 8.815502e+08 8.6357e+08 2.04% 3085 84s
31 4 8.6358e+08 9 2530 8.8155e+08 8.6358e+08 2.04% 3207 85s
H 56 4 8.815478e+08 8.6366e+08 2.03% 3305 88s
H 57 4 8.815371e+08 8.6368e+08 2.03% 3251 88s
68 3 cutoff 19 8.8154e+08 8.6371e+08 2.02% 3382 96s
74 4 cutoff 19 8.8154e+08 8.6374e+08 2.02% 3479 103s
79 6 cutoff 21 8.8154e+08 8.6376e+08 2.02% 3444 107s
H 88 5 8.815224e+08 8.6378e+08 2.01% 3407 108s
99 7 cutoff 26 8.8152e+08 8.6384e+08 2.01% 3381 110s
108 5 8.6390e+08 30 2493 8.8152e+08 8.6388e+08 2.00% 3295 116s
Cutting planes:
Gomory: 2
Implied bound: 2092
MIR: 8896
Flow cover: 803
RLT: 6
Relax-and-lift: 85
Explored 118 nodes (458744 simplex iterations) in 116.92 seconds (178.79 work units)
Thread count was 8 (of 8 available processors)
Solution count 8: 8.81522e+08 8.81537e+08 8.81548e+08 ... 1.27428e+09
Optimal solution found (tolerance 2.00e-02)
Best objective 8.815224335724e+08, best bound 8.638989956311e+08, gap 1.9992%0 -
Hi Jan,
I am a bit surprised about your results.
I want to make sure that I fully understand what is happening, so let me ask some questions.In the very first log, you showed the run of the unscaled model. You said that you get a different solution and objective value if you solve it without presolve. And if I understand correctly the objective value without presolve is 14% worse. Is this correct? In the log I do not see any warnings, so it seems that the solution of the unscaled model with presolve does not contain any violations above the tolerances.
Could you please confirm that?
You can check the maximum violations, e.g., in python with Model.printQuality() or by checking the attributes ConstrVio, MaxVio, or BoundVio.
Are there any warnings if you solve the unscaled model without presolve? How is the solution quality in this case?When scaling constraints or variables, an implicit tightening of the feasibility tolerance can happen. And also the other way round!
However, a violation of nearly 162 in the scaled model from a (scaled) solution of the unscaled model is very high.
Did you check the corresponding constraint x1410435 in the unscaled model? How much is the violation in the unscaled model? How did you scale and did make sure that the solution was adapted to the scaled version?A side note: You can also use our Gurobi Modelanalyzer to check your model for numerical issues. The angle_explain() method helps to identify near-parallel rows or columns.
0 -
Hi Marika,
thanks for you answer. We noticed we made a mistake in the specifications of the models ending up essentially running two different models. The 14% difference was thus incorrect, however, we still have a 4% difference that remains between the scaled and the unscaled model (I took the freedom to adapt the previous posts respectively).
We tested now three different model types: (1) Unscaled, with default options, (2) Scaled and (3) Unscaled without presolve. Find the results below. For both, Model 1 and 3 (the unscaled ones), we get warnings that the constraint violations exceed the tolerance.
Also thanks a lot for suggesting the Modelanalyzer - we will give it a shot!
Model 1 Model 2 Model 3 Matrix range [7e-06, 1e+06] [1e-03, 1e+03] [7e-06, 1e+06] Objective Range [1e+00, 1e+00] [1e+02, 1e+02] [1e+00, 1e+00] Bounds Range [2e-02, 9e+06] [2e-04, 9e+04] [2e-02, 9e+06] RHS Range [1e+00, 1e+03] [1e-02, 1e+01] [1e+00, 1e+03] Scaled 0 1 0 Presolve 1 1 0 Objective 8.48E+08 8.82E+08 8.62E+08 Gap 0.13% 2.00% 1.67% Best Bound 8.47E+08 8.64E+08 8.47E+08 Bound 0.00E+00 0.00E+00 1.14E-12 Constraint 8.69E-06 1.42E-07 2.57E-06 Integrality 0.00E+00 0.00E+00 0.00E+00 0 -
Hi Jan,
Thanks for the clarification.
The statistics of the scaled model look good. If you also see no warnings, I think it is unnecessary to scale the model further.
It is possible that a solution with a higher constraint violation has a better objective, so this might be an explanation for the difference of 4%.
However, I still wonder about the violation of 55 if you take the solution of the unscaled model as a MIP start for the scaled model. The solution in the unscaled model has a maximum constraint violation of 8.69E-06; as a MIP start in the scaled model, we get a constraint violation of 55. I do not know how exactly you scaled the model, but comparing the statistics I do not see differences that motivate a change about 7 orders of magnitude. What does the constraint x1410435 look like in the scaled and unscaled model?Concerning performance you could test some of the parameters discussed in MIP Models - Gurobi Optimization.
Cheers,
Marika0 -
Hi Marika,
ok, thanks for the assessment!
I think I now understand why there is such a large violation. The affected constraint looks like this:
s_1 * par_unit_capex * var_size' = s_2 * var_capex'
with var_size being an integer variable. In the unscaled model, the optimal solution for var_size is 412, but when we propagate this solution to the scaled model (with var_size being scaled by 10^-2), the warm-start solution is 4.12, which is not an integer. I guess gurobi rounds this to the closest integer, but uses the previous solution for var_capex. With this, we get the constraint violation (see below a table illustrating this):
Unscaled Scaled (with scaled var_size) Scaled (with rounded var_size) s1 1.00E-02 1.00E-02 s2 1.00E-03 1.00E-03 par_unit_capex 4.63E+05 4.63E+05 4.63E+05 var_size (unscaled/scaled) 412 4.12 4.00 var_total_capex 1.91E+08 1.91E+05 1.91E+05 computed violation -6.20E-03 0.00 -55.54
We scale our model with pyomo, but unfortunately pyomo does not give out a warning or error when scaling integer variables. But of course, we run into problems doing so. This also explains the 4% difference in the objective function, as some solutions in the scaled model are excluded due to the scaling of the integer variable.Thanks for the enlighting discussion and your time!
Best,
Jan
0
Please sign in to leave a comment.
Comments
6 comments