Is there a preference for using a convex-hull reformulation instead of BigM constraints?
Dear support team,
I am working on a scheduling problem via its polyhedral reformulations. For that, I would like to reformulate a BigM model into its equivalent convex hull (CH) formulation. The transformation map is depicted in the following picture:
As an example, suppose there is a simple scheduling model:
\begin{align*}
\text{minimize} \quad & s_1 + s_2 \\
\text{subject to} \quad & LB \leq s_1 \leq UB \\
& LB \leq s_2 \leq UB \\
& (s_1 \geq s_2+\alpha) \lor (s_2 \geq s_1+\beta) \\
& s_1, s_2 \geq 0.
\end{align*}
It's linearized as:
\begin{align*}
\text{minimize} \quad & s_1 + s_2 \\
\text{subject to} \quad & LB \leq s_1 \leq UB \\
& LB \leq s_2 \leq UB \\
& s_1 - s_2 + (M+\beta)y_1 \leq M \\
& s_2 - s_1 + (M+\alpha)y_2 \leq M \\
& y_1 + y_2 = 1 \\
& s_1, s_2 \geq 0, \; y_1, y_2 \in \{0,1\}.
\end{align*}
and its CH equivalent would be:
\begin{align*}
\text{minimize} \quad & s_1 + s_2 \\
\text{subject to} \quad & s_1 = w_1 + w_2 \\
& s_2 = w_3 + w_4 \\
& LB \leq s_1 \leq UB \\
& LB \leq s_2 \leq UB \\
& y_1 + y_2 = 1 \\
& w_1 - w_3 + \beta y_1 \leq 0 \\
& w_4 - w_2 + \alpha y_2 \leq 0 \\
& w_{i^s} - UB\, y_{i^d} \leq 0, \quad \forall i \in I \\
& s_1, s_2 \geq 0, \; y_{i^d} \in \{0,1\}, \; w_{i^s} \geq 0.
\end{align*}
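For concreteness, here is a minimal gurobipy sketch of both formulations; the numeric values of LB, UB, \(\alpha\), \(\beta\), and M are illustrative placeholders, not taken from my actual model:
import gurobipy as gp
from gurobipy import GRB

LB, UB, alpha, beta = 0, 100, 3, 5     # placeholder data
M = UB + max(alpha, beta)              # any sufficiently large constant works

def bigm_model():
    m = gp.Model("bigM")
    s1 = m.addVar(lb=LB, ub=UB, name="s1")
    s2 = m.addVar(lb=LB, ub=UB, name="s2")
    y1 = m.addVar(vtype=GRB.BINARY, name="y1")
    y2 = m.addVar(vtype=GRB.BINARY, name="y2")
    m.addConstr(s1 - s2 + (M + beta) * y1 <= M)   # y1 = 1 enforces s2 >= s1 + beta
    m.addConstr(s2 - s1 + (M + alpha) * y2 <= M)  # y2 = 1 enforces s1 >= s2 + alpha
    m.addConstr(y1 + y2 == 1)
    m.setObjective(s1 + s2, GRB.MINIMIZE)
    return m

def hull_model():
    m = gp.Model("hull")
    s1 = m.addVar(lb=LB, ub=UB, name="s1")
    s2 = m.addVar(lb=LB, ub=UB, name="s2")
    y1 = m.addVar(vtype=GRB.BINARY, name="y1")
    y2 = m.addVar(vtype=GRB.BINARY, name="y2")
    # w1, w3 are the copies of s1, s2 for the first disjunct; w2, w4 for the second
    w1, w2, w3, w4 = (m.addVar(name=f"w{i}") for i in range(1, 5))
    m.addConstr(s1 == w1 + w2)
    m.addConstr(s2 == w3 + w4)
    m.addConstr(y1 + y2 == 1)
    m.addConstr(w1 - w3 + beta * y1 <= 0)
    m.addConstr(w4 - w2 + alpha * y2 <= 0)
    m.addConstr(w1 <= UB * y1)
    m.addConstr(w3 <= UB * y1)
    m.addConstr(w2 <= UB * y2)
    m.addConstr(w4 <= UB * y2)
    m.setObjective(s1 + s2, GRB.MINIMIZE)
    return m

for model in (bigm_model(), hull_model()):
    model.optimize()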
Now, my questions are:
- Is it proven that a CH formulation always has a better relaxation than a BigM or time-indexed formulation?
- Is the time-indexed formulation a kind of Convex hull reformulation?
- Can we infer that a CH formulation may reduce the solving time of a MIP?
- I have tried to solve a large instance of the above formulation. In that case, the BigM model seems to perform better than the convex hull formulation. Could this be due to the way modern MIP solvers internally handle BigM formulations, as opposed to their CH counterparts?
- What actually is the benefit of the CH reformulation if we again need to deal with BigM constraints? Does it really provide a tighter relaxation? If so, why does the BigM formulation seem to solve faster than its CH counterpart on large instances?
All the best
Hi Abbas,
Please see my answers below:
Is it proven that a CH formulation always has a better relaxation than a BigM or time-indexed formulation?
The convex hull is, by definition, the tightest possible linear relaxation of a MIP's solution space, so no formulation can provide a tighter relaxation.
Is the time-indexed formulation a kind of Convex hull reformulation?
No, but it does tend to have a reasonably strong linear relaxation, in the sense that the MIP gap tends to be small. Unfortunately, the time-indexed formulation is one of the first to become intractable as the problem size grows.
Can we infer that a CH formulation may reduce the solving time of a MIP?
If you have a convex hull formulation then the model can be solved with an LP algorithm; you won't need MIP strategies like branch and bound. Unfortunately, convex hull formulations for many real-world problems are likely to require a number of variables and constraints beyond what solvers can currently handle. This is why Dantzig-Wolfe decompositions are typically taught hand-in-hand with column generation.
I have tried to solve a large instance of the above formulation. In that case, the BigM model seems to perform better than the convex hull formulation. Could this be due to the way modern MIP solvers internally handle BigM formulations, as opposed to their CH counterparts?
I don't find this surprising. I am guessing your convex hull models are much larger than your BigM models? On a similar topic, we often experiment with disaggregating or aggregating constraints, e.g. for binary variables x_1, ..., x_n and y:
Aggregated constraint:
x_1 + x_2 + ... + x_n <= n y
Disaggregated constraints:
x_i <= y , for all i
The disaggregated constraints provide a tighter linear relaxation, but at the cost of more constraints. It is often hard to say which will work better without experimentation; the sketch below illustrates the difference in tightness.
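To see the difference directly, here is a small, self-contained gurobipy sketch (toy data, purely for illustration): it relaxes the binaries and pins y to a fractional value, so the two variants give different bounds on x_1.
import gurobipy as gp
from gurobipy import GRB

n = 5  # toy size

for disaggregate in (False, True):
    m = gp.Model()
    x = m.addVars(n, ub=1.0, name="x")   # LP relaxation of binary x_i
    y = m.addVar(ub=0.2, name="y")       # pin y to a fractional value
    if disaggregate:
        m.addConstrs(x[i] <= y for i in range(n))
    else:
        m.addConstr(x.sum() <= n * y)
    m.setObjective(x[0], GRB.MAXIMIZE)
    m.optimize()
    # aggregated allows x[0] = 1.0, disaggregated only x[0] = 0.2
    print("disaggregated" if disaggregate else "aggregated", m.ObjVal)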
What actually is the benefit of the CH reformulation if we again need to deal with BigM constraints? Does it really provide a tighter relaxation? If so, why does the BigM formulation seem to solve faster than its CH counterpart on large instances?
For real-world scheduling problems I'd say the benefit of the CH formulation is theoretical, not practical. Note that the idea of cutting planes is to chip away at the linear relaxation, trying to reduce the difference between the current relaxation and the convex hull - but these cuts are added where they matter, i.e. in the areas of the polyhedron defined by the current relaxation that are close to optimal. The holy grail of cuts is the "facet-defining" cut, i.e. one that defines a facet of the convex hull; however, it can be hard to come up with a formulation for facet-defining cuts, it can be difficult to prove they are facet-defining, and it may even be computationally expensive to compute them.
I think this paper would be worth a read. It presents several formulations for scheduling and gives computational results across different problem sizes.
- Riley
Dear Riley,
Thank you so much for your detailed answer.
If you have a convex hull formulation then the model can be solved with an LP algorithm; you won't need MIP strategies like branch and bound.
Would you elaborate more on what exactly you mean by "the model can be solved with an LP algorithm"?
I don't find this surprising. I am guessing your convex hull models are much larger than your BigM models?
Yes, the reformulated model is actually much larger than its BigM counterpart. This can be seen in the log file, but some properties of the CH formulation seem to be handled well by the MIP solver. For example, the number of auxiliary binary variables needed to linearize the disjunctions in the BigM model is the same as in the CH model. Also, some of the added constraints can be removed in the presolve phase. However, even assuming the solver exploits these properties, the solving process on some instances still takes an age for the bounds to converge.
The holy grail of cuts is the "facet-defining" cut, i.e. one that defines a facet of the convex hull; however, it can be hard to come up with a formulation for facet-defining cuts
I totally agree that finding the inclusion-wise maximal faces of a real problem is more an art than a mathematical exercise, and I would be interested in a good reference that explains this (indeed, from a practical point of view).
I have read the mentioned paper and its suggested formulations. The published models are general, and one most likely sees them, with some additional modifications, in other papers.
As you have experience formulating and solving scheduling models, I was wondering if you could give me some insights and references on how to develop and formulate a sharp model for scheduling problems, specifically parallel-machine ones with practical limitations?
Best regards
Hi Abbas,
Would you elaborate more on what exactly you mean by "the model can be solved with an LP algorithm"?
The extreme points of the convex hull of a MILP feasible region are integer-feasible, i.e. if you have a model that defines the convex hull then the solutions corresponding to the extreme points of the polyhedron defined by the constraints and bounds will have integer values for integer variables. This is why solving a MILP which defines a convex hull formulation is equivalent to solving the linear relaxation - the linear relaxation is the convex hull. If you solve the linear relaxation (with an algorithm that produces a basic solution) and some integer variables do not take integer values then you do not have a convex hull formulation. If you're adding extra constraints to the scheduling formulation (as would be typical for real world problems), then you will almost certainly not have a convex hull formulation anymore.
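As a quick sanity check, something along these lines can be used (hull_model is a hypothetical builder for the CH model sketched earlier in the thread; Model.relax() and the Method parameter are standard gurobipy):
m = hull_model()      # hypothetical builder for the CH formulation
r = m.relax()         # LP relaxation: integrality is dropped
r.params.Method = 0   # primal simplex, which returns a basic (vertex) solution
r.optimize()

# In a true convex hull formulation, the vertex solution is integer-feasible
frac = [v.VarName for v in r.getVars()
        if v.VarName.startswith("y") and abs(v.X - round(v.X)) > 1e-6]
print("looks like a hull formulation" if not frac else f"fractional: {frac}")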
I totally agree that finding the inclusion-wise maximal faces of a real problem is more an art than a mathematical exercise, and I would be interested in a good reference that explains this (indeed, from a practical point of view).
I've done this in two ways in the past. The first was from scratch, i.e. identify a cut, then use a constructive proof to show it is facet-defining by finding n affinely independent points which satisfy the cut at equality (where n = dimension of the polyhedron). It's not a fun exercise and I wouldn't do it outside of academia. The second, more fun, way was to formulate part of the problem as a network flow (which is known to be a convex hull formulation), then project the model onto a subset of variables, removing most of the network arc variables; the projection is also a convex hull formulation. The constraints which arise can be unintuitive and perhaps hard to come up with from scratch, which is why the method is nice. For an example of this, see Ch. 6 of my PhD thesis. The technique is also described (albeit very briefly) in
Pochet, Yves, and Laurence A. Wolsey. Production Planning by Mixed Integer Programming. New York: Springer, 2006.
However, I would try to avoid reinventing the wheel as much as possible and trawl the hundreds (thousands?) of papers which discuss formulations and cutting planes for scheduling problems, trying to find ones which closely match your problem. If you're looking at scheduling models extensively then there are probably books which give a thorough exposition too, e.g.
Blazewicz, Jacek, Klaus Ecker, Erwin Pesch, Günter Schmidt, and J. Weglarz. Handbook on scheduling. Cham: Springer International Publishing, 2019.
- Riley
Dear Riley,
Many thanks for your detailed answers and explanations. Your last sentences in the first paragraph were exactly what I was looking for, and they cleared up many things. I thought that by reformulating the disjunctions into their equivalent convex hull formulation, we could build models with a tighter linear relaxation.
But as you correctly mentioned, to decrease the complexity of the solving process we still need to consider, besides the CH reformulation, the rest of the model, so as to avoid destroying the tighter linear relaxation we have built - and this is really a big challenge. I think I should study how models with the totally unimodular matrix (TUM) property are constructed. Thanks once again for pointing this key insight out.
Also, the idea you mentioned in the second paragraph, from your thesis, is really interesting, and I appreciate you bringing it up. I will check it out ASAP and get back to you if I have any further questions.
All the best
Hi Abbas,
I thought that by reformulating the disjunctions into their equivalent convex hull formulation, we could build models with a tighter linear relaxation.
I think this is true and definitely worth trying. I have learned that experimentation is king and my intuition is often challenged. For example, I have seen a case where the following aggregated constraint:
x_1 + x_2 + ... + x_n <= m y
where m < n, was disaggregated into:
x_1 + x_2 + ... + x_n <= m
x_i <= y , for all i = 1, ..., n
and this resulted in better performance from the solver - not surprising at all. But the first constraint is not as strong as the original constraint, and when I replaced it with the original constraint the performance was worse. I still have no explanation for this, and it goes against the wisdom that strengthening constraints is a good thing. Sometimes this dark art of modelling is very dark :D
- Riley
Dear Riley,
Many thanks for sharing your insights. Just a couple of follow-up questions:
- How does Gurobi deal with disjunctive terms internally? Specifically, when one would like to use indicator variables? (e.g. \( (x \geq a \land y \geq b) \implies (s_1 \geq f_2 \lor s_2 \geq f_1) \), where \(x, y, s_1, s_2, f_1, f_2\) are non-negative variables and \(a, b\) are predetermined constants)
- Does Gurobi have an internal mechanism to take advantage of the problem structure? Suppose the same problem is defined with and without a block-angular structure in its formulation. Does Gurobi use this special structure to speed up the solving process? If so, I would like to know how.
Best regards
Dear support team,
May I have your insight regarding the above questions?
All the best
Hi Abbas,
Apologies for the slow response, I was on leave last week.
How does Gurobi deal with disjunctive terms internally?
I'm not sure what level of detail you want here, but ultimately the answer is branching. Indicator constraints, for example, will either be translated into linear constraints, with branching on the binary variables introduced, or translated into SOS1 constraints, which are then also handled with branching, but without the introduction of binary variables.
Does Gurobi have an internal mechanism to take advantage of the problem structure?
Yes: presolve reductions, symmetry detection, and the detection of disconnected components are all examples of this.
Suppose the same problem is defined with and without a block-angular structure in its formulation. Does Gurobi use this special structure to speed up the solving process?
I do not know, but even if I did, the advice would still be to test it and see.
- Riley
Dear Riley,
Many thanks for your informative comments. The parts "either be translated into linear constraints" and "translated into SOS1 constraints" are exactly what I was looking for. Could you please say whether this linearization is applied via the BigM approach or via something else, like a convex hull reformulation?
For the last part, I actually ran the problem with and without the special structure, and the results showed that Gurobi performs better with the latter. But I think this may not hold in general, which is why I asked the question.
All the best
Hi Abbas,
either be translated into linear constraints ... or translated into SOS1 constraints
To be precise, I should have said that the indicator constraints are first translated into SOS1 constraints, and then, during presolve, potentially translated into linear constraints. We can see this with the following code snippet:
import gurobipy as gp

m = gp.Model()
x = m.addVar(vtype="B", name="x")   # binary variable used by the indicator
y = m.addVar(vtype="C", ub=1000)    # continuous variable controlled by x
m.addConstr((x == 0) >> (y == 0))   # indicator constraint
m.params.Presolve = 0               # turn presolve off
p = m.presolve()
p.write("temp.lp")

Although presolve is turned off, any translations into SOS1 constraints still need to occur. The contents of the LP file are then:
Minimize
0 C1
Subject To
R0: x + C2 = 1
Bounds
C1 <= 1000
Binaries
x C2
SOS
s0: S1 :: C2:1 C1:2
End

The translation of SOS1 constraints into linear constraints can be controlled by the PreSOS1Encoding parameter. You can play around with this by replacing the line:
m.params.Presolve = 0
with
m.params.DualReductions = 0
m.params.PreSOS1Encoding = -1

and seeing how the value of PreSOS1Encoding affects the contents of the LP file. The API reference for PreSOS1Encoding is pretty good (it also links to PreSOS1BigM), but it's also the only information I'm aware of on the translation into linear constraints.
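Assembled in one piece, the modified snippet would look like the following (this is just the snippet from above with the suggested replacement; try other values of PreSOS1Encoding from the parameter reference and compare the LP files):
import gurobipy as gp

m = gp.Model()
x = m.addVar(vtype="B", name="x")
y = m.addVar(vtype="C", ub=1000)
m.addConstr((x == 0) >> (y == 0))   # indicator constraint

m.params.DualReductions = 0
m.params.PreSOS1Encoding = -1       # change this value and re-run

p = m.presolve()
p.write("temp.lp")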
- Riley
Dear Riley,
Many thanks for your answer as well as the provided links. It seems Gurobi ultimately uses the BigM method to linearize the indicator constraints. As I am not very familiar with SOS constraints, specifically in how they relate to disjunctive terms, is there a good reference that introduces them?
All the best